January 29th, 2014









“Relational databases were unable to implement the technology and applications we needed.”

-Brett Lawrence, Lead Engineer at Wiggle





Wiggle is an online sporting goods retailer. We started in ’99, in what was essentially the back of a bike shop, and have grown extremely rapidly since then; we now ship globally across the cycle, run and triathlon markets. We’ve grown into a 500-person company that serves more than 100 countries and generated more than £140M in revenue in 2012.

A need for scalable personalization

About three years ago, we wanted to add some personalization to our website, showing customers products that they might like rather than just the single product they were on. We wanted to implement one algorithm that would show them which products had been bought in the same basket as the one they were looking at, and another that would show which products had been viewed by other people in the same session as the product they were viewing.
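The two recommendations described above are both co-occurrence counts over groups of products, where a group is either a basket or a browsing session. The following is a minimal in-memory sketch of that idea, not Wiggle's actual implementation: the function names `build_cooccurrence` and `recommend`, the product names, and the sample baskets are all hypothetical and made up for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(groups):
    """Count how often each pair of products appears in the same group.

    A group is one basket (for "also bought") or one browsing
    session (for "also viewed"). Hypothetical helper, for illustration.
    """
    counts = defaultdict(Counter)
    for group in groups:
        # Deduplicate so a product repeated within a group counts once.
        for a, b in combinations(sorted(set(group)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(counts, product, top_n=3):
    """Return up to top_n products most often seen alongside `product`."""
    neighbours = counts.get(product, Counter())
    return [other for other, _ in neighbours.most_common(top_n)]


# Hypothetical baskets, invented for this example only.
baskets = [
    ["helmet", "gloves", "bike-lock"],
    ["helmet", "gloves"],
    ["helmet", "water-bottle"],
]
also_bought = build_cooccurrence(baskets)
print(recommend(also_bought, "helmet"))
```

Feeding the same function a list of viewing sessions instead of baskets would give the "also viewed" variant; only the input changes.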

We initially implemented both algorithms on Microsoft SQL Server. We found that the second algorithm, the “products also viewed” one, came nowhere near the performance it would need to scale in production. This is because there are a lot more products in an average viewing session than there are in a basket. Our system at the time wasn’t capable of supporting the load, and we didn’t have a plan at that time to upgrade it so that it could. So we started exploring new technologies.
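To see why group size matters so much, assume the algorithm counts every pair of products in a group (an assumption; the source does not say exactly how the counting was done). The number of pairs then grows roughly with the square of the group size. The sizes below are hypothetical, chosen only to show the shape of the growth.

```python
from math import comb


def pair_count(items_in_group):
    """Number of distinct product pairs in one basket or session."""
    return comb(items_in_group, 2)


basket_size = 3    # hypothetical: products in a typical basket
session_size = 30  # hypothetical: products viewed in a typical session

print(pair_count(basket_size))   # 3
print(pair_count(session_size))  # 435
```

Under these made-up sizes, a session that is ten times larger than a basket produces 145 times as many pairs to count.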


We contracted a third party to build us a solution using Cassandra, which would let us grow the “also viewed” part of our recommendation engine with enough capacity for five or six years. That project went very successfully until about six months ago, when the automated MapReduce process we had implemented outside of Cassandra, which runs every four hours, started to fail for us.

Based on our evaluation, we decided to move to a production enterprise version of the software that would be supported and would have a predictable roadmap. So it was natural for us to re-implement this algorithm on DataStax Enterprise, and the improvements from the software we initially used to the current DataStax Enterprise are incredible. We’re now thinking about other things that we can do with it as well.

We were also aware of the multi-data center support, which at the time we thought would be useful to us. We’re now starting to put a plan in place to host data in multiple data centers globally.

Other NoSQL alternatives

We looked at them and used Mongo for other small projects. At the time it seemed like the integration that DataStax Enterprise provided between MapReduce and Cassandra would make the whole process much more streamlined for us.

Wiggle’s Deployment

We’ve got four nodes. Our data is probably relatively small in the big data sense: before we reduce it, I think we’re looking at 40 or 50 GB per node. In terms of rows, we’ve got around 160 million in the way we store them, again before reduction. After we reduce the data we end up with about 20 GB per node, which covers 18 months of data. When we initially forecasted our deployment we planned for five years’ worth of data, so we expect to just leave it running for five years.

Managing their Deployment

DataStax OpsCenter helped drive us toward DataStax Enterprise and toward making a full purchase, because it meant we could actually hand over maintenance to our operations team instead of leaving it with developers.

Benefits of Cassandra + DataStax Enterprise

Relational databases were unable to implement the technology and applications we needed, and the solution that we’ve developed generates demonstrable revenues. I think Cassandra is another tool that is appropriate for certain jobs and if you don’t have it in your infrastructure, you’re limited.

For those new to the NoSQL scene

We had some challenges with the analytics, and with wrapping our heads around it well enough to get it into production properly. The main challenges for us, from a knowledge and learning perspective, were around being able to hand over maintenance to our operations department. We were very used to SQL Server and relational databases, and it was difficult to persuade the operations team to buy into NoSQL. But the benefits have certainly proven themselves since then.