October 9th, 2013

“For our live site, the primary concern is high availability. Our sites are distributed across three different geographies, hence, we sacrificed consistency; it was okay to be eventually consistent.”

-Jijoe Vurghese, VP of Architecture & Infrastructure at Riptide IO

Jijoe Vurghese
VP of Architecture & Infrastructure at Wize Commerce

Eran Withana Architect at Wize Commerce


Could you tell us a little bit about what Wize Commerce does?

Jijoe: For over ten years, we have been helping clients maximize their revenue and traffic using optimization technologies that operate at massive scale, and across every channel, device and digital ecosystem. Wize Commerce is a platform that delivers qualified traffic, increased monetization and revenue and profit growth to our merchants, driving over $1B in annual merchant sales. We specialize in acquiring and engaging customers, bringing them to robust comparison shopping sites to help them find what they’re looking for.

Excellent. How does Apache Cassandra fit into the mix there?

Jijoe: Very early on in our company’s existence, we decided not to expose relational databases to our live site. We needed a system that could scale up to the amount of reads and writes that we handle in our system, across multiple geographical locations in the world. This is where Cassandra fits in. It is our read/write, distributed data store.

How does that work with your use case? If I were to use your services, how does Cassandra come into play?

Eran: The most common user interaction on our sites is to search for products with terms like men’s black jacket. The matches come from our search system (Solr) whereas details about these matches come out of Cassandra (key-based lookup). In addition, data about the products users have viewed on our site, is persisted to Cassandra.

Okay, excellent. What was your original motivation for choosing Cassandra? Were there other technologies that it was also evaluated against?

Jijoe: For our live site, the primary concern is high availability. Our sites are distributed across three different geographies, hence, we sacrificed consistency; it was okay to be eventually consistent. This automatically eliminated a couple of options like HBase, MongoDB, etc. We started this exercise about two years ago, back then Cassandra was one of the only mature technologies in a narrow field. Having eliminated those other choices, we put Cassandra through a suite of performance tests which Eran can talk about. With these tests, we proved to ourselves that Cassandra would scale for our use case, across multiple geographies.

Eran: Before we talk about the evaluation, let me explain a bit on the multi-layered data store we have. The first level is an in-process, Java cache; the second level is memcached and the third level used to be MySQL. We deployed Cassandra cluster across three different data centers initially replacing MySQL layer. We had to answer following questions before we proceed with the deployments.

1. Which Cassandra version should we be using (by then Cassandra 0.8 was stable and Cassandra 1.0.x version was coming out).

2. Can it handle 10x the read and write requests load patterns we have? How would read/write latency look like when we get to 10x read/write load.

3. Will it perform well in a multi-datacenter environment?

4. How would median, 95th and 99th read/write percentiles scale with increased load?

5. What parameters should be tuned and to what values within Cassandra for our use case.

Keeping the above questions in my mind we devised a series of performance tests on Cassandra 0.8 and 1.0.x versions. We used YCSB (Yahoo Cloud Serving Benchmark) to send requests at increasing requests per second and monitored what’s happening within Cassandra and to client side read/write latency. For #5 above, we devised another set of tests to determine optimal values for Cassandra settings like read and write thread counts, client side load balancing policy, etc.

Would you be able to share some insights into what your deployment looks like?

Eran: Yes, in our deployment, we have multiple data centers – in the United States, we have three and in Europe, one. We also have three major groups of data required on the live site. We have one set of Cassandra nodes per data group in each data center. We have three Cassandra clusters (one per data group) across the United States and another three clusters for Europe. Both EU and US clusters span three geographically distributed data centers to improve reliability and fault tolerance.

Each of the nodes has dual, quad-core CPUs, 48 GB of RAM and four spinning discs. In total we have about 200–250 nodes across all the clusters and total data is about 5 terabytes. We removed our memcached layer and introduced row cache into our Cassandra nodes. This in fact had improved our overall performance.

We have monitoring systems deployed across every cluster to monitor both Cassandra and client side read/write latencies, compactions, repairs, Cassandra behavior and whatever we need to monitor from the live site point of view.

You had mentioned that you are in multiple data centers across the U.S., three. Would you be able to tell us about your experience with; are you using multi data center replication?

Eran: Yes, definitely. This saved tons of headache for us since, unlike other technologies out there, we don’t have to manually intervene for the replication to work and we have more than one setting (read replication factor, explicit repairs, consistency policies, etc) to control the replication.

For someone who’s getting started with multi datacenter replication, do you have any advice for them?

Eran: That’s a good question. First of all, coming from a background where we had to contend with MySQL replication, Cassandra auto-replication is awesome. So we really wanted to use it. One thing we struggled a bit was when we were trying to divide the token space among the nodes in the whole cluster in all data centers.

Initially, we took the complete token space and divided across the cluster, not taking the individual data centers into consideration. This was a very bad idea and the performance was not good. Then, thanks to our Cassandra community answers, we divided the token space only within the nodes in a given data center. If there are conflicts across data centers, then we incremented the tokens of conflicting nodes by one. For example, if we have three data centers with four nodes in each DC, then the tokens would look like the following.

Node 1, DC 1: 0
Node 2, DC 1: 42535295865117307932921825928971026432
Node 1, DC 2: 1
Node 2, DC 2: 42535295865117307932921825928971026433
Node 1, DC 3: 2
Node 2, DC 3: 42535295865117307932921825928971026434
Node 1, DC 4: 3
Node 2, DC 4: 42535295865117307932921825928971026435

The other advice that I could give is that there will be constant streaming across nodes, across data centers; there should be a way for you control this. Cassandra, for example, provides you “control valves” so that you can control the streaming going on. You need to understand what exactly these valves are doing and then set the values properly, otherwise your nodes will be unnecessarily busy.

Okay, very cool. In regards to the Apache Cassandra community, do you have any experience with meet ups, or five minute interviews, or the mailing lists; anything virtual or physical.

Eran: First of all, I think the Cassandra IRC chat is awesome. So there are a lot of people always hanging around there and they’re helping each and every person that’s coming in. I think that’s one of the major reasons behind the success of our project and then also mailing lists. There are a lot of people active, not only initial contributors, everyone is active there.

Then I actually happened to join some of the meetups that we had. In fact, there was one meetup in DataStax as well. We ourselves, in collaboration with Bay Area Cassandra meetup group, organized a meetup at Wize Commerce. As a community, I think maybe it’s because of the Bay Area, it’s very engaging and there are a lot of help out there if you want to get some help.

Well guys, thank you so much for joining us today. Is there anything else, before we sign off, that you’d like to add?

Jijoe: We’re just getting started on the 2.0 release now with getting it into production. We are excited to look at some of the features like CQL, vnodes, etc.