Vipul Sharma, Director of Data Engineering at Eventbrite
"Cassandra is an amazing store. Some of the features that you get out of the box are pretty incredible: high availability, multi-datacenter support, etc."

Eventbrite is an event-ticketing marketplace. We have two main focuses. First, providing tools and services that make it super easy for our organizers to create events and sell tickets. Second, making the marketplace super easy for attendees to use, so they can find events that interest them and attend those events.

Personalized recommendations to millions

We primarily use Cassandra for our consumer experience features, which means delivering recommendations and various discovery funnels to our website, our mobile apps, and some parts of our API. To scale our architecture, we use MySQL as our transactional data store and have decoupled services from it into different parts of our infrastructure. One of the key requirements is being able to compute a large amount of data, upload it quickly, and serve it to our tens of millions of users. That’s where we use Cassandra.

The other place we use Cassandra is a data warehouse we are building specifically for spam and fraud; it will serve a large volume of random-access queries. These are the two places where we primarily use Cassandra.

Consistency requirements

Previously we did almost everything out of MySQL. As we grew rapidly, we needed to decouple our services based on their consistency requirements.

Cassandra gives us an easy way to run a highly available store without having to worry about sharding, multi-datacenter support, and similar concerns. That made Cassandra very attractive to us for the services that didn’t have strict consistency requirements.

Hacking within minutes

I think one of the reasons Cassandra has taken off and is doing so well is its awesome community. I have had the pleasure of interacting with a few committers, and in the past I have worked with people who contributed a lot to Cassandra.

It’s an awesome group of really smart people. All the improvements we are seeing in Cassandra are amazing, especially with DataStax: the documentation DataStax has written makes it incredibly easy for anybody to get started and be hacking within minutes.

Lessons learned

I think we have been very careful to use each data store where its strengths are, because many problems come from having to go back and redo things. For example, if I were to build a solution that needed range queries out of Cassandra, that would be a bad idea, because Cassandra isn’t built for that. We were careful to determine where Cassandra’s strengths lie and exactly which of its features we need, and those are the only places where we use Cassandra. We do the same with our other solutions as well, like HBase, MySQL, and Redis.
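One reason range queries sit outside Cassandra’s strengths is that rows are placed on nodes by hashing the partition key into a token, not by key order. The sketch below is a simplified, hypothetical model of that idea: it assumes a four-node ring with made-up node names and tokens, and it substitutes MD5 from Python’s standard library for the Murmur3 hash that Cassandra’s default partitioner uses. It only illustrates why adjacent keys end up on unrelated nodes; it is not Cassandra’s actual placement code.

```python
import hashlib
from bisect import bisect_left

# Hypothetical 4-node ring. Tokens live in [0, 2**128); each node owns the
# range (previous node's token, its own token], wrapping around at the end.
NODE_TOKENS = [2**126, 2**127, 3 * 2**126, 2**128]
NODE_NAMES = ["node-a", "node-b", "node-c", "node-d"]


def token(partition_key: str) -> int:
    """Map a partition key to a 128-bit token (MD5 stands in for Murmur3 here)."""
    return int.from_bytes(hashlib.md5(partition_key.encode()).digest(), "big")


def owner(partition_key: str) -> str:
    """Return the node whose token range contains this key's token."""
    idx = bisect_left(NODE_TOKENS, token(partition_key)) % len(NODE_TOKENS)
    return NODE_NAMES[idx]


if __name__ == "__main__":
    # Sequential (hypothetical) event IDs scatter across the ring, so a scan
    # over a contiguous key range would have to contact many nodes.
    for event_id in range(1000, 1008):
        key = f"event:{event_id}"
        print(key, owner(key))
```

By contrast, a single-key lookup only needs to compute one token and ask the replicas that own it, which is the random-access pattern the spam and fraud warehouse described earlier relies on.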

We have a bunch of different kinds of data stores in our infrastructure, but each of them is used for its strengths. If I had to go back and do something differently: since we run Cassandra in the cloud, we do see some I/O bottlenecks, so in the future we might consider using SSDs or our own data centers to resolve some of those issues.

Out of the box

The only advice I would give is to use Cassandra where its strengths are. If performance is on your mind, I don’t feel comfortable recommending Cassandra for workloads with strict consistency requirements. When performance is critical, I like to use Cassandra with low consistency requirements and avoid things a data store isn’t made for, such as range queries or heavy MapReduce jobs.
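In Cassandra, consistency is tuned per request with a consistency level: at ONE the coordinator waits for a single replica to respond, at QUORUM for a majority, and at ALL for every replica. The sketch below models only that arithmetic for a single datacenter, assuming a hypothetical replication factor of 3; it does not use any driver API. It also checks the standard rule that a read is guaranteed to overlap the latest acknowledged write only when read replicas plus write replicas exceed the replication factor (R + W > RF).

```python
from itertools import product


def replicas_required(level: str, rf: int) -> int:
    """Replica acknowledgements a coordinator waits for (single-datacenter model)."""
    table = {"ONE": 1, "TWO": 2, "THREE": 3, "QUORUM": rf // 2 + 1, "ALL": rf}
    required = table[level]
    if required > rf:
        raise ValueError(f"{level} needs {required} replicas but RF is {rf}")
    return required


def read_sees_latest_write(read_level: str, write_level: str, rf: int) -> bool:
    """True when every read replica set must overlap every write replica set."""
    return replicas_required(read_level, rf) + replicas_required(write_level, rf) > rf


if __name__ == "__main__":
    RF = 3  # hypothetical replication factor
    for r, w in product(["ONE", "QUORUM", "ALL"], repeat=2):
        print(
            f"read={r:<6} write={w:<6} "
            f"waits for {replicas_required(r, RF)}+{replicas_required(w, RF)} replicas, "
            f"overlap guaranteed: {read_sees_latest_write(r, w, RF)}"
        )
```

Reading and writing at ONE lets each request finish after the fastest replica answers, which is where the latency and availability benefit comes from; the trade-off is that such a read can return stale data, which is acceptable only when the consistency requirements are low.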

Cassandra is an amazing store. Some of the features that you get out of the box are pretty incredible: high availability, multi-datacenter support, etc. And in terms of performance, you’re able to do a ton of writes and a ton of reads.
