Vipul Sharma, Director of Data Engineering at Eventbrite
Hello Planet Cassandra users. My name is Brady Gentile and I’m a community manager at DataStax. Today we have with us Vipul Sharma, Director of Data Engineering at Eventbrite. Vipul, thanks for taking the time to do this interview with us today; to start things off, what exactly does Eventbrite do?
Eventbrite is an event-ticketing marketplace. We have two main focuses: One, providing tools and services to our organizers so that it’s super easy for them to use our services by creating events and selling tickets. Two, making the marketplace super easy for attendees to access our services, find events that are interesting to them and attend those events.
Very cool. As part of the Eventbrite offering, where does Cassandra fall into place? How are you using Cassandra at Eventbrite?
Primarily we are using Cassandra for our consumer experience features, which is basically delivering recommendations and various discovery funnels out to our website and to our mobile apps and some parts of our API. The way we have scaled our architecture is we have MySQL as our transaction data store and we have decoupled services from that transaction data store to different parts of our infrastructure. One of the key parts is being able to actually calculate a large amount of data, upload it quickly, and serve it to our tens of millions of users. That’s where we use Cassandra.
Another place we use Cassandra is that we are building a spam and fraud specific data warehouse; this warehouse will be used to serve a large group of queries that have random access. These are the two places we are primarily using Cassandra.
Very cool. For those services, did you switch from another database offering to Cassandra?
We were primarily doing everything out of MySQL before and, as we are growing rapidly, we needed to decouple our services based on the consistency requirements.
Cassandra gives us an easy way to provide a highly available store without having to worry about things like sharding and multi-datacenter support. That’s why Cassandra was very attractive to us in places where we didn’t have high consistency requirements.
Awesome. Are you running Cassandra in the cloud or your own datacenter?
We’re running it in a cloud. All of our infrastructure is in EC2 right now.
Very good. Based on your interaction with the Cassandra community, whether it be the physical community or the virtual community, what are your thoughts on what they can provide?
I think one of the reasons why Cassandra has taken off and is doing so well is because of its awesome community; I have had the pleasure of interacting with a few committers. I have worked with a few people in the past, specifically where there was a lot of contribution done to Cassandra.
It’s an awesome group of really smart people. All the improvements that we are seeing in Cassandra are amazing, especially with DataStax: with all the documentation that DataStax has written, it is just incredible how anybody can get on and start hacking within minutes.
That’s excellent. It sounds like you’ve had a really good experience so far. You’ve been using Cassandra for a while now. In hindsight, would you have done anything differently when you first started using Cassandra?
That’s a very good question. I think that we have been very careful about using data stores where their strengths are; I think that many problems occur when you have to go back and redo things. For example, if I were to build a solution where I need range queries out of Cassandra, that would be a bad thing because Cassandra’s not built for that. We were wise in actually determining where Cassandra’s strengths lie and what exactly we need out of these features, and that’s the only place where we are going to use Cassandra. We do the same thing with our other solutions as well, like HBase, MySQL, and Redis.
We have a bunch of different kinds of data stores in our infrastructure but all of them are serving to their strengths. If I had to go back and do something different: since we run Cassandra in the cloud, we do see some IO bottlenecks; in the future, we might actually think about using SSDs or our own datacenters to resolve some of those issues.
Excellent. Is there anything else that you would like to add about Cassandra?
Yes, I think that Cassandra is an amazing store. Some of the features that you get out of the box are pretty incredible: high availability, multi-datacenter support, etc. With the performance, you’re able to do a ton of writes and a ton of reads.
The only advice I would give you is to use it where its strengths are; I don’t feel comfortable saying that you should use Cassandra with high consistency requirements if you have performance in the back of your mind. If performance is critical, I like to use Cassandra with low consistency requirements and not do things that a data store is not made for, such as range queries or running heavy MapReduce jobs.
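As a rough illustration of the trade-off described above, the sketch below works out how many replicas must acknowledge a request at three of Cassandra's consistency levels (ONE, QUORUM, ALL), and checks the commonly cited rule that reads see the latest write when the read and write replica counts together exceed the replication factor. The function names and the replication factor of 3 are assumptions made for this example; they are not taken from Eventbrite's setup or from any Cassandra driver.

```python
# Illustrative sketch only: these helpers are hypothetical and are not part
# of any Cassandra client library.

def replicas_required(level: str, replication_factor: int) -> int:
    """Return how many replicas must respond at a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        # A quorum is a majority of the replicas.
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {level}")


def is_strongly_consistent(write_level: str, read_level: str,
                           replication_factor: int) -> bool:
    """Check the rule of thumb: read replicas + write replicas > replication factor."""
    writes = replicas_required(write_level, replication_factor)
    reads = replicas_required(read_level, replication_factor)
    return writes + reads > replication_factor


if __name__ == "__main__":
    rf = 3  # assumed replication factor for the example
    for write_level, read_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ALL", "ONE")]:
        strong = is_strongly_consistent(write_level, read_level, rf)
        print(f"write={write_level:<6} read={read_level:<6} strong={strong}")
```

Lower levels such as ONE wait on fewer replicas per request, which is the performance side of the trade-off; higher levels wait on more replicas in exchange for stronger guarantees.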
Okay, excellent, very good. Vipul, thank you so much for joining us today. I wish you the best of luck with your usage of Cassandra.
Thank you so much. I’m looking forward to engaging more with the community.