October 11th, 2013

By 

 

Paul Cichonski:  Senior Software Engineer at Lithium

Christian Hasker: Editor at Planet Cassandra, A DataStax Community Service

 

TL;DR:  Lithium helps companies connect with their most passionate customers by enabling companies to listen and respond to their customers through social channels and building engaging online communities.

 

Lithium is using Cassandra on the back end to store all the user subscriptions to anything that might be going on in the community. They then take each notification generated from those subscriptions and store it in the user’s activity feed, so they’re using Cassandra to store users’ activity feeds and news feeds as well.

 

They have two clusters running in two data centers and there’s no communication across clusters. Each of the clusters right now is four nodes, with each node being a box with eight cores. Each box has a two-terabyte spinning disk drive for the commit logs and three 256-gigabyte SSDs for the data drives.


Paul, why don’t you start off today’s Apache Cassandra use case and just tell people a little bit about what Lithium does and then also what your role is there.

At Lithium, our role is to help companies connect with their most passionate customers. We do this by enabling companies to listen and respond to their customers through social channels and by building engaging online communities where customers can interact with each other. We work with some of the largest companies in the world, such as Google, Cisco, and Vodafone. I’m a senior software engineer on the core community team, so I help with core infrastructure and with building out new features in the core product.

 

Let’s talk specifically about Cassandra. What is Cassandra doing there at Lithium?

Right now, we’re moving all of the code in our product that does subscription fulfillment and notifications into a Cassandra-backed service. We are using Cassandra on the back end to store all the user subscriptions to anything that might be going on in the community. We then pass all product events through this service and query the subscriptions to figure out who’s interested in each event. Once that happens, we create notifications that go out over different mechanisms like email. We’re also working on SMS-push and in-app notifications. We then take each notification and store it in the user’s activity feed, so we’re using Cassandra to store users’ activity feeds and news feeds as well.
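Editor's illustration: the flow Paul describes (look up the subscribers for an event, decide who to notify, then record the event in each user's feed) can be sketched in a few lines of Python. This is only a minimal in-memory sketch, not Lithium's implementation. The names `Event`, `NotificationService`, `subscribe`, `handle_event`, and `feed_for` are hypothetical, and plain dictionaries stand in for the Cassandra tables, whose schema is not given in the interview.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A product event; all field names here are hypothetical."""
    target: str     # what the event happened on, e.g. "board:support"
    summary: str    # short human-readable description
    timestamp: int  # seconds since the epoch


class NotificationService:
    """In-memory stand-in for a Cassandra-backed subscription service."""

    def __init__(self):
        # target -> set of subscribed user ids
        self.subscriptions = defaultdict(set)
        # user id -> list of events, kept newest first
        self.feeds = defaultdict(list)

    def subscribe(self, user_id, target):
        self.subscriptions[target].add(user_id)

    def handle_event(self, event):
        """Record the event in each subscriber's feed; return who to notify."""
        interested = sorted(self.subscriptions.get(event.target, ()))
        for user_id in interested:
            feed = self.feeds[user_id]
            feed.append(event)
            # Keep the feed ordered by time, newest first.
            feed.sort(key=lambda e: e.timestamp, reverse=True)
        return interested

    def feed_for(self, user_id, limit=10):
        """Return up to `limit` of the newest events in a user's feed."""
        return self.feeds.get(user_id, [])[:limit]


service = NotificationService()
service.subscribe("alice", "board:support")
service.subscribe("bob", "board:support")
to_notify = service.handle_event(Event("board:support", "New reply", 100))
```

In a real deployment the returned list of users would be handed to whatever delivers the email or in-app notification; that delivery step is left out of the sketch.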

 

So why Cassandra? I can think of a couple of its strengths, like handling time series data and being able to write very fast, but why were you attracted to Cassandra? Did you look at anything else along the way?

We use MySQL a lot internally for most of our core products, so we did consider it a little bit. The main reason we went with Cassandra was, again, the time series data. The user activity feed is one of the core use cases. We didn’t really evaluate a ton of other products. We had an architect on the team who was familiar with Cassandra, so we went with that and started using it.

 

Presumably, you were new to Cassandra. How was it, making the transition from a MySQL relational background to picking up the skills needed for Cassandra?

I found it pretty easy. I’ve had experience with other types of big data. I’ve worked with graph databases, so it wasn’t that hard to switch mental models. I think CQL helps a lot for training more junior engineers on how to actually interact with Cassandra, as long as you coach them that it’s not a direct one-to-one mapping with SQL, and that some things you do in CQL are not going to work the way you think they will. There hasn’t been that much of a learning curve. We’ve been able to come up to speed pretty quickly.

 

If you could talk a little bit about what your Cassandra environment looks like at Lithium, that would be great: type of hardware, number of nodes, stuff like that.

We have two clusters running in two data centers and there’s no communication across clusters. Each of the clusters right now is four nodes, with each node being a box with eight cores. We have a two-terabyte spinning disk drive for the commit logs and then we have three 256-gigabyte SSDs on each box for the data drives. Each of the nodes has, I think, 64 gigabytes of memory.

 

What version of Cassandra are you running with CQL?

1.2.6.

 

Any plans to go to Cassandra 2.0?

We’d like to. We’re still rolling our beta customers into production with 1.2.6. Once we get everyone on it, we’ll probably start thinking about the upgrade. We’re hoping to let other people iron out most of the bugs before we move to it in prod.

 

Yeah, that’s one advantage of open source. What advice would you pass along to other organizations looking to get started with Cassandra?

Get involved in the community and be on the mailing list; the IRC chat helps a lot too. Also, make sure that before you push to production you’ve heavily tested your application, not just with performance tests but with actual long-running, continuous usage tests on the app using expected production traffic. We’ve encountered a lot of cases where we’ve had to do a lot of tuning on the schema, like switching compaction strategies, and if we had hit those cases in production it probably would’ve been a lot harder to deal with.

 

Anything else you’d like to add? Anything you’d like to see in Cassandra that isn’t there today?

2.0 has a lot of cool features that I’m looking forward to, especially triggers. Everything has been going well so far.

 

We love to hear that, and thank you very much.