August 2nd, 2013


“Cassandra is actually our primary data store. We use it to store all e-commerce data for long archival and real-time analytics.”

-Mike Peters, General Manager at Software Projects




Hello Planet Cassandra users. Today we have Mike Peters, General Manager at Software Projects. Mike, thank you so much for joining us today. Really excited to hear about how you’re using Apache Cassandra. To get things started, could you tell us a little bit about what Software Projects does?

Sure thing, Brady; the pleasure is all mine. Software Projects has been around for over 10 years now. We are focused on helping small to medium-sized businesses sell more products and services online. We currently drive about $1 billion in e-commerce transactions, and we’re doing that for 3,000 businesses in 14 countries. We also provide managed Cassandra hosting for dozens of startups.

That’s fantastic. And how does Apache Cassandra play into the mix there? You mentioned that you do Cassandra hosting; is there anything else that you’re using Cassandra with? 

Yeah. For our own use we drive a lot of e-commerce transactions for clients. Cassandra is actually our primary data store. We use it to store all e-commerce data for long archival and real-time analytics. We’ve actually been using Cassandra since the early days, since version 0.5. I have to say it was a bit rocky at first… we were actually ready to give up on it a few times when rolling upgrades never really worked right out of the box. As the product matured, things stabilized and we’re very happy that we hung in for the ride.

Over the years we’ve learned a lot about getting the most out of Cassandra, which use cases are a great fit and which ones just don’t work very well. Internally we use Cassandra for immutable storage and we’re not doing any deletes. We found that this usage pattern works best with Cassandra.

Excellent. What was your motivation for using Cassandra? Are there other technologies that you looked at and evaluated it against?

We’ve been doing this for over 10 years, and over the years we’ve evaluated a lot of different technologies. We started with sharded MySQL, we then moved to HBase, and after that we toyed around with Couchbase. We also had an implementation of MongoDB at one point and we even used Redis.

We absolutely love the speed and performance of Redis, but unfortunately it doesn’t scale very well. We also had a number of home-grown implementations but we ended up choosing Cassandra for its linear scalability, consistent throughput, no single point of failure and ability to never ever corrupt data on failure.

Excellent. Would you be able to share with us some insight into what your deployment looks like?

Sure. I’ll be happy to, Brady. We have 200 Cassandra nodes right now running in two geographic regions. We’re using our own data centers and the servers are all running FreeBSD, which actually is not the most popular option. Each server has 12 cores, 24 gigabytes of memory, and two spinning disks, and each Cassandra node holds between 300 and 500 gigabytes.

That’s a lot of data, that’s awesome. You said you had two data centers. Are you using multi-datacenter replication as well?

Yes, we sure are. That was a key part of our decision to go with Cassandra. Since we’re dealing with financial data and e-commerce, it was super important that we are able to keep our clients’ shopping carts up and running at all times, even in the case of a complete data center catastrophe. It was very important to make sure that our platform is capable of running in two data centers and that we’re able to keep running when we lose an entire data center.

For someone who’s trying to get started with multi-data center replication [it seems to be a hot topic right now], what is a piece of advice that you would give to this person? 

That’s an excellent question. The advice that I would give is to tread very carefully. There are a lot of complications when you deal with multi-data center deployments. This is somewhat ironic, but I would say that if there is any way you can avoid having to run multiple data centers, do that. If you absolutely require two data centers, make sure you take the time to run things in a test environment. Understand the implications of losing an entire data center. Make sure that your replication is aware of the fact that you could lose an entire data center.

For instance, if you’re doing quorum reads or quorum writes, you’ll need to program the application to fail gracefully going from maybe a quorum read to a local quorum read; this will help substantially in a case where the remote data center goes down and you want to continue serving customers.
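[Editor’s note: the fallback Mike describes can be sketched roughly as follows. This is a toy simulation under stated assumptions, not Software Projects’ code and not a real driver API: `SimulatedCluster`, `Unavailable`, and `read_with_fallback` are hypothetical names, and the replication factor of 3 per data center is assumed for illustration. It relies on the fact that QUORUM needs a majority of all replicas across both data centers, while LOCAL_QUORUM needs only a majority of the replicas in the local one.]

```python
from enum import Enum


class Consistency(Enum):
    QUORUM = "QUORUM"
    LOCAL_QUORUM = "LOCAL_QUORUM"


class Unavailable(Exception):
    """Raised when too few replicas are alive for the requested level."""


def quorum(replicas: int) -> int:
    # A quorum is a strict majority of the replicas being counted.
    return replicas // 2 + 1


class SimulatedCluster:
    """Toy stand-in for a two-data-center cluster (hypothetical, not a driver)."""

    def __init__(self, local_dc, replication):
        self.local_dc = local_dc
        self.replication = dict(replication)  # data center -> replication factor
        self.down = set()  # names of data centers that are unreachable

    def read(self, key, level):
        if level is Consistency.QUORUM:
            # Majority of ALL replicas, counted across every data center.
            needed = quorum(sum(self.replication.values()))
            alive = sum(rf for dc, rf in self.replication.items()
                        if dc not in self.down)
        else:
            # Majority of the replicas in the local data center only.
            local_rf = self.replication[self.local_dc]
            needed = quorum(local_rf)
            alive = 0 if self.local_dc in self.down else local_rf
        if alive < needed:
            raise Unavailable(
                f"{level.value}: need {needed} replicas, {alive} alive")
        return f"value-for-{key}"


def read_with_fallback(cluster, key):
    """Try QUORUM first; downgrade to LOCAL_QUORUM if it cannot be met."""
    try:
        return cluster.read(key, Consistency.QUORUM), Consistency.QUORUM
    except Unavailable:
        value = cluster.read(key, Consistency.LOCAL_QUORUM)
        return value, Consistency.LOCAL_QUORUM


if __name__ == "__main__":
    cluster = SimulatedCluster("dc1", {"dc1": 3, "dc2": 3})
    print(read_with_fallback(cluster, "cart:42"))  # served at QUORUM
    cluster.down.add("dc2")                        # remote data center lost
    print(read_with_fallback(cluster, "cart:42"))  # served at LOCAL_QUORUM
```

[With 3 replicas in each data center, QUORUM needs 4 of 6, so it cannot be met once the remote data center is down; LOCAL_QUORUM needs only 2 of the 3 local replicas. The downgrade keeps requests flowing, but the read no longer takes the remote data center’s replicas into account, so it is worth deciding per query whether that trade-off is acceptable.]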

For future versions of Apache Cassandra, is there anything in specific that you’d like to see? 

We’d love to see improvements to read performance. Cassandra is phenomenal with writes; it’s not so hot with reads. We’re used to the read performance of MySQL, Redis, Couchbase… all these guys have such better read performance. There is a lot of work going on right now in the Cassandra Community and at DataStax on further improving read performance, and there have been great improvements over the last 6 months or so. This is a hot topic for us, so anything that can be done to further improve read performance would be a big win.

Beyond that, we’d love to see better built-in internal tuning of Cassandra. At this point in time, tuning Cassandra is more an art than a science. It takes a great deal of understanding of the internals of Cassandra if you want to tune it right. In future versions, we would love to see an ability for Cassandra to automatically tune itself for the average user.

You had mentioned Planet Cassandra and the Cassandra Community; what’s your experience there with the physical or virtual community? 

It’s been very exciting. Over the years, we’ve gotten to interact with quite a few of the experts in the Cassandra Community. We hired a couple of consultants to help us troubleshoot use cases. We actually also found new clients inside the Cassandra Community. In general, we got to know a lot of great people.

It sounds like you’ve had a really good experience so far.

Yeah, we sure have.

Mike, thank you so much for joining us today. I really enjoyed hearing about how you guys are using Cassandra over at Software Projects. Before we end here, is there anything else that you’d like to add?

Yeah. Actually, the one thing I would add is if you’re using Cassandra in production right now, make sure you’re using version 1.2 or higher. There are a lot of improvements in that version that go a long way. Beyond that, I would say take the time to watch the presentations from the recent Cassandra Summit 2013. There is a lot of great insight there; I really enjoyed Axel Liljencrantz’s presentation “How Not to Use Cassandra”. There are a lot of absolutely phenomenal presentations that you owe it to yourself to take the time and watch. That’s it. I just want to thank you again for building an awesome product and just keep crushing it.

Thanks Mike; we really appreciate it. Best of luck to you. 

Thank you.