February 11th, 2014

This posting was created by Patrick McFadin, Chief Evangelist for Apache Cassandra at DataStax. To view more postings by Patrick, be sure to check out his blog at PatrickMcFadin.com.

Those of you who know me know that I rarely get negative about your data store choice. If you’ve done your due diligence and it works for you, great. We live in an age with an over-abundance of choices, so something should fit your use case. Yes, I would prefer you use Apache Cassandra. Not because I’m an evangelist, but because I’ve personally done great things with this technology and I believe in it. When you say you will be going in a different direction, I’ll wish you luck and go on my way. You won’t see me flipping out, waving my hands…

Until today.

You know when you see something repeatedly and you can’t help but think there is more to this? I’m having one of those moments. I’ve been doing full-time consulting for Cassandra for probably a year and a half at this point. In the past year there has been a rising chorus of users who are stuck on a cliff with MongoDB and desperate to get out. I hear some really tough stories about how it seemed to be a great fit when they started, only to find out it wasn’t matching the scale they needed. They were told it was “Web Scale” but, to paraphrase Inigo Montoya from The Princess Bride, “This word Web Scale, it does not mean what you think it means.” There have been some pretty public stories about this growing problem. The team over at Shift talked about the need to migrate away from MongoDB and onto Cassandra in a recent interview. Many of us have done interesting tricks to make relational databases scale; they found that they were having to do the same stuff with MongoDB. I love this quote from John Haddad, Senior Architect at Shift:

“Cassandra is much more sane to deal with than MongoDB. MongoDB just has more moving parts architecturally, and pulling our data simply ground it to a halt. With Cassandra, it’s insanely fast, and managing the data is a no-brainer for us.”

When I ran infrastructure, I wanted the no-brainer. I say this all the time: the database should be the most boring thing in your datacenter. Is it scaling? Yep. Is it online? Yep. Boring.

They aren’t the only vocal ones either. On PlanetCassandra.org there are a lot of interviews just like it. Internet of Things search engine Shodan maxed out MongoDB and had to either move or stop collecting data. (Not an option.) Analytics provider Retailigence was driving off the same cliff. When you build an application to make money, you really don’t want something in the way. It’s like having a silent thief in your store stealing away your profits. If you have investors, that’s going to be a hard meeting when you say cash flow is down because of scaling problems.

The scaling and uptime of Cassandra are well known. We see it proven time and time again. Cassandra was born of the Dynamo paper from Amazon.com, a project started to store shopping cart data with just those requirements. They make an average of $1,000 a second and, if you know stats, that average is masking what happens during the holiday season. They put the A-list engineers on that problem and the end result was evolutionary computer science. Not revolutionary. Evolutionary. If your money pipe is on the line, don’t mess around and come up with something fancy and completely new. This, in part, is why we see engineers using Cassandra in production. It’s the evolved state of data storage for the next generation of web applications. And you don’t need to be a startup to see it. Many top-tier companies are using Cassandra every day to make money. This quote from Christos Kalantzis at Netflix sums it up:

“We considered Mongo. We considered Riak. We considered all these other databases. But the architecture of Cassandra and its availability, consistency tuning and scalability made it a clear choice.”

So why are teams going down this path in the first place? Sadly, it’s something I would have conceded to MongoDB not too long ago: ease of use for developers. When you have to get your application out the door, sometimes speed of development is the primary motivator. If you’re told it will scale, you check the box and get to building your application. A lot has changed over the past couple of years with Cassandra, and I don’t think that advantage holds any more. CQL (Cassandra Query Language) has brought a very familiar syntax to the development story. “select * from users”? Hey, I get that! I talk to developers every day and I hear how they were almost instantly productive and writing applications. There isn’t a huge up-front payment with Cassandra anymore in learning how to get data in and out.
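To give a feel for how familiar that is, here is a minimal CQL sketch. The keyspace name (demo), the users table layout, and the sample row are all made up for illustration, and the single-node replication setting is only sensible on a local test install, not in production.

```cql
-- Hypothetical keyspace for a local, single-node test install.
CREATE KEYSPACE IF NOT EXISTS demo
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

-- Hypothetical users table; user_id is the primary key.
CREATE TABLE IF NOT EXISTS demo.users (
  user_id uuid PRIMARY KEY,
  name    text,
  email   text
);

-- Getting data in.
INSERT INTO demo.users (user_id, name, email)
VALUES (62c36092-82a1-3a00-93d1-46196ee77204, 'Ada', 'ada@example.com');

-- Getting data out.
SELECT * FROM demo.users;
```

If you have written SQL, nothing here should look foreign. The part that does take some learning is data modeling, since tables in Cassandra are usually designed around the queries you plan to run.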

OK. I can’t rant any more. I just want to help you if you are seeing that cliff. We don’t need any more “I had to migrate off MongoDB” stories on Planet Cassandra. There are plenty. What can you do to get going? We have a lot of people creating amazing resources to help get your scaling story right the first time. If you’re brand new, head over to the Getting Started page on Planet Cassandra. We have a path lined up to get you started quickly: Docs. Data Modeling. VM to try out. It’s all there. How about your use case? There are functional use case examples to dig into and tons of industry use case interviews. When you have questions, don’t hesitate to ask! StackOverflow. Mailing List. IRC. If you want to use Twitter, just add a #Cassandra and we’ll find you.

When deciding on a database for your next project, the choice is yours. If what I’m talking about here matters to you, then consider what I’m saying. If you choose another database, we’ll still be friends. If you choose MongoDB because it scales? Chances are, we’ll talk again.