May 2nd, 2014


“4 Reasons Why Data Engineers Don’t Use Cassandra” was created by Patrick McFadin, Chief Evangelist at DataStax, as part of Hakka Labs’ Cassandra Week.


Three years ago, I was stuck trying to fit a use case into my Oracle database. It was getting expensive fast, and I was running out of budget. A friend suggested I try Apache Cassandra for the task, and it turned out to be a perfect match for my time series use case. It’s not a perfect database, though: it was really hard to get my head around the data model, and the driver support was scattered. There were a few points where I was ready to just give up and pay Oracle, but I stuck with it. Cassandra was the solution that fit my problem, and after a long uphill climb, it worked better than I’d expected.

A few weeks ago, Apache Cassandra passed a significant milestone: it’s five years old! For software, five years is a real mark of maturity. When I’m out talking to people, I hear some say, “I used Cassandra a while ago, and it just wasn’t for me.” That was almost my experience. I tell those same people to give it another look. Cassandra had some growing pains, but this project has not stood still. Let’s take a look at some of the things that may have turned you away as a developer.


Consistency is flakey

If you have been working with relational databases for a while, the term “Eventual Consistency” can seem like crazy talk. It’s been a favorite dig on Cassandra: “If you are serious about your data, eventual consistency just works against that.” Yet a number of companies, large and small, have found that Cassandra’s consistency model works perfectly for them. A tunable consistency model gives the developer and architect greater flexibility: each read and write can choose how many replicas must respond. This has led to innovative and unique use cases not possible in fully consistent systems. A new feature called Lightweight Transactions (LWT) was released in Cassandra 2.0. This brings ACID capabilities to your data model when needed. Again, this is completely tunable at runtime. Tunable consistency is about choice. Developers and architects should have those choices when designing systems and not be constrained by the most rigid of them all.
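To make “tunable” a little more concrete: with a replication factor of N, a read that consults R replicas is guaranteed to overlap a write acknowledged by W replicas whenever R + W > N. The sketch below is plain Python arithmetic, not driver code. The function names are made up for illustration; the level names mirror Cassandra’s ONE, QUORUM and ALL.

```python
def replicas_required(level: str, replication_factor: int) -> int:
    """How many replicas must respond for a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        # A quorum is a strict majority of the replicas.
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {level}")


def read_overlaps_write(write_level: str, read_level: str,
                        replication_factor: int) -> bool:
    """True when R + W > N, so every read touches at least one replica
    that acknowledged the most recent write."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return r + w > replication_factor


if __name__ == "__main__":
    rf = 3  # hypothetical replication factor
    for write_level, read_level in [("ONE", "ONE"),
                                    ("QUORUM", "QUORUM"),
                                    ("ONE", "ALL")]:
        overlap = read_overlaps_write(write_level, read_level, rf)
        print(f"write={write_level} read={read_level} overlap={overlap}")
```

Writing and reading at ONE is the fast, eventually consistent end of the dial; QUORUM on both sides trades some latency for reads that always see the latest acknowledged write.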


Reads are slow

Cassandra was optimized from the beginning for fast writes. Reads were not as much of a concern, but that quickly changed as more use cases were considered. Over time, read performance has steadily improved and is almost at parity with write speeds today. The other metric that has improved steadily is latency. Many improvements have been added over the years, with even more to come. What users are seeing now are single-digit millisecond reads at the 95th percentile, and in some cases much lower. With today’s applications, slow is as good as down. As we approach Cassandra 3.0, you can expect some even more dramatic changes. We have seen Cassandra increasingly take over traditional OLTP workloads, and the parity between reads and writes has been a large driver.
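If “95th percentile” is unfamiliar: it is the latency that 95% of reads come in at or under, which says more about the user experience than an average does. Here is a minimal sketch using the nearest-rank method. The latency samples are invented for illustration and are not measurements from any cluster.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical read latencies in milliseconds, with one slow outlier.
read_latencies_ms = [1.1, 1.3, 0.9, 2.4, 1.8, 3.2, 1.0, 2.1, 1.5, 4.6,
                     1.2, 2.9, 0.8, 5.3, 1.7, 2.2, 6.1, 1.4, 8.7, 42.0]

if __name__ == "__main__":
    print("p50:", percentile(read_latencies_ms, 50), "ms")
    print("p95:", percentile(read_latencies_ms, 95), "ms")
```

In this made-up sample the single 42 ms read would drag the average up, while the 95th percentile stays in single digits.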


You have to be a master at tuning the JVM

Cassandra is written in Java. If you have run Java servers in your career, you know that the Java Virtual Machine (JVM) can be your worst enemy at times, and the face of that enemy is garbage collection. Early versions of Cassandra required some work tuning the JVM for certain workloads. Get it wrong, and you might be faced with punishment in the form of increased latency or just plain failure. The developers of Cassandra realized that memory problems in the JVM are best solved by simply not using JVM-managed memory. Memory managed inside the JVM is called the heap; when Java uses memory outside of it, that’s called off-heap (naturally). The move to get more data off-heap in Cassandra has been a steady one. Since Cassandra 1.2, the need for specialized tuning has almost been eliminated. The outlook for future versions of Cassandra is even more exciting, with some of the largest memory segments being moved off the heap, leaving only a small part of the heap required. The net effect will be predictable latencies over a wide variety of loads.


It’s hard to use as a developer

In the early days, the Cassandra data model was fairly simplistic: use Thrift RPC to send a mutation of a row, a column and a value. From a developer standpoint, it took a while to get over the learning curve. In the last two years, that data model has undergone a big change. We now have the Cassandra Query Language (CQL), which gives you a familiar SQL-like interface to create schema and access data. With full driver support for many languages, the runway to developer productivity is much shorter. Recently, I’ve taught some four-hour tutorials, and in that time frame, developers were designing data models and writing code. You should feel confident as a developer using a database, and we want you to consider Cassandra because you understand it and feel productive. As the drive to make Cassandra more usable continues, get ready to see great new features being added. Things like nested collections should turn this into a powerful tool for developers.
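To give a feel for how SQL-like CQL is, here is a small sketch of a time series table and the code around it. Everything named here is hypothetical: the sensor_readings table, its columns and the two helper functions are mine, not part of any library. The helpers accept any session object with an execute(query, parameters) method, which is the shape the DataStax Python driver’s Session offers; treat that wiring as an assumption and check your driver’s documentation.

```python
from datetime import datetime, timezone

# Hypothetical schema: one partition per sensor, newest readings first.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS sensor_readings (
    sensor_id text,
    reading_time timestamp,
    temperature double,
    PRIMARY KEY (sensor_id, reading_time)
) WITH CLUSTERING ORDER BY (reading_time DESC)
"""

INSERT_READING = (
    "INSERT INTO sensor_readings (sensor_id, reading_time, temperature) "
    "VALUES (%s, %s, %s)"
)

LATEST_READINGS = (
    "SELECT reading_time, temperature FROM sensor_readings "
    "WHERE sensor_id = %s LIMIT %s"
)


def record_reading(session, sensor_id, temperature, when=None):
    """Write one reading; defaults to the current UTC time."""
    when = when or datetime.now(timezone.utc)
    session.execute(INSERT_READING, (sensor_id, when, temperature))


def latest_readings(session, sensor_id, limit=10):
    """Fetch the most recent readings for one sensor."""
    return list(session.execute(LATEST_READINGS, (sensor_id, limit)))
```

The query reads only from a single sensor’s partition, already sorted by time, which is a large part of why time series fit Cassandra so well.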

As you can see, there is a lot to catch up on if you haven’t seen Cassandra in a while. If you are new, look how much has been accomplished in just five years! All of these changes are here because we, as a community, have contributed our time, energy, use cases and maybe even code to making Cassandra better for everyone. Real production use cases have driven those changes. A quick look at the committer list shows companies like Twitter, Apple and Spotify, which validates some of today’s mission-critical uses. I like this saying: “Companies use Oracle to count their money. They use Cassandra to MAKE money.”

So, what’s your use case? Are you ready to take a look? Head over to Planet Cassandra and get started today.