March 28th, 2016

Jon Haddad, Technical Evangelist, DataStax
Jon Haddad has 15 years of experience in both development and operations. For 10 years he worked at various startups in Southern California. For 2 years he was the maintainer of cqlengine, the Python object mapper for Cassandra, which is now integrated into the DataStax Python driver for Cassandra. He's now a Technical Evangelist at DataStax, where he continues to focus on advancing Cassandra in the Python, operations and data science communities. Jon holds a degree in Computer Science from the University of Vermont.


At the beginning of the month, Cassandra 3.4 was released. Under the tick-tock release schedule, even-numbered releases such as 3.4 are feature releases, while odd-numbered releases contain only bug fixes. Let's take a look at what's new.

On the operations side, it has been possible for a while to tell Cassandra to compact specific SSTables through JMX. Cassandra 3.4 brings this to nodetool (CASSANDRA-10660): you can now pass the --user-defined flag to nodetool compact, along with a list of SSTables that should be compacted together. This is very helpful for purging tombstones from SSTables that you know should be compacted, when you want that compaction to happen immediately.
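If you script this kind of maintenance, it can help to build the command in one place and default to a dry run. The following Python sketch only assembles and optionally executes the nodetool compact --user-defined invocation described above. The helper names and the SSTable paths in the usage section are hypothetical; check nodetool help compact on your version for the exact argument format before running it for real.

```python
import shlex
import subprocess
from typing import List, Sequence


def user_defined_compaction_cmd(sstables: Sequence[str], nodetool: str = "nodetool") -> List[str]:
    """Build the argv for compacting the given SSTables together (hypothetical helper)."""
    if not sstables:
        raise ValueError("at least one SSTable must be given")
    return [nodetool, "compact", "--user-defined", *sstables]


def run_compaction(cmd: List[str], dry_run: bool = True) -> None:
    """Print the command by default; only execute it when dry_run is False."""
    if dry_run:
        print(shlex.join(cmd))
        return
    # check=True raises CalledProcessError if nodetool exits non-zero
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical data-file paths; substitute the SSTables you want merged.
    cmd = user_defined_compaction_cmd([
        "/var/lib/cassandra/data/ks/events/ma-1-big-Data.db",
        "/var/lib/cassandra/data/ks/events/ma-2-big-Data.db",
    ])
    run_compaction(cmd)  # dry run: prints the command only
```

Keeping the dry run as the default means a typo in a path shows up on screen before anything touches the node.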

For improved security, Cassandra 3.4 now supports encrypted hints. In Cassandra 3.0, the hints implementation was changed to use append-only, commit-log-style files rather than SSTables. Read up on the hint changes in the DataStax Tech Blog. Because hints now live in these files on disk, they need protecting too, so encrypting them is now an option.

CASSANDRA-10392 adds support for custom tracing implementations in Cassandra itself. This is very useful when you integrate Cassandra into the rest of your distributed environment. For instance, the Zipkin project, modeled after Google's Dapper paper, lets applications send trace timing data to a central location to be analyzed later. This makes it easier to debug performance problems in heavily distributed systems. The Last Pickle has provided a Zipkin tracing plugin for Cassandra.
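The core idea behind Dapper-style tracing can be shown without any real tracing library. In the toy Python model below, which does not use Zipkin's API or Cassandra's tracing hooks, every unit of work is a span that carries a shared trace ID and its parent's span ID, and each service reports its spans to a central collector. The collector can then answer questions such as which innermost operation dominated a request. Span, ToyCollector and the sample request are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Span:
    """One timed unit of work; every span in a request shares a trace_id."""
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None for the root span
    service: str
    name: str
    start_ms: float
    end_ms: float

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms


class ToyCollector:
    """Stand-in for a central trace store: services report spans, analysis happens later."""

    def __init__(self) -> None:
        self.traces: Dict[str, List[Span]] = defaultdict(list)

    def report(self, span: Span) -> None:
        self.traces[span.trace_id].append(span)

    def leaves(self, trace_id: str) -> List[Span]:
        """Spans with no children, i.e. the innermost work in the request."""
        spans = self.traces[trace_id]
        parents = {s.parent_id for s in spans if s.parent_id is not None}
        return [s for s in spans if s.span_id not in parents]

    def slowest_leaf(self, trace_id: str) -> Span:
        return max(self.leaves(trace_id), key=lambda s: s.duration_ms)


if __name__ == "__main__":
    collector = ToyCollector()
    # Hypothetical request: a web app reads from Cassandra, then renders a response.
    collector.report(Span("t1", "a", None, "web-app", "GET /user", 0, 100))
    collector.report(Span("t1", "b", "a", "cassandra", "read users", 10, 90))
    collector.report(Span("t1", "c", "a", "web-app", "render", 90, 95))
    worst = collector.slowest_leaf("t1")
    print(worst.service, worst.name, worst.duration_ms)  # prints: cassandra read users 80
```

Because the database reports its spans under the same trace ID as the application, the slow Cassandra read shows up inside the application's request rather than in a separate log.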

Secondary indexes have been around for a while, and Cassandra 3.4 adds support for secondary indexes on static columns. It's important to note that secondary indexes in Cassandra exist for convenience, not performance: it's faster to denormalize and maintain multiple views of your data than to run a query that has to hit all nodes.
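To see why an index query costs more than reading a denormalized table, consider this deliberately simplified Python model. It assumes a hypothetical six-node cluster, places each partition on exactly one node by hashing its key (no replication, vnodes or real token ring), and keeps each node's secondary index local to that node's data. Under those assumptions, a lookup by partition key touches one node, while a lookup through the index must ask every node.

```python
import zlib
from typing import Dict, List, Set, Tuple

NUM_NODES = 6  # hypothetical cluster size


def owner(partition_key: str) -> int:
    """Toy placement: hash the key onto exactly one node (no replication, no vnodes)."""
    return zlib.crc32(partition_key.encode()) % NUM_NODES


class ToyCluster:
    def __init__(self) -> None:
        # users table, partitioned by user_id (user_id -> email)
        self.users: List[Dict[str, str]] = [{} for _ in range(NUM_NODES)]
        # node-local secondary index on email: email -> user_ids stored on *that* node
        self.email_index: List[Dict[str, Set[str]]] = [{} for _ in range(NUM_NODES)]
        # denormalized users_by_email table, partitioned by email (email -> user_id)
        self.users_by_email: List[Dict[str, str]] = [{} for _ in range(NUM_NODES)]

    def insert(self, user_id: str, email: str) -> None:
        n = owner(user_id)
        self.users[n][user_id] = email
        self.email_index[n].setdefault(email, set()).add(user_id)
        # denormalizing means writing the same fact a second time, keyed differently
        self.users_by_email[owner(email)][email] = user_id

    def find_via_index(self, email: str) -> Tuple[List[str], int]:
        """Each index only knows its own node's rows, so every node is asked."""
        hits: List[str] = []
        for node_index in self.email_index:
            hits.extend(sorted(node_index.get(email, set())))
        return hits, NUM_NODES  # (matching user_ids, nodes contacted)

    def find_via_denormalized(self, email: str) -> Tuple[List[str], int]:
        """The email is the partition key here, so one node owns the answer."""
        user_id = self.users_by_email[owner(email)].get(email)
        return ([user_id] if user_id else []), 1


if __name__ == "__main__":
    cluster = ToyCluster()
    for uid, email in [("u1", "a@example.com"), ("u2", "b@example.com"), ("u3", "c@example.com")]:
        cluster.insert(uid, email)
    print(cluster.find_via_index("b@example.com"))         # (['u2'], 6)
    print(cluster.find_via_denormalized("b@example.com"))  # (['u2'], 1)
```

The trade-off is visible in insert: the denormalized table costs an extra write, which Cassandra handles cheaply, in exchange for reads that never fan out across the cluster.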

Last but not least, 3.4 brings a brand-new secondary index implementation to the table. SSTable Attached Secondary Indexes, or SASI, is a completely new approach to secondary indexes in Cassandra. It promises to be more performant than the existing secondary index implementation and supports more features. SASI supports inequality searches and adds support for a LIKE clause, giving us the ability to index text fields and search within them. I wrote a more in-depth overview of the SASI implementation on my blog.
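SASI's real on-disk structures are beyond the scope of this post, but the reason a sorted term index can answer LIKE and range queries is easy to sketch. The Python toy below keeps index terms sorted in memory; in its contains-style mode it also stores every suffix of each value, so a '%abc%' search becomes a prefix search over suffixes. The class and method names are hypothetical and this is not SASI's API. In CQL you would instead create the index with CREATE CUSTOM INDEX using 'org.apache.cassandra.index.sasi.SASIIndex' and pick a mode such as PREFIX or CONTAINS.

```python
import bisect
from typing import Any, List, Set, Tuple


class ToyTextIndex:
    """Sorted (term, row_key) pairs; contains=True also indexes every suffix of a value.

    Row keys must be mutually comparable (e.g. all ints), since equal terms sort by key.
    """

    def __init__(self, contains: bool = False) -> None:
        self.contains = contains
        self._entries: List[Tuple[str, Any]] = []

    def add(self, row_key: Any, value: str) -> None:
        value = value.lower()
        terms = [value[i:] for i in range(len(value))] if self.contains else [value]
        for term in terms:
            bisect.insort(self._entries, (term, row_key))

    def _prefix_scan(self, prefix: str) -> Set[Any]:
        # Binary-search to the first term >= prefix, then walk while terms still match.
        i = bisect.bisect_left(self._entries, (prefix,))
        hits = set()
        while i < len(self._entries) and self._entries[i][0].startswith(prefix):
            hits.add(self._entries[i][1])
            i += 1
        return hits

    def like(self, pattern: str) -> Set[Any]:
        """'abc%' on a prefix-style index, '%abc%' on a contains-style index."""
        pattern = pattern.lower()
        if len(pattern) > 2 and pattern[0] == "%" and pattern[-1] == "%":
            if not self.contains:
                raise ValueError("'%abc%' needs contains=True in this toy")
            return self._prefix_scan(pattern[1:-1])
        if len(pattern) > 1 and pattern[-1] == "%" and pattern[0] != "%":
            if self.contains:
                raise ValueError("this toy answers only '%abc%' when contains=True")
            return self._prefix_scan(pattern[:-1])
        raise ValueError(f"unsupported pattern in this toy: {pattern!r}")


class ToyRangeIndex:
    """Sorted numeric values make inequality queries a binary search plus a slice."""

    def __init__(self) -> None:
        self._values: List[float] = []
        self._keys: List[Any] = []

    def add(self, row_key: Any, value: float) -> None:
        i = bisect.bisect_right(self._values, value)
        self._values.insert(i, value)
        self._keys.insert(i, row_key)

    def greater_than(self, bound: float) -> Set[Any]:
        return set(self._keys[bisect.bisect_right(self._values, bound):])


if __name__ == "__main__":
    names = ToyTextIndex(contains=True)
    for key, name in [(1, "Jonathan"), (2, "Jon"), (3, "Haddad")]:
        names.add(key, name)
    print(sorted(names.like("%jon%")))  # [1, 2]
    ages = ToyRangeIndex()
    for key, age in [(1, 30), (2, 40), (3, 25)]:
        ages.add(key, age)
    print(sorted(ages.greater_than(28)))  # [1, 2]
```

Indexing every suffix multiplies the number of stored terms, which is why a contains-style index is larger than a prefix-style one; the toy makes that cost easy to see.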

As you can see, the Cassandra team has been cranking out big improvements. It's very cool to see new features released regularly and iterated on, rather than held back for half a year and then released all at once. Smaller changes released more frequently are a lot easier to reason about when bugs come up and need to be fixed. Be sure to check the changelog for the complete list of features and bug fixes.