November 29th, 2012

Coming in Cassandra 1.2: binary CQL protocol

In Cassandra 0.8, the first version of the Cassandra Query Language has been introduced as an alternative to the Thrift API (so-called because based on Apache Thrift). In Cassandra 1.2, the final version for the third revision of CQL will be released with a number of features that we believe will provide a simpler API for Cassandra.

So far, CQL has still been using Thrift as a network transport. This was done initially out of convenience, because we wanted to focus on the language first, and Thrift was there, provided us a transport for free and is relatively fast. But CQL is in no way tied to Thrift for the transport, and while Thrift has advantages, it also comes with a few limitations:

  • it is strictly an RPC mechanism. You cannot have the server push informations to the client for instance.
  • it is a synchronous transport. You can only have one active request per connection at a given time.
  • while Thrift comes with support for a fair amount of languages, not every language are supported and our experience has shown that not all languages support were equal in term of stability and performance. It does make Cassandra client-side languages support directly tied to Thrift, which is not always a good thing (adding support for a new language in Thrift is for instance much more involved than simply writing support for the new protocol described in this post).

Also, Thrift is a generic framework, and we believe that a transport specifically tailored for Cassandra might bring additional control and maybe performance.

This has led to the new binary protocol that will be introduced by Cassandra 1.2. This protocol is a custom one and has been designed specifically for Cassandra and more precisely for CQL3 (that is, it only support CQL3). Amongst others, it offers the following features:

  • Asynchronous: each connection can handle more than one active request at the same time. In practice, this means that client libraries will only need to maintain a relatively low amount of open connections to a given Cassandra node to achieve good performance. This particularly matters with Cassandra where a client usually wants to keep connection to all (or at least a good part of) the nodes of the Cluster and so having a low number of per-node connections helps scaling to large clusters.
    Technically, this is achieved by giving each messages a stream ID, and by having responses to a request preserve the request’s stream ID. Clients can thus send multiple requests with different stream IDs on the same connection (i.e. without waiting for the response to a request to send the next one) while still being able to associate each received response to the right request, even if said responses comes in a different order than the one in which requests were submitted. That asynchronicity is of course optional in the sense that a client library can still choose to use the protocol in a synchronous way if that is simpler.
  • Server notifications: the protocol allows clients to register for certain types of events notifications. The currently supported events are cluster topology changes (a node join the cluster, is removed, or move), status changes (a node is detected up/down) and schema changes (the schema has been modified). When one of those events occurs, the server will push a notification to the registered clients. This allows those clients to maintain a state of the Cassandra cluster up to date without having to poll the cluster regularly. Obviously, more type of notifications might be added in the future, opening up a number of interesting possibilities.
  • Optional compression: messages of the protocol can be optionally compressed.

Interested parties can find the full specification for this protocol for Cassandra 1.2 here.

So what if you want to give that new protocol a try? First, you need a version of Cassandra 1.2 (at the time of this writing, the most recent release would be Cassandra 1.2.0-beta2, but release candidates should be out in the coming weeks and the final version should be released before the end of the year). Then, you need to activate the binary protocol server. Keep in mind that this protocol and its implementation are brand new. For that reason, the binary protocol server is not started by default (only the thrift server is). You can change that by setting the start_native_transport option to true in cassandra.yaml (you can also turn start_rpc to false if you’re not going to use the thrift interface). Lastly, you need a client driver that support this new protocol. One such driver (that is still beta itself) is the Java Driver that DataStax open-sourced a few days ago. But Cassandra 1.2 haven’t been released yet and more drivers will come shortly.

We believe this new protocol is a good addition to Cassandra and it already offers a number of advantages (asynchronicity, notifications, …) but this is definitively not the end of the road. In the short/medium term, we plan at least to:

  • Benchmark that new protocol: so far, we’ve focused on having a complete and usable protocol. In doing so, we have been careful to design a compact protocol and the server-side implementation of this protocol is based on Netty, which is known to be performant, but we have yet to properly benchmark it.
  • SSL Support
  • Cursor API: currently, as with Thrift, users must be careful to not do requests that return too much data, because in that case everything is buffered server-side and returned to the client in one message. This puts the burden of paging big request on the client however, which is not ideal, and we plan to handle this at the protocol level by adding some form of streaming support to the protocol.