November 29th, 2012

Cassandra long ago moved beyond its original design of limiting reads to primary key lookups.

This additional power does bring with it some additional complexity. To make it easier to understand what Cassandra is doing under the hood, we’ve added request tracing to Cassandra 1.2.

Tracing an insert

As a simple example, consider this table:

Now we’ll enable tracing and insert a row. (These numbers are from a ccm cluster with all three nodes running on my MacBook Air, with a cold JVM.)

You can see that there are three distinct stages to a simple insert like this:

  1. The coordinator figures out which node(s) this row should be replicated to
  2. The replica (in yellow) appends the row to the commitlog, then adds it to the memtable
  3. The coordinator receives a confirmation from the replica and tells the client that the request was successful
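To make those three stages concrete, here is a toy Python model of the write path. It is only a sketch, not Cassandra's code: the `Replica` and `Coordinator` classes and the hash-modulo placement rule are assumptions made up for illustration, and it writes to a single replica with no replication factor or consistency level.

```python
# Toy model of the three insert stages; not Cassandra's actual code.
import hashlib


class Replica:
    def __init__(self, name):
        self.name = name
        self.commitlog = []   # append-only log, written first
        self.memtable = {}    # in-memory table, written second

    def apply(self, key, row):
        self.commitlog.append((key, row))
        self.memtable[key] = row
        return "ack"


class Coordinator:
    def __init__(self, replicas):
        self.replicas = replicas

    def replica_for(self, key):
        # Stage 1: hypothetical placement rule (hash modulo node count).
        digest = hashlib.md5(str(key).encode()).hexdigest()
        return self.replicas[int(digest, 16) % len(self.replicas)]

    def insert(self, key, row):
        replica = self.replica_for(key)
        ack = replica.apply(key, row)       # Stage 2 happens on the replica
        return ack == "ack", replica.name   # Stage 3: report success to the client


nodes = [Replica(n) for n in ("node1", "node2", "node3")]
coordinator = Coordinator(nodes)
ok, owner = coordinator.insert(1, {"b": "example"})
```

After the call, exactly one node holds the row in both its commitlog and its memtable, and `ok` is the success flag handed back to the client.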

Tracing a sequential scan

Let’s look at a more complicated example. Here I’ve pre-inserted ten rows like the above one.

This is substantially longer, since we’re doing a sequential scan across the whole cluster:

  1. The coordinator sets up the replicas to query
  2. (blue) The first replica queried is the coordinator, which has 6 rows
  3. (yellow) has 2 rows
  4. (green) has 1 row
  5. (blue) A second scan (of a different replication range) on the coordinator finds 1 more row

(CASSANDRA-4858 is open to merge the two queries against the coordinator to a single one.)

Tracing an indexed query

Now I’d like to examine using tracing to diagnose performance problems. Consider a simple table of users:

Now I’ve inserted a user named Bob Higginbigham, and 99,999 other users named Bob Smith.

I’ve edited this trace to focus on the replica that owns Bob Higginbigham.

Note how Cassandra has to scan all 30,000+ rows (all 100,000 including the other machines) to find the Bob we’re looking for, since we only have an index on firstname. Moral: Cassandra isn’t black magic; you still need to create appropriate indexes to get the performance you want.
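The cost is easy to reproduce in miniature. The Python sketch below is a toy model rather than Cassandra's index implementation: the `users` list, the `firstname_index` dictionary and the `find` helper are all hypothetical, and the data set is scaled down to 1,000 users. It shows that an index on firstname alone narrows nothing when every row shares that firstname, so the lastname filter still has to read every candidate.

```python
# Toy model of an index lookup followed by a filter; not Cassandra's actual code.
from collections import defaultdict

users = [{"id": 0, "firstname": "Bob", "lastname": "Higginbigham"}]
users += [{"id": i, "firstname": "Bob", "lastname": "Smith"} for i in range(1, 1000)]

# The only index available maps firstname -> matching rows.
firstname_index = defaultdict(list)
for user in users:
    firstname_index[user["firstname"]].append(user)


def find(firstname, lastname):
    candidates = firstname_index[firstname]   # cheap: one index lookup
    scanned = 0
    matches = []
    for row in candidates:                    # expensive: every candidate is read
        scanned += 1
        if row["lastname"] == lastname:
            matches.append(row)
    return matches, scanned


matches, scanned = find("Bob", "Higginbigham")
```

Here `matches` holds the single Higginbigham row, while `scanned` equals the full number of Bobs: the same shape as the trace, just smaller.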

Tracing a queue antipattern

Here’s a more subtle one.

It is tempting to use Cassandra as a durable queue, e.g.

This relies on Cassandra’s clustering within a partition (where the partition is the queue id) to order queue entries by creation time. Then grabbing the most recent queue entry is just SELECT * FROM queues WHERE id = 'myqueue' ORDER BY created_at LIMIT 1.

Here’s what that looks like after creating and removing 100,000 entries:

Take a look in the middle here: “Read 1 live cells and 100000 tombstoned.”

Because Cassandra uses a log-structured storage engine, deletes do not immediately remove all traces of a row. Instead, Cassandra writes a deletion marker called a tombstone that suppresses the old data until it can be compacted away.

So what we see in this trace is Cassandra having to read past all the older, deleted entries, before it gets to one that is still alive — a sign that you need to rethink your data model.
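Here is a rough Python sketch of why that read is expensive. It is a simplified model, not the storage engine: the `TOMBSTONE` marker, the list-based `partition` and the `read_first_live` helper are assumptions for illustration, a delete is modelled as overwriting the value in place (the real engine appends a marker instead), and the queue is scaled down to 1,000 consumed entries.

```python
# Toy model of reading a queue partition past tombstones; not Cassandra's actual code.
TOMBSTONE = object()  # stand-in for a deletion marker

# One partition, kept sorted by created_at as clustering would keep it.
partition = []
for created_at in range(1000):
    partition.append([created_at, f"job-{created_at}"])

# Consume every entry. Simplification: the delete replaces the value with a
# marker; the cell itself stays in the partition until compaction.
for cell in partition:
    cell[1] = TOMBSTONE

# One fresh entry arrives after the old ones were consumed.
partition.append([1000, "job-1000"])


def read_first_live(cells):
    tombstoned = 0
    for created_at, value in cells:
        if value is TOMBSTONE:
            tombstoned += 1   # still has to be read and skipped
            continue
        return value, tombstoned
    return None, tombstoned


value, tombstoned = read_first_live(partition)
```

The read returns one live entry, but only after stepping over every tombstone in front of it, which is exactly the "1 live, many tombstoned" pattern in the trace.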


Tracing is part of Cassandra 1.2. Beta 2 is ready for testing; we expect the final release before the end of the year.