July 24th, 2013


Since the beginning of 2013, we’ve done a lot of hiring for the Test Engineering organization here at DataStax. During the on-boarding process, I’ve found myself giving the following primer to our new hires, so I thought I would share it with you. For many this will be extremely basic, but for newcomers my hope is that you will be able to see tangible behavior behind the concepts explained in the C* documentation. I find the best way to answer my C* questions is to run small, isolated tests and see how the system reacts.

This was tested on my MacBook Pro using the DataStax Community 1.2.6 tarball distribution.


  • You can follow along as you read this and replicate the same behavior.
  • Tail the /var/log/cassandra/system.log and pay attention to what happens when you execute each step of this tutorial.
  • Use this as a guide for how to get the most out of C* documentation.

Step 1: Create some data

Reference: CQL3 music playlist example

Step 2: Look at the data directories for test1.playlists

Reference: Cassandra Writes

Step 3: Flush the data and look at data again

Reference: nodetool documentation

Step 4: Use sstable2json to look at the sstable generated

Reference: How Cassandra Stores Data
Reference: sstable2json documentation

Step 5: Delete the artist column from the row you inserted

Reference: Cassandra Deletes

Step 6: Run flush to write sstable

Step 7: Use sstable2json to look at the sstables generated

Note: The version -1 file was not touched since sstables are immutable.

Note: -2 files were created to reflect the deleted column.
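To make the two notes above concrete, here is a toy Python model, not Cassandra code. It assumes a memtable can be pictured as a dictionary and an sstable as a frozen copy of it; the names `flush` and `TOMBSTONE` and the sample values are hypothetical and exist only for this sketch.

```python
# Toy model (not Cassandra code): each flush writes a new, read-only "sstable".
from types import MappingProxyType

TOMBSTONE = object()  # hypothetical marker standing in for a deleted column


def flush(memtable, sstables):
    """Freeze the memtable into a read-only sstable and return the new file list."""
    frozen = MappingProxyType(dict(memtable))
    memtable.clear()
    return sstables + [frozen]


sstables = []
memtable = {}

# Insert a row, then flush: this plays the role of the -1 file.
memtable[("row1", "artist")] = (1, "Some Artist")  # (timestamp, value)
memtable[("row1", "title")] = (1, "Some Title")
sstables = flush(memtable, sstables)

# Delete the artist column, then flush: this plays the role of the -2 file.
memtable[("row1", "artist")] = (2, TOMBSTONE)
sstables = flush(memtable, sstables)

# The first file still holds the original value; only the second records the delete.
print(sstables[0][("row1", "artist")][1])               # Some Artist
print(sstables[1][("row1", "artist")][1] is TOMBSTONE)  # True
```

The delete never rewrites the first file. It is just another write that lands in a later file and shadows the older value at read time.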

Step 8: Compact the data and see what happens

Note: Files -1 and -2 were merged into a -3 file.
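The merge can be pictured with a small Python sketch. This is a simplified model rather than Cassandra's compaction code: it assumes each cell carries a timestamp and that the newest cell wins. The function `compact`, the marker `TOMBSTONE`, and the sample data are hypothetical.

```python
# Toy model (not Cassandra code): compaction merges sstables, newest timestamp wins.
TOMBSTONE = "<tombstone>"  # hypothetical marker for a deleted column


def compact(sstables):
    """Merge several sstables into one, keeping the newest cell per (row, column)."""
    merged = {}
    for table in sstables:
        for key, (timestamp, value) in table.items():
            if key not in merged or timestamp > merged[key][0]:
                merged[key] = (timestamp, value)
    return merged


file_1 = {("row1", "artist"): (1, "Some Artist"), ("row1", "title"): (1, "Some Title")}
file_2 = {("row1", "artist"): (2, TOMBSTONE)}

file_3 = compact([file_1, file_2])
print(file_3[("row1", "artist")])  # (2, '<tombstone>')
print(file_3[("row1", "title")])   # (1, 'Some Title')
```

In this sketch the tombstone survives the merge. Dropping it is a separate decision that depends on gc_grace_seconds.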

Step 9: Delete the row, flush the data and look at data again

Note: Now there are file versions -3 and -4.

Note: The contents of the new -4 file reflect the deleted row.

Step 10: Compact the data and see what happens

Note: Now we have file version -5 after we compact the sstables.

Note: The contents of the new -5 file look like the -4 file: the tombstone survives compaction because gc_grace_seconds has not yet elapsed.

Step 11: Change gc_grace_seconds so that we may remove the tombstone

Reference: Deletes

The purpose of this section is to highlight gc_grace_seconds. Newer versions of Cassandra filter out range ghosts (row keys with no columns), so you won’t see tombstone records after a delete.
One of my favorite blog posts is related to tombstones and data modeling: Cassandra anti-patterns: Queues and queue-like datasets

Note: After compaction all files are gone because we removed the tombstones.

WARNING: Never set gc_grace_seconds this low, or previously deleted data may reappear via repair if a node was down when the tombstones were removed.
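The role of gc_grace_seconds can be sketched in a few lines of Python. This is a simplified model, not Cassandra's implementation: it assumes a tombstone becomes eligible for removal at compaction once its deletion time is older than "now minus gc_grace_seconds". The functions `purgeable` and `compact` and the timestamps are hypothetical; 864000 seconds (10 days) is the default gc_grace_seconds.

```python
# Toy model (not Cassandra code): tombstones are only dropped after the grace period.


def purgeable(deleted_at, gc_grace_seconds, now):
    """True once the tombstone is older than the grace period."""
    return deleted_at < now - gc_grace_seconds


def compact(cells, gc_grace_seconds, now):
    """Keep live cells, and keep tombstones whose grace period has not elapsed."""
    kept = {}
    for key, cell in cells.items():
        if cell["deleted"] and purgeable(cell["timestamp"], gc_grace_seconds, now):
            continue  # tombstone is old enough to be dropped
        kept[key] = cell
    return kept


row = {"row1": {"deleted": True, "timestamp": 1000}}  # deleted at t=1000 (seconds)

print(compact(row, gc_grace_seconds=864000, now=1060))  # tombstone is kept
print(compact(row, gc_grace_seconds=0, now=1060))       # {}
```

With the default grace period the tombstone is still written to the compacted file a minute after the delete. With a grace period of zero it disappears, and the output file has nothing left to hold.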


Step 12: Create an index, flush the data and look at data again

Generate another row of data in cqlsh:

Note: There are now *_idx-* files in the data directory.

Look at the secondary index -Data file:

Look at the secondary index -Index file:

Look at the data file:

Step 13: Drop the Keyspace – notice we generate a snapshot and leave the directory in place

Note: If you drop a keyspace and then recreate the same keyspace and column family, you may notice your data come back. Truncate first if you really want to be squeaky clean.

Check data directory:

It’s OK to remove the directory and snapshot. Restart the server and see for yourself :)

Step 14: Change memtable_total_space_in_mb to force flushing of memtables

We will use cassandra-stress for this example because it makes it very easy to write columns large enough to trigger an automatic flush.
Reference: Cassandra Operations

Change cassandra.yaml:

WARNING: Never set this value so low; it is only meant for illustration purposes.
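The idea behind a memtable space limit can be modeled with a short Python sketch. It is a deliberate simplification and not how Cassandra tracks memory: it assumes a single memtable that flushes itself as soon as the buffered bytes exceed a limit. The class `ToyMemtable` and the 2 MB limit are hypothetical and chosen only for this sketch.

```python
# Toy model (not Cassandra code): a memtable that flushes when it grows past a limit.


class ToyMemtable:
    """Buffers writes and flushes once the buffered bytes exceed limit_bytes."""

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.cells = {}
        self.size = 0
        self.flushed = []  # each entry stands in for one sstable on disk

    def write(self, key, value):
        self.cells[key] = value
        self.size += len(value)  # simplified accounting: assumes keys are unique
        if self.size > self.limit_bytes:
            self.flush()

    def flush(self):
        self.flushed.append(self.cells)
        self.cells = {}
        self.size = 0


one_mb = 1024 * 1024
memtable = ToyMemtable(limit_bytes=2 * one_mb)
for i in range(5):
    memtable.write(f"key{i}", b"x" * one_mb)  # five 1 MB columns

print(len(memtable.flushed))  # 1
print(len(memtable.cells))    # 2
```

The smaller the limit relative to the column size, the more often writes spill to disk, which is why a tiny limit combined with 1 MB columns produces sstables without an explicit flush.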

Run stress with 1MB column size:

Note: This file is huge, so run sstablekeys to get a list of keys in the file instead of sstable2json.


I hope this was helpful for those new to Cassandra and provided a small tour of key concepts. This was illustrated using a single row of data, but imagine how dynamic the system becomes under heavy writes that generate many sstables and trigger lots of compaction activity.