September 25th, 2013

This blog posting was originally created by Christopher Batey. Check out more of Christopher’s blog postings here.
Presenter: Johnny Miller
Who is he? Datastax Solutions Architect
Where? Skills matter exchange in London
I went to an “introductory” talk even though I have a lot of experience with Cassandra for a few reasons:
  • Meet other people in London that are using Cassandra
  • To discover what I don’t know about Cassandra
Here are my notes that in roughly the same order as the talk.

What’s Cassandra? The headlines

  • Been around for ~5 years – originally developed by Facebook for their inbox search
  • Distributed key store – column orientated data model
  • Tuneable consistency – per request decide how consistent you want the response to be
  • Datacenter aware with asynchronous replication
  • Designed for use as a cluster – not much value in a single node Cassandra deployment

Gossip – how nodes in a cluster learn about other nodes

  • P2P protocol for how nodes discover location and state of other nodes
  • New nodes are given seed nodes for bootstrapping – but these aren’t single points of failure as they aren’t used again

Data distribution and replication

  • Replication factor: How many nodes each piece of data is stored on
  • Each node is given a range of primary keys to look after

Partitioners – How to decide which node gets what data

  • Row keys are hashed to decide node then a replication strategy defines how to pick the other replicas

Replicas – how to select where else the data lives

  • All replicas are equally important. No difference between the node the key hashed to and the other replicas that were selected
  • Two ways to pick the other replicas:
    • Simple: Only single DC. Specify just a replication factor. Hashes the key and then walks the cluster and picks the replicas. Not very clever – all replicas could end up on the same rack
    • Network: Configure with a RF per DC. Walk the ring for each DC until it reaches a node in another rack

Snitch – how to define a data centre and a rack

  • Informs Cassandra about node topology, designates DC and Rack for every node
  • Example: Rack inferring snitch designates DC and Rack based on the IP of the node 
  • Example: Property file snitch where every node has the DC and Rack of every other node
  • Example: GossipingPropertyFileSnitch: Every node knows its own DC and Rack and tells other nodes via Gossip
  • Dynamic snitching: monitors performance of reads, this snitch wraps the other snitches to respond to network latency

Client requests

  • Connect to any client in the node – becomes the coordinator. This node knows which nodes to talk to for the request
  • Multi DC – picks a coordinator in the other data centre  to replicate data there or to get data for a read


  • Quorum = (Replication Factor/2) + 1 i.e. more than half
  • E.g R = 3, Q = 2, tolerate 1 replica going down to continue reading and writing at Quorum
  • Per request consistency – can decide certain writes are more important and require higher consistency than others
  • Example consistency levels: ANY, ONE, TWO, THREE, QUORUM, EACH_QUORUM, LOCAL_QUORUM
  • SERIAL: New in cassandra 2.0

Write requests – what happens?

  • The coordinator (node the client connects to) forwards the write to all the replicas in the local DC and designates a coordinator in the other DCs to do the same there
  • The coordinator may be a replica but does not need to be
  • For a single node writes first go to commit log (disk), then writes to meltable (memory)
  • When does the write succeed? Depends on consistency e.g a write consistency of ONE means that the data needs to be in the commit log and memtable of at least one replica

Hinted handoff – how Cassandra deals with nodes being down on write

  • Coordinator node keeps hints if one of the replicas down
  • When the node comes back up the hints are then sent to the node so it can catch up
  • Hints are kept for a finite amount of time – default is three hours

Read requests  – what happens?

  • Coordinator contacts a number of nodes depending on the consistency – once enough have responded the read can be successful 
  • Will send requests to node responding the fastest
  • If not consistent – compare timestamps + do a read repair
  • Possible other background read repair

What was missing?

Overall it was a great talk however here is some possible improvements:
  • A glossary/overview at the start? Perhaps a mapping from relational terminology to Cassandra terminology. For example the term keyspace was used a number of times before describing what it is
  • Overview of consistency when talking about eventual consistency – however this did come later? A few scenarios for when read/writes at different consistency levels would fail/succeed would have been very helpful 
  • Compaction required for an intro talk? I thought talking about compaction was a bit too much for an introductory talk as you need to understand memtables and sstables before it makes sense
  • The downsides of Cassandra: for example some forms of schema migration/change is a nightmare when you are using CQL3 + have data you need to migrate