February 28th, 2013

So I recently ran into the white paper:

Using Paxos to Build a Scalable, Consistent,
      and Highly Available Datastore


Generally speaking when I hear about a scalable, highly available, data store I prepare myself to see some outlandish claims. None of that happened here, but there was one thing that caught my eye.  

Something is rotten in Denmark. First, without any math or benchmarking these lines should not be 20MS apart.  Your issuing a single write to a random Cassandra node (a coordinator) which launches N parallel writes to the nodes responsible for that data. Writes in Cassandra are memory bound and network bound. You could get some additional latency on QUORUM possibly 1-5ms, but 20 is highly suspect.

But other then a discrepency between 20ms and 2ms there is another problem, the problem that caught my eye in the first place, the lines should look like this:

The reason for this is that regardless or writing at quorum or one or all, in the end the write will happen on N nodes. At saturation the write stage would back up and the WEAK write performance would converge with the QUORUM write performance. If the two lines are not converging then the benchmark must not be fully loading Cassandra. (then what is it benchmarking) 

NoSQL should really have an an established NoSortium like TPH-C, that can run benchmarks on common hardware and verify results.