September 13th, 2013

This post was written by Nate McCall of The Last Pickle.

A lot of folks have been having issues lately with the performance of insert-heavy workloads via CQL. Though batch statements are available in the new 2.0 release, we’ll describe here a method to make interoperability between Thrift and CQL3 schema more accessible.

There are a few resources floating around the internet already on how to do this in a general case (see the resources section below). However, this particular case is based on a common problem of wide row insertions for time series data. Specifically, when you define an index column along with the primary key definition, things get slightly more complicated.

The rest of this article assumes you already have some knowledge of Astyanax and CQL3.

Given the following table definition:

We set up the serializers for the row key and the time series column, respectively:

We use this mutation code:

And the annotated classes representing the row key and the column:

The static indexRow method above is the critical part, as it corresponds to the way we structured our index clause back in the table definition: PRIMARY KEY ((id, start), offset).
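As a rough illustration of the shape such classes might take, here is a minimal plain-Java sketch. Only id, start, offset, and indexRow come from the text; the class names (TimeSeriesModel, SeriesKey, SeriesColumn), the name field, the column types, and the choice of an empty column name for the marker are assumptions, not the original listing. With Astyanax, each field would also carry an @Component(ordinal = n) annotation so that AnnotatedCompositeSerializer can pick it up; the annotations are left out here to keep the sketch free of dependencies.

```java
public class TimeSeriesModel {

    /** Row key: the (id, start) pair from PRIMARY KEY ((id, start), offset). */
    static final class SeriesKey {
        final String id;   // assumed type
        final long start;  // assumed type

        SeriesKey(String id, long start) {
            this.id = id;
            this.start = start;
        }
    }

    /** Column name as Thrift would see it: clustering value plus CQL3 column name. */
    static final class SeriesColumn {
        final int offset;   // assumed type
        final String name;  // hypothetical field holding the CQL3 column name

        SeriesColumn(int offset, String name) {
            this.offset = offset;
            this.name = name;
        }

        /** Marker column for a CQL3 row: same offset, empty column name (assumption). */
        static SeriesColumn indexRow(int offset) {
            return new SeriesColumn(offset, "");
        }

        @Override
        public String toString() {
            return offset + ":" + name;
        }
    }

    public static void main(String[] args) {
        System.out.println(SeriesColumn.indexRow(42));      // marker column
        System.out.println(new SeriesColumn(42, "value"));  // hypothetical data column
    }
}
```

The design point is that every logical CQL3 row written through Thrift needs two kinds of columns: one per data column, plus the marker produced by a factory such as indexRow.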

With this insertion, Thrift will see the following composites:

The first line is our index marker column from the indexRow method mentioned above. Brian’s post below has some more details on what’s going on in the general case.
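To make the composite layout more concrete, the following dependency-free sketch encodes two column names the way Cassandra's CompositeType lays them out on the wire: for each component, a two-byte big-endian length, the raw bytes, and a one-byte end-of-component marker. The class name CompositeSketch, the int type for offset, the sample offset of 42, and the data column name "value" are all assumptions made for illustration; they are not taken from the original table definition.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class CompositeSketch {

    /** Encodes components as: 2-byte length, bytes, 1-byte end-of-component (0). */
    static byte[] composite(byte[]... components) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] c : components) {
            out.write((c.length >> 8) & 0xFF); // length, high byte
            out.write(c.length & 0xFF);        // length, low byte
            out.write(c, 0, c.length);         // component value
            out.write(0);                      // end-of-component marker
        }
        return out.toByteArray();
    }

    /** Big-endian 4-byte encoding of an int (offset is assumed to be an int). */
    static byte[] intBytes(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int offset = 42; // sample clustering value
        // Marker column: (offset, "") with an empty second component.
        System.out.println(hex(composite(intBytes(offset), utf8(""))));
        // Hypothetical data column: (offset, "value").
        System.out.println(hex(composite(intBytes(offset), utf8("value"))));
    }
}
```

Both names share the same leading offset component, so they sort next to each other in the wide row, with the empty-named marker first.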


A pair of posts from Brian O’Neil:

The Astyanax wiki:

And this post from a recent mailing list thread about insert performance of CQL3.