May 3rd, 2013

With compact storage one can create a column family. The metadata was fairly straightforward:

First, you had a comparator that controlled the sorting of columns. Columns sort by the ‘name’ field. Then you had a default_validator, which is the type for the value. For example, if your comparator was LongType and your default validator was UTF8Type, it meant your column names were Longs and your values were strings.

Additionally, you can supply validators for specific columns, like in the above example.

These meant "if the column was named full_name its type is UTF8Type, and if the column was named fav_number its type is LongType". This is easily the best part about Cassandra: I do not have to know my column names ahead of time.
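The lookup rule described above can be sketched in a few lines of Python. This is only a model of the metadata, not Cassandra code: the `USERS` dictionary and the `validator_for` helper are hypothetical names, the type names are plain string labels, and the comparator is assumed to be UTF8Type because the column names here are strings.

```python
# Hypothetical model of a compact-storage column family's metadata.
USERS = {
    "comparator": "UTF8Type",         # assumed: column names are strings
    "default_validator": "UTF8Type",  # type for values of undeclared columns
    "column_metadata": {              # validators for specific column names
        "full_name": "UTF8Type",
        "fav_number": "LongType",
    },
}


def validator_for(column_name, column_family):
    """Return the value type for a column.

    A validator declared for that exact column name wins; any other
    column falls back to the default validator.
    """
    specific = column_family["column_metadata"]
    return specific.get(column_name, column_family["default_validator"])
```

A column that was never declared, say `nickname`, still gets a type from the default validator, which is why the column names do not need to be known ahead of time.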

CQL3’s sparse storage takes a different approach. It has no default validator. Every column must be named. Thus every column must be typed.

id uuid PRIMARY KEY,
title text,
album text,
artist text,
data blob

With CQL3’s sparse storage you can approximate unknown columns like this.

We can view this from the CLI to see how it lays out on disk.
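As a rough illustration of that kind of layout, here is a small Python model. It assumes a hypothetical table with a partition key `id`, a clustering column `name`, and one regular column `value`; those names are invented for the sketch. It shows how each CQL3 row can turn into cells of a single wide storage row, with the clustering value folded into a composite cell name. The empty "row marker" cell is also an assumption about how non-compact tables are stored.

```python
from collections import defaultdict


def to_storage_rows(cql_rows):
    """Flatten CQL3-style rows into {row_key: {composite_cell_name: value}}.

    Assumes each input row is a dict with the hypothetical keys
    'id' (partition key), 'name' (clustering column) and 'value'.
    """
    storage = defaultdict(dict)
    for row in cql_rows:
        cells = storage[row["id"]]
        # Assumed row marker: a cell whose name ends in an empty component.
        cells[(row["name"], "")] = ""
        # The regular column: composite of clustering value and column name.
        cells[(row["name"], "value")] = row["value"]
    # Cells within a storage row are kept sorted by their composite name.
    return {key: dict(sorted(cells.items())) for key, cells in storage.items()}
```

Under these assumptions, two CQL3 rows that share the same `id` end up side by side in one storage row, which is what makes the clustering column behave like a dynamic column name.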

Both approaches have some shortcomings. Here is a scenario that causes them both headaches.

All columns from a-b (a1,a2, aaaaaaaaaa…) are integers
All columns from c-g (c1,f190,…) are varchar

We do this quite often: a single row key supports static columns and multiple sets of (possibly wide) dynamic columns. For example, password is utf8, age is an integer, columns named friends[0] to friends~ are a set of your friends, and columns named likes to likes~ are a set of your likes. This is an alternative to creating separate column families for each of these relations.

I am experimenting with a feature called Ranged Assume. The idea is to support the scenario described above.

This can be used in conjunction with standard validators (columns are utf8, columns named x have values of long).
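To make the idea concrete, here is a minimal Python sketch of how such a lookup could behave. It is not the actual implementation: `RangedAssume`, `resolve_validator`, the inclusive range bounds, and the precedence order (specific validator first, then ranges, then the default) are all assumptions made for illustration. Column names are compared as plain strings, which matches byte order only for ASCII names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangedAssume:
    """Hypothetical rule: columns named start..end (inclusive) share a type."""
    start: str
    end: str
    validator: str


def resolve_validator(column_name, specific, ranged, default):
    """Pick a value type for a column name.

    Assumed precedence: an exact-name validator, then the first
    matching range, then the default validator.
    """
    if column_name in specific:
        return specific[column_name]
    for rule in ranged:
        if rule.start <= column_name <= rule.end:
            return rule.validator
    return default


# Rules for the scenario above, using type names as plain labels.
SPECIFIC = {"age": "IntegerType"}
RANGED = [
    RangedAssume("a", "b", "IntegerType"),
    RangedAssume("c", "g", "UTF8Type"),
    RangedAssume("friends[0]", "friends~", "UTF8Type"),
]
```

With these rules, `resolve_validator("a1", SPECIFIC, RANGED, "BytesType")` falls in the a-b range, while a name such as `zebra` matches nothing and gets the default.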