January 8th, 2013

This post was created by Stéphane Moreau on the LogikDevelopment Blog.


I recently tried out Apache Cassandra, a NoSQL database initially developed at Facebook. It is designed to handle very large amounts of data spread across many commodity servers while providing a highly available service with no single point of failure.

To populate the database, I used Apache Flume with the Flume NG Apache Cassandra Sink, which injected logs into it. This post focuses on Cassandra; I will write about Flume later on.


This is the Cassandra schema I used (the one suggested by the Cassandra sink):

create keyspace logs with
   strategy_options = {datacenter1:1}
;

use logs;

create column family records with
   comparator = UTF8Type
   and gc_grace = 86400
;

After adding the data to the database, I wanted to fetch it to make sure everything had gone well.
I tried three different ways:

  1. Cassandra CLI
    To return the first 100 rows (and all associated columns) from the records column family, I ran the following command: 
    LIST records;

    However, the rows looked like this:

    This behavior is clearly explained on the DataStax page Getting Started Using the Cassandra CLI:

    Cassandra stores all data internally as hex byte arrays by default. If you do not specify a default row key validation class, column comparator and column validation class when you define the column family, Cassandra CLI will expect input data for row keys, column names, and column values to be in hex format (and data will be returned in hex format).

    To pass and return data in human-readable format, you can pass a value through an encoding function. Available encodings are:
    * ascii
    * bytes
    * integer (a generic variable-length integer type)
    * lexicalUUID
    * long
    * utf8

    This means that we need to specify the encoding in which column family data should be returned. We can do that for the entire client session using the following commands:

    ASSUME records KEYS AS utf8;
    ASSUME records COMPARATOR AS utf8;
    ASSUME records VALIDATOR AS utf8;

    If we now run the previous command again, the rows look like this:

    Much better! ;)
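
To make the hex representation less mysterious, here is a minimal Python sketch, independent of Cassandra, of the conversion between text and the hex form that the CLI prints when no encoding is assumed. The helper names to_hex and from_hex and the sample value are made up for illustration, and the sketch assumes the stored bytes are UTF-8 text, as they are for this column family.

```python
def to_hex(text: str) -> str:
    """Encode text as UTF-8 and render the resulting bytes as a hex string."""
    return text.encode("utf-8").hex()


def from_hex(hex_string: str) -> str:
    """Turn a hex string back into text, assuming the bytes are UTF-8."""
    return bytes.fromhex(hex_string).decode("utf-8")


if __name__ == "__main__":
    sample = "host=web01"  # hypothetical log value, not taken from the post
    encoded = to_hex(sample)
    print(encoded)            # the kind of hex string the CLI shows by default
    print(from_hex(encoded))  # the readable form, as after ASSUME ... AS utf8
```

This is roughly what the ASSUME commands ask the CLI to do on our behalf: interpret the stored bytes as UTF-8 instead of printing them as hex.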

  2. CQL
    To retrieve columns from the records column family, we can use the following SELECT command:
    SELECT * FROM records LIMIT 1;

    However, the output looks like this: