February 16th, 2013

I am learning Cassandra. It is not easy.

I did some reading, including the White Paper that Sameer Farooqui mentioned: Cassandra – A Decentralized Structured Storage System by A. Lakshman and P. Malik over at Facebook. That was helpful.
I also took both our DataStax Developer Training for Cassandra and our Administrator Training for Cassandra in Redwood City, with Sequoyah Pelletier. That was also helpful. Now I understand that what we used to call a database is now a Keyspace, what we used to call a table is now a ColumnFamily (and can live on multiple network nodes), and what we used to call a row can live on any node in the cluster (but a row cannot span more than one node). Whew. A lot of new glossary terms.
The clusterizing of data (uh-oh did I make a new word?) is not new. We did it with Hadoop and before that, but this is yet another spin on clusterization.
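To make the "a row lives on one node, a ColumnFamily lives on many" idea concrete, here is a minimal sketch in plain Java. It is only an illustration under loud assumptions: it uses String.hashCode() instead of Cassandra's real partitioner, it ignores replication entirely, and the class name, method name, and node names are all made up for this post.

```java
import java.util.Arrays;
import java.util.List;

public class RowPlacementSketch {

    /** Picks exactly one owning node for a row key (simplified; not Cassandra's real partitioner). */
    public static String nodeFor(String rowKey, List<String> nodes) {
        // floorMod keeps the index non-negative even when hashCode() is negative
        int index = Math.floorMod(rowKey.hashCode(), nodes.size());
        return nodes.get(index);
    }

    public static void main(String[] args) {
        // Hypothetical three-node cluster
        List<String> nodes = Arrays.asList("node-a", "node-b", "node-c");
        // Rows of the same "table" are spread across nodes, but each row has one owner
        for (String key : Arrays.asList("user:1", "user:2", "user:3", "user:4")) {
            System.out.println(key + " -> " + nodeFor(key, nodes));
        }
    }
}
```

The point is only that the row key alone decides where the whole row goes, which is why the table as a whole can be spread over the cluster while any single row stays in one place.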
The difficulty I was having is that all the code was written in the CQL shell “cqlsh”, or in Python, or using the Cassandra Thrift API. I have been a Java guy since 1997, when I learned from Yaakov Weintraub and Ted Young at Advanced Web Technologies (now LearningPatterns), and so I yearned to see the Java code, but where was it? Nothing I was looking at in the white papers or the examples looked familiar to me, until…
One day at our office in San Mateo, Michael Figuiere introduced himself and showed me what he was working on in France (he just relocated to California like me). He showed me his presentation on the DataStax Java Driver for Apache Cassandra, and I almost cried! Yay. Finally the Java code I was looking for. Today, I met with him again and wrote some code in preparation for including it in the new version of our developer course. 
I share some of it with you here, as I go, since this learning Cassandra (C* from here on) thing is starting to fly:
First of all, you will need the documentation, now public here: http://www.datastax.com/drivers/java/apidocs/
Then, having come from the JPMorgan school of Entity Services (read Thomas Erl’s services book SOA Principles of Service Design), I wanted to create a service that would have a CassandraDAO injected into it (using Spring dependency injection). Here is a draft of what I was thinking, and again this is off the cuff, from about an hour of musing with Michael:


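/* All code by Laurent Weichberger laurent@datastax.com */

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Query;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.exceptions.NoHostAvailableException;

/* See DOCS HERE: http://www.datastax.com/drivers/java/apidocs/ */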

public class CassandraDAO {

    /* Dependency Injection using a Factory Bean (some day),
     * or use this constructor.
     * The Session object is stateless; it doesn't remember anything.
     * Session should be a singleton. However, multiple calls to
     * connect() will provide new Session objects,
     * so the developer has to maintain singleton status.
     * Session is MT (thread-safe) and shared, and connection pooling is there by default. */
    private Session session = null;

    public CassandraDAO() {
        this("", "");
    }

    public CassandraDAO(String contactPoint1, String contactPoint2) {
        // contactPoint2 is only used if contactPoint1 is down

        // builder() is a static factory method on the Cluster class
        Cluster cluster = Cluster.builder()
                .addContactPoints(contactPoint1, contactPoint2)
                .build();

        /* Cluster has methods for configuration, such as RetryPolicy;
         * round-robin retry is the default */

        /* connect() is overloaded and can take a "keyspace" String.
         * Otherwise, all table names must be written as keyspace.tablename */
        this.session = cluster.connect("datastax");
        // can be any keyspace name (database name)
    }

    /* Create & update are the same in C*. The only difference is that
     * update can give a WHERE clause; update can also target a secondary index.
     * Insert with no PK won't work. – MF
     * The execute() method on the session will take any CQL3 code.
     * In fact, the Driver doesn't do anything to the CQL3 code;
     * it only passes the CQL3 to the Cassandra protocol.
     * …Sylvan (France) knows most about CQL3. */

    // There are also runtime exceptions that can be thrown by execute()

    public void saveOrUpdate(String cql) {
        try {
            /* execute() returns a ResultSet for when it is a read,
             * but here we ignore that… */
            session.execute(cql);
        } catch (NoHostAvailableException nhae) {
            /* No node is able to execute the request…
             * for many reasons:
             * timeout, etc. There are implicit failover,
             * retry policies, etc. */
        }
    }

    // A read can throw a QueryExecutionException, which is a RuntimeException
    public ResultSet findByCQL(String cql) {
        try {
            Query cqlQuery = new SimpleStatement(cql);

            // Add tracing, same as C* 1.2 cqlsh> "tracing on;"

            return session.execute(cqlQuery);
        } catch (NoHostAvailableException nhae) {
            // No node is able to execute the request…
            return null; // nothing to return when no host could run the query
        }
    }

    // delete (some day, maybe); at JPMorgan we deleted nothing...

    /* Object Mapping Available… some day */
}
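
As for the entity service that gets the DAO injected, here is a rough sketch of the shape I have in mind. To keep it runnable without Spring or the driver on the classpath, it uses plain constructor injection and a small interface. Everything named here (CqlDao, UserService, RecordingDao, the users table and its columns) is hypothetical and made up for illustration; none of it comes from the driver or from the DAO above.

```java
import java.util.ArrayList;
import java.util.List;

public class EntityServiceSketch {

    /** Hypothetical abstraction over the DAO, so the service does not depend on the driver. */
    public interface CqlDao {
        void saveOrUpdate(String cql);
    }

    /** Hypothetical entity service; the DAO arrives through the constructor. */
    public static class UserService {
        private final CqlDao dao;

        public UserService(CqlDao dao) {
            this.dao = dao;
        }

        public void register(int id, String name) {
            // String concatenation keeps the sketch short; real code should bind values instead.
            dao.saveOrUpdate("INSERT INTO users (id, name) VALUES (" + id + ", '" + name + "');");
        }
    }

    /** Test double that records CQL instead of sending it to a cluster. */
    public static class RecordingDao implements CqlDao {
        public final List<String> statements = new ArrayList<>();

        @Override
        public void saveOrUpdate(String cql) {
            statements.add(cql);
        }
    }

    public static void main(String[] args) {
        RecordingDao dao = new RecordingDao();
        new UserService(dao).register(1, "laurent");
        // Shows the CQL the service handed to its DAO
        System.out.println(dao.statements);
    }
}
```

If CassandraDAO were declared to implement an interface like this one, a Spring container could hand it to the service in the same way, and the recording version could stand in for it in unit tests that run without a cluster.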