March 1st, 2013

By 

If you haven't started using Apache Cassandra yet and would like a little hand-holding to get going, you're in luck. This article will help you get your feet wet with Cassandra and show you the basics, so you'll be ready to start developing Cassandra applications in no time.

Why Cassandra?

Do you need a more flexible data model than the relational database world offers? Would you like a NoSQL database that scales out to handle very large numbers of concurrent user connections and very large data volumes while staying fast? Do you need a database with no single point of failure, one that can easily distribute data across multiple geographies, data centers, and the cloud? Well, that's Cassandra.

Step 1 – Installing Cassandra

In this article, we'll show you how to kick the tires of Cassandra on a single machine. Note, though, that it's also easy to configure a multi-node cluster, which is where Cassandra really flexes its muscles on scale and performance.

The first step is to download and install Cassandra on your target test machine. Go to the downloads page at DataStax.com and select the DataStax Community Edition, which bundles the most recent stable version of Cassandra, the Cassandra Query Language (CQL) shell, a free version of DataStax OpsCenter (a web-based management and monitoring tool for Cassandra), and a sample Cassandra application.

This article shows you how to install and get going with Apache Cassandra on a Mac or Linux machine. If you're using Windows instead, a separate article guides you through using Cassandra on Windows.

For this exercise, choose the Tarball option for your operating system (Linux or Mac) and download the DataStax Community server, which includes the CQL shell and the sample application. You don't need DataStax OpsCenter for now; we'll cover it in another article.

Once the download finishes, move the file to whatever directory you'd like to use for testing Cassandra, then uncompress it (the file name depends on the version you downloaded):

tar -xzf dsc-cassandra-1.2.2-bin.tar.gz

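Then switch to the new Cassandra bin directory and start Cassandra. The example runs it with sudo because the default configuration stores data under /var/lib/cassandra (as the last log line below shows), a location an ordinary user usually can't create. Cassandra starts in the background, so its log messages keep printing after the shell prompt returns: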

robinsmac:dev robin$ cd dsc-cassandra-1.2.2/bin

robinsmac:bin robin$ sudo ./cassandra

robinsmac:bin robin$  INFO 14:49:57,739 Logging initialized

INFO 14:49:57,750 JVM vendor/version: Java HotSpot(TM) 64-Bit Server VM/1.6.0_35

INFO 14:49:57,750 Heap size: 2093809664/2093809664

INFO 14:49:57,751 Classpath:

.

.

INFO 14:49:59,208 Completed flushing /var/lib/cassandra/data/system/schema_columns/system-schema_columns-ib-2-Data.db (210 bytes) for commitlog position ReplayPosition(segmentId=1362167398602, position=53130)

Step 2 – Connecting to Cassandra

Now that Cassandra is running, the next step is to connect to the server and begin creating database objects. You do this with cqlsh, the Cassandra Query Language (CQL) shell. CQL is very similar to SQL, so you can create objects much as you're used to doing in the RDBMS world.

The CQL utility (cqlsh) is in the same bin directory as the cassandra executable:

robinsmac:bin robin$ ./cqlsh

Connected to Test Cluster at localhost:9160.

[cqlsh 2.3.0 | Cassandra 1.2.2 | CQL spec 3.0.0 | Thrift protocol 19.35.0]

Use HELP for help.

cqlsh>

Step 3 – Creating a Keyspace

Cassandra has the concept of a keyspace, which is similar to a database in an RDBMS. A keyspace holds data objects and is the level at which you specify the replication strategy and replication factor, which control where copies of your data are placed and how many copies are kept.

For this brief introduction, we’ll just create a basic keyspace to hold some example data objects we’ll create:

cqlsh> create keyspace dev

   ... with replication = {'class':'SimpleStrategy','replication_factor':1};
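
SimpleStrategy, used in the keyspace above, places replicas on a token ring: each node owns the range of tokens ending at its own token, the first copy of a row goes to the node whose range covers the row's token, and any further copies go to the next nodes clockwise around the ring. The Python sketch below models only that placement rule, assuming a hypothetical ring given as (token, node name) pairs; replicas_for is an illustrative helper, not part of Cassandra or any driver.

```python
from bisect import bisect_left
from typing import List, Tuple


def replicas_for(token: int, ring: List[Tuple[int, str]], replication_factor: int) -> List[str]:
    """Model SimpleStrategy placement for a row with the given token.

    ring: hypothetical (node_token, node_name) pairs, one per node.
    A node owns the tokens from just after the previous node's token up to
    and including its own, wrapping around at the end of the ring.
    """
    ring = sorted(ring)
    tokens = [node_token for node_token, _ in ring]
    # First node whose token is >= the row's token; wrap to index 0 past the end.
    start = bisect_left(tokens, token) % len(ring)
    # This model never places more copies than there are nodes.
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


ring = [(0, "node-a"), (100, "node-b"), (200, "node-c")]
print(replicas_for(150, ring, 1))  # ['node-c']
print(replicas_for(150, ring, 2))  # ['node-c', 'node-a']
```

With 'replication_factor':1, as in the dev keyspace, each row lives on exactly one node; in this model, a replication factor of 3 on a three-node ring puts a copy on every node.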

Step 4 – Creating Data Objects

Now that you have a keyspace, it's time to create an object to store data in. Cassandra's data model is derived from Google's Bigtable, so data lives in column families, which CQL 3 calls tables.

Tables in Cassandra resemble RDBMS tables but are more flexible. They have rows like RDBMS tables, but the rows are sparse: each row stores only the columns that actually have values, so rows in the same column family can hold different sets of columns depending on the data you want to store for each one.
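
A rough way to picture a sparse column family is a map from each row key to a map of only the columns that row actually has. The Python sketch below uses that picture with hypothetical employee rows; it illustrates the concept and is not Cassandra's storage format. (In CQL 3 the column names are declared up front in CREATE TABLE, but a row still stores nothing for a column it has no value for.)

```python
from typing import Dict, List

# Hypothetical data: row key (empid) -> {column name: value}.
# Each row holds only the columns that were given a value.
emp: Dict[int, Dict[str, str]] = {
    1: {"emp_first": "fred", "emp_last": "smith", "emp_dept": "eng"},
    2: {"emp_first": "ann"},  # no emp_last or emp_dept stored for this row
}


def stored_columns(table: Dict[int, Dict[str, str]], key: int) -> List[str]:
    """Return the sorted names of the columns actually stored for one row."""
    return sorted(table.get(key, {}))


print(stored_columns(emp, 1))  # ['emp_dept', 'emp_first', 'emp_last']
print(stored_columns(emp, 2))  # ['emp_first']
```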

Let’s create a base table to hold employee data:

cqlsh> use dev;

cqlsh:dev> create table emp (empid int primary key,

   ... emp_first varchar, emp_last varchar, emp_dept varchar);

cqlsh:dev>

The table (column family) is named emp and contains four columns, including the employee ID (empid), which serves as the table's primary key. Every table must have a primary key: Cassandra uses it to decide which node stores each row, and it is the column you normally look rows up by.
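
To see why the primary key matters for placement, here is a rough Python model: hash the key to a token, then find the node whose token range contains it. The assumptions are all hypothetical: a three-node ring over a 32-bit token space, with MD5 truncated to 32 bits as the hash. Cassandra 1.2's default partitioner, Murmur3Partitioner, uses a different hash and token range. What carries over is the idea that equal keys always produce the same token, so a query by primary key can be sent straight to the node that owns the row.

```python
import hashlib
from bisect import bisect_left
from typing import List, Tuple

TOKEN_SPACE = 2**32
# Hypothetical ring: three nodes, each owning roughly a third of the token space.
RING: List[Tuple[int, str]] = [
    (TOKEN_SPACE // 3, "node-a"),
    (2 * TOKEN_SPACE // 3, "node-b"),
    (TOKEN_SPACE - 1, "node-c"),
]


def token_for(primary_key: int) -> int:
    """Hash a primary key to a token in [0, 2**32); MD5 is a stand-in hash."""
    digest = hashlib.md5(str(primary_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def node_for(primary_key: int) -> str:
    """Return the node whose token range contains the key's token."""
    tokens = [node_token for node_token, _ in RING]
    return RING[bisect_left(tokens, token_for(primary_key)) % len(RING)][1]


for empid in (1, 2, 3):
    print(empid, token_for(empid), node_for(empid))
```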

Step 5 – Inserting, Manipulating, and Querying Data

Let’s now go ahead and insert data into our new column family using the CQL INSERT command:

cqlsh:dev> insert into emp (empid, emp_first, emp_last, emp_dept)

   ... values (1,'fred','smith','eng');

Notice that the CQL INSERT statement looks just like its RDBMS counterpart, and so do other DML statements, such as UPDATE:

cqlsh:dev> update emp set emp_dept = 'fin' where empid = 1;
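
One behavioral difference hides behind the familiar syntax: in Cassandra, INSERT and UPDATE are both upserts. Neither checks whether the row already exists; each simply writes the given column values, creating the row if it isn't there. The Python sketch below models that behavior with a dictionary and a hypothetical upsert helper; it illustrates the semantics only, not how Cassandra stores data.

```python
from typing import Dict

Table = Dict[int, Dict[str, str]]


def upsert(table: Table, key: int, **columns: str) -> None:
    """Write the given columns for a row: create the row if it is missing,
    otherwise overwrite just the named columns and keep the rest."""
    table.setdefault(key, {}).update(columns)


emp: Table = {}
# insert into emp (empid, emp_first, emp_last, emp_dept) values (1,'fred','smith','eng');
upsert(emp, 1, emp_first="fred", emp_last="smith", emp_dept="eng")
# update emp set emp_dept = 'fin' where empid = 1;
upsert(emp, 1, emp_dept="fin")
print(emp[1])  # {'emp_first': 'fred', 'emp_last': 'smith', 'emp_dept': 'fin'}

# An UPDATE for a key that does not exist yet also creates the row.
upsert(emp, 2, emp_dept="ops")
print(emp[2])  # {'emp_dept': 'ops'}
```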

Querying data uses the familiar SELECT statement:

cqlsh:dev> select * from emp;

 empid | emp_dept | emp_first | emp_last
-------+----------+-----------+----------
     1 |      fin |      fred |    smith

A WHERE predicate on the primary key works as you'd expect. However, look what happens when the predicate references a column that isn't part of the primary key:

cqlsh:dev> select * from emp where empid = 1;

 empid | emp_dept | emp_first | emp_last
-------+----------+-----------+----------
     1 |      fin |      fred |    smith

cqlsh:dev> select * from emp where emp_dept = 'fin';

Bad Request: No indexed columns present in by-columns clause with Equal operator
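
The error follows from how data is laid out. A lookup by primary key can go directly to the row (on a cluster, to the nodes that own it), but matching a non-key column with no index means examining every row in the table, which on a cluster means every node. Cassandra rejects such a query instead of scanning the whole table. The Python sketch below contrasts the two access paths; get_by_key and filter_by_column are hypothetical helpers over a dictionary, not Cassandra APIs.

```python
from typing import Dict, List, Optional

Table = Dict[int, Dict[str, str]]

emp: Table = {1: {"emp_first": "fred", "emp_last": "smith", "emp_dept": "fin"}}


def get_by_key(table: Table, empid: int) -> Optional[Dict[str, str]]:
    """Primary-key lookup: one direct access, however large the table is."""
    return table.get(empid)


def filter_by_column(table: Table, column: str, value: str) -> List[int]:
    """Non-key filter with no index: every row must be examined (a full scan)."""
    return [key for key, row in table.items() if row.get(column) == value]


print(get_by_key(emp, 1))                        # {'emp_first': 'fred', 'emp_last': 'smith', 'emp_dept': 'fin'}
print(filter_by_column(emp, "emp_dept", "fin"))  # [1]
```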

In Cassandra, if you want to query columns other than the primary key, you need to create a secondary index on them:

cqlsh:dev> create index idx_dept on emp(emp_dept);

cqlsh:dev> select * from emp where emp_dept = 'fin';

 empid | emp_dept | emp_first | emp_last
-------+----------+-----------+----------
     1 |      fin |      fred |    smith
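
Conceptually, a secondary index is a second lookup structure that maps each value of the indexed column back to the primary keys of the rows holding it, and it has to be kept in sync on every write. The Python class below models that idea; IndexedTable is a hypothetical sketch, not Cassandra's implementation (Cassandra's built-in secondary indexes, for instance, are maintained locally on each node for the data that node holds).

```python
from collections import defaultdict
from typing import Dict, List, Set


class IndexedTable:
    """Hypothetical toy table with one secondary index on a chosen column.

    rows:  primary key -> {column name: value}
    index: indexed column value -> set of primary keys holding that value
    """

    def __init__(self, indexed_column: str) -> None:
        self.indexed_column = indexed_column
        self.rows: Dict[int, Dict[str, str]] = {}
        self.index: Dict[str, Set[int]] = defaultdict(set)

    def upsert(self, key: int, **columns: str) -> None:
        """Write columns for a row and keep the index in sync."""
        row = self.rows.setdefault(key, {})
        col = self.indexed_column
        if col in columns and col in row:
            # The indexed value is changing: drop the stale index entry first.
            self.index[row[col]].discard(key)
        row.update(columns)
        if col in row:
            self.index[row[col]].add(key)

    def select_where(self, value: str) -> List[Dict[str, str]]:
        """Answer 'WHERE <indexed column> = value' from the index, with no scan."""
        return [self.rows[key] for key in sorted(self.index.get(value, set()))]


emp = IndexedTable("emp_dept")
emp.upsert(1, emp_first="fred", emp_last="smith", emp_dept="eng")
emp.upsert(1, emp_dept="fin")  # like the UPDATE earlier in this article
print(emp.select_where("fin"))  # [{'emp_first': 'fred', 'emp_last': 'smith', 'emp_dept': 'fin'}]
print(emp.select_where("eng"))  # []
```

The trade-off the model makes visible: reads on the indexed column avoid a scan, but every write to that column now also updates the index.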

Conclusion

We've reached the end of this short article on getting started with Cassandra. Hopefully you now have a basic feel for how to install Cassandra, create objects, and insert, manipulate, and query data.

Where can I go for more information?

To get a good overview of Cassandra and its architecture, read the Introduction to Apache Cassandra white paper. To learn more about CQL, as well as about setting up a multi-node Cassandra cluster, see the DataStax online documentation for Apache Cassandra 1.2. Also visit the Planet Cassandra blog for more articles, technical blog posts, videos, and more.
