July 10th, 2013


Recently, Eric Johnson released a guide to setting up a Cassandra cluster on Google Compute Engine. Cassandra is a NoSQL database that is designed from the ground up to be distributed. Because Cassandra distributes data across multiple nodes, your cluster becomes resilient to individual node failure, and scaling up your cluster is as simple as adding new nodes.


The guide walks you through creating your nodes (instances), setting up Java, and creating and configuring a firewall. Included in the guide are several scripts that make the configuration and setup easy to understand and execute. Once you are finished with your cluster, a simple call to a teardown script cleans up your project’s environment.


Depending on your system requirements, you will want to make some adjustments to the setup in the guide. For instance, you should modify the global configuration file to meet the needs of your system. Cassandra runs best with plenty of CPU and memory, so you will likely want to choose one of our higher-powered instance types. You should also adjust the total number of nodes and the number of nodes per zone to match your requirements. Lastly, we strongly recommend that you use persistent disks for your cluster, so that your data outlives any individual instance.
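To make the "nodes per zone" adjustment concrete, here is a minimal sketch of how a placement plan could be computed before creating instances. It is not taken from the guide's scripts: the function name `plan_nodes`, the `cassandra` name prefix, and the zone names in the usage note are all illustrative assumptions.

```python
def plan_nodes(zones, nodes_per_zone, prefix="cassandra"):
    """Return (node_name, zone) pairs, with nodes_per_zone nodes in each zone.

    The naming scheme is hypothetical; the guide's own scripts may name
    and place instances differently.
    """
    if nodes_per_zone < 1:
        raise ValueError("nodes_per_zone must be at least 1")
    plan = []
    for zone in zones:
        for index in range(nodes_per_zone):
            # Embed the zone in the name so a node's location is obvious.
            plan.append((f"{prefix}-{zone}-{index}", zone))
    return plan
```

For example, `plan_nodes(["us-central1-a", "us-central1-b"], 3)` yields six nodes, three in each zone, so that the loss of one zone still leaves half of the cluster running.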


Many of the core features of Google Compute Engine match up well with the requirements of a distributed database like Cassandra. Distributing instances across zones protects against individual node and zone failures. Using the metadata server means that your nodes can configure themselves, and a change to the configuration file can propagate easily to existing nodes. Consistent, fast disk I/O means that you can rely upon quick queries and reliable write throughput.
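As a rough illustration of self-configuration through the metadata server, the sketch below reads a custom attribute from inside an instance. It assumes the current `v1` metadata endpoint, which may differ from the version the guide's scripts targeted, and the attribute key `cassandra-seeds` is a made-up example rather than a key the guide defines.

```python
import urllib.request

# Base URL for custom instance attributes on the metadata server (v1 endpoint).
METADATA_URL = (
    "http://metadata.google.internal/computeMetadata/v1/instance/attributes/"
)


def build_metadata_request(key):
    """Build a request for one custom metadata attribute.

    The v1 metadata server requires the Metadata-Flavor header on every query.
    """
    return urllib.request.Request(
        METADATA_URL + key, headers={"Metadata-Flavor": "Google"}
    )


def read_attribute(key, timeout=2.0):
    """Fetch an attribute value; this only works from inside an instance."""
    request = build_metadata_request(key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

A startup script on each node could call `read_attribute("cassandra-seeds")` (a hypothetical key) and write the result into its Cassandra configuration, which is one way a change made in a single place could reach every node.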


To learn more about Cassandra, download it, or contribute to the project, visit the database’s site. To hear experts discuss different approaches to distributed databases, watch the Google I/O industry panel.


This article was written by: Julia Ferraioli, Developer Advocate.

The original article can be found here.