April 11th, 2014

Deploying a Twitter clone with a single command
by James Horey

In this post I’ll show you how to deploy Twissandra, an open-source Twitter clone, using Ferry. Twissandra is a simple Django application that uses Cassandra to store tweets and user information. Normally, Twissandra assumes that you already have Cassandra installed on your local machine. While installing and running Cassandra locally isn’t too difficult, it’s a lot simpler and faster to use Ferry. Plus, Ferry is designed to manage multiple application stacks and supports several backend technologies, including Hadoop and Open MPI. That means that when you’re ready to experiment with a Hadoop application, you’ll be able to start right away without having to learn how to configure yet another backend.

If you’re interested in watching a short video version of this post, head over here. You’ll still want to read the rest of the post since it contains important information on what’s going on. Because our demo is going to use Ferry, I’m going to assume that you have Ferry installed and working. If not, please go ahead and install it using the instructions here. While this tutorial may be useful even if you don’t have Ferry installed, it’s a lot more fun if you do.

Now to get started we’ll need to download the Twissandra application. Type these commands into your prompt:
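Something along these lines will do the trick (the repository URL here is illustrative; point it at wherever the cassandra-examples repository lives):

```bash
# Grab the Ferry Cassandra examples and move into the Twissandra directory
# (the repository URL is an assumption; adjust it to your copy of the examples)
git clone https://github.com/opencore/cassandra-examples.git
cd cassandra-examples/twissandra
```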

If you run ls cassandra-examples/twissandra, you’ll see two important files that we need. The first is the Dockerfile. This file tells Ferry how to build the Twissandra application. For more information on Dockerfiles, head over here. The second file is called twissandra.yaml. This file defines the application stack required to run the application. Before examining what’s in these files, let’s start Twissandra. Type the following into your prompt:
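The exact invocation may vary a little with your Ferry version, but it’s roughly:

```bash
# Build the client Dockerfile and provision the stack described in twissandra.yaml
# (the argument form, file path vs. stack name, is an assumption)
ferry start twissandra.yaml
```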

The start command builds the Twissandra Dockerfile and provisions the backend Cassandra cluster. This process usually takes a few seconds. You’ll know it’s done when it prints out the unique ID of your application. Afterwards, you can find the IP address of your Twissandra application by typing:
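Assuming Ferry’s inspect subcommand and using the application ID printed by the start command (sa-0 below is just a placeholder):

```bash
# Replace sa-0 with the unique ID of your application
ferry inspect sa-0
```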

This command will print out detailed information regarding your application stack. For now, we just need the internal_ip value of our connector. It should look something like this:
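The addresses and surrounding keys below are illustrative; the piece to look for is the internal_ip field of the connector:

```json
{
  "connectors": [
    {
      "name": "twissandra",
      "internal_ip": "10.1.0.3"
    }
  ]
}
```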

Paste that IP into your web browser, and you should be greeted with a Twissandra web interface.


Congratulations, you’ve now deployed a working Cassandra cluster connected to your very own Twitter clone!

As of right now, you can only access the website from the machine you used to start the application. I plan on supporting port redirection in the near future, so follow our progress on GitHub and Twitter.

Now that we have Twissandra up and running, let’s take a step back and examine the two files that define our application. First, if you take a look at twissandra.yaml, you’ll see something like this:
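The field names below are a rough sketch of Ferry’s stack format rather than a verbatim copy of the file, but the structure is the important part:

```yaml
# One Cassandra storage node plus one Twissandra connector
# (field names approximate Ferry's stack format)
backend:
   - storage:
        personality: "cassandra"
        instances: 1
connectors:
   - personality: "twissandra"
```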

This configuration file tells Ferry to create a single-node Cassandra cluster and a single Twissandra client that connects to that cluster.

Before proceeding further, there is one thing I’d like to mention. Ferry runs everything in distributed mode. That means we could increase the instance count on our Cassandra cluster, and Ferry would happily oblige. Ferry uses Docker underneath, so the overhead of running multiple nodes is very small. Why would we want to do this? Cassandra enables developers to choose from various “partitioning” strategies (i.e., how data gets split over multiple machines). This, in turn, has a huge effect on the overall performance of your application. As part of the data modeling process, we can use Ferry to stand up multiple clusters, each with a different partitioning strategy. While we won’t do that today, this is something that I plan on writing about in the future.
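For instance, asking for a three-node cluster is just a matter of changing one number in the backend section (same sketch as above):

```yaml
# Same stack, but with a three-node Cassandra cluster
backend:
   - storage:
        personality: "cassandra"
        instances: 3
```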

Now back to our YAML file. Since we’re using a custom client, we’ll need to provide a Dockerfile that tells Ferry how to build that client. If you open the Dockerfile, you’ll see something like this:
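The package names and repository URL below are assumptions, but the overall shape is what matters:

```dockerfile
# Build on top of the stock Ferry Cassandra client image (image name is an assumption)
FROM ferry/cassandra-client

# Install the extra packages the Django app needs (package list is illustrative)
RUN apt-get update && apt-get install -y git python-pip
RUN pip install django cql thrift

# Fetch the Twissandra source code (destination path is illustrative)
RUN git clone https://github.com/twissandra/twissandra.git /service/twissandra

# Copy the start script into the runscripts directory so Ferry runs it on start
ADD ./twissandra.sh /service/runscripts/start/
```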

The first line is pretty important:
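In the sketch above, that’s (the exact image name is an assumption; use whatever your copy of the Dockerfile names as its base):

```dockerfile
FROM ferry/cassandra-client
```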

It tells Ferry that this client is based on the official Cassandra client. The Cassandra client is included with Ferry and consists of a basic Ubuntu 12.04 image with a few Cassandra-specific packages. The other lines in the Dockerfile install additional packages that our client uses and download the Twissandra source code.

The last line in the Dockerfile is also important:
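In the sketch above, that’s:

```dockerfile
# Place the start script where Ferry looks for start-event scripts
ADD ./twissandra.sh /service/runscripts/start/
```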

That line basically tells Ferry that the script twissandra.sh should be added to the image and placed in the /service/runscripts/start directory. This directory is special because it contains all the scripts that should be executed when the connector starts. Ferry supports additional “events”, including stop, restart, and test. Each of these events is defined by its respective runscript directory (/service/runscripts/stop, etc.). If you place an executable script in the relevant directory, that script will be executed when the event occurs.

If you take a look at twissandra.sh, you’ll see the following command near the bottom:
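Twissandra is a stock Django project, so the command is presumably the Django development server; the port here is an assumption (it just needs to match the address you paste into your browser):

```bash
# Launch the Django development server for Twissandra on all interfaces
python manage.py runserver 0.0.0.0:80
```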

That means that when the connector starts, the first thing it will do is start the Twissandra server.

And that’s it! As you can see, the Dockerfile is fairly short and straightforward. The first time Ferry starts this application, it will build the Dockerfile and save it as an image (examples/twissandra). If you start the application again, it should go a bit faster since the image will be cached. Of course, you can stop the application and start it again later; if you do, all your tweets should be saved. Once you’re done experimenting, you can remove the application and everything will go away.
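Assuming Ferry’s stop and rm subcommands, and using a placeholder application ID, the lifecycle looks roughly like this:

```bash
# Pause the application; its data sticks around
ferry stop sa-0

# Bring the same application back up later, tweets intact
# (restarting by ID is an assumption about the CLI)
ferry start sa-0

# Tear everything down for good
ferry rm sa-0
```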

Because our entire application is defined by the application YAML file and the client Dockerfile, it’s super simple to make your own changes and to share those changes with others. For now, other people will still need to be able to build your Dockerfile (by using the -b flag during start), but I plan on supporting image uploads in the future via Docker registries. If this feature is important to you, let me know.
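A hypothetical invocation, assuming -b takes the directory containing the Dockerfile:

```bash
# Build the client image locally before starting the stack
ferry start twissandra.yaml -b ./cassandra-examples/twissandra
```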