October 9th, 2013

By 

 

Ankur Singla: Founder at Contrail Systems, a Juniper Networks company

Raj Reddy: Distinguished Engineer at Juniper Networks

Matt Pfeil: Founder at DataStax

 

TL;DR: Contrail Systems builds a network virtualization and network function virtualization platform, which was recently open-sourced as an Apache v2 licensed project called OpenContrail.org.

 


Hello everyone. Welcome to today’s Apache Cassandra use case interview. This is Matt Pfeil, and today I’m joined by Ankur and Raj of Contrail Systems. Why don’t you start things off and tell everyone what you both do as well as what your company does?

Ankur: Thank you for taking the time. My name is Ankur. I used to be the CEO of Contrail Systems, which was acquired by Juniper. We are responsible for building a network virtualization and network function virtualization platform, which we recently open-sourced as an Apache v2 licensed project called OpenContrail.org.

 

Raj: My name is Raj Reddy. I am part of the Contrail team that looks at the analytics portion for our SDN controller.

 

So we have one open-source company talking with another open-source group. For everyone who is not up to date on what software-defined networking is, can you explain it to them and give them an example?

Ankur:  Basically, our goal is to bring the network edge into the server. If you look at what people have been doing in the cloud or in the data center world, the network edge sits at the edge of the data center. If you push all the state of the tenants and their applications into the network, it becomes very complicated. It makes it very hard for you to automate and to deploy your applications, because all of the systems have to interoperate with the orchestration system.

 

Now, what we said is, it would be great if we could move the edge of the network into the server so that it ties seamlessly into the orchestration system. That’s what we did. We actually built an entire networking stack, along with networking services like firewall and load balancing, into the server edge itself. That makes it a lot easier to deploy and manage new applications, to move applications around, and to build a large-scale data center.

 

It sounds really complicated. It sounds like a lot of things going on. How does Cassandra fit into this?

Ankur:  Basically we have a couple of tough problems to solve. The first one was having high availability of the configuration system. There are four layers in the overlay platform. The first one is the virtual router that sits in the server itself. This talks to a control plane node. Now, over there we want an always-on, always-available system. So we built an in-memory database using Border Gateway Protocol (BGP), and the whole goal of the control plane node is to be always available, so it’s architected using a slightly different set of technologies than Cassandra. For configuration and analytics, we wanted much bigger scale for our database, and there were quite a few choices in the marketplace, like MongoDB, Cassandra, or even SQL databases like MySQL.

 

The great thing about Cassandra was that it was in line with the license scheme that we chose, Apache v2. It was a real one. It was tested. It was used widely in the marketplace, so we decided to use Cassandra.

 

The ASF v2 license is extremely friendly. For everyone who is not familiar with open source licenses, the Apache v2 license basically says that the user can do whatever they want with the software with very few restrictions. It’s not GPL-based, where you have to contribute back code that modifies it. You can do whatever you want, and it’s extremely friendly for the end user.

Ankur:  That’s precisely correct, and that’s one of the reasons why we said our code base is also Apache v2. It’s very friendly from a user’s point of view, and whatever we use really needs to line up with that philosophy, so Cassandra made a lot of sense.

 

That’s great. You’ve done some work building things on top of Cassandra. Can you elaborate on some of the things you’ve built to make it more accessible?

Raj:  This is a pretty complex data center system that we’re trying to orchestrate and manage. We had two main goals from the outset: the ability to easily debug problems in such a complex system, and the ability to provide analytics on the network traffic flows. In a large data center, there will be thousands of applications using hundreds of thousands of virtual machines, and they are all talking with each other. They will create lots of data in the form of system logs, traces, and network traffic data. Because of the scale of the data, the obvious choice was a NoSQL database. Also, since most of our data is time series based, Cassandra was an ideal choice. Basically, we created the column families that suited our needs, and we store all the data in them. That is all part of the collection and storage into Cassandra.
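To make the time series point concrete, here is a minimal sketch of one common way to lay out time series data in a wide-row store. It is not taken from OpenContrail: the hourly bucket width, the `TimeSeriesStore` class, and the source names are all hypothetical, and an in-memory dictionary stands in for a column family. The idea shown is that each row is keyed by a source plus a time bucket, and the columns inside a row stay sorted by timestamp, so a time range query only touches a few rows.

```python
import bisect
from collections import defaultdict

BUCKET_SECONDS = 3600  # hypothetical bucket width: one row per source per hour


def row_key(source, timestamp):
    """Build a (source, bucket) row key so no single row grows without bound."""
    return (source, timestamp - timestamp % BUCKET_SECONDS)


class TimeSeriesStore:
    """In-memory stand-in for a column family: row key -> columns sorted by time."""

    def __init__(self):
        self._rows = defaultdict(list)

    def insert(self, source, timestamp, message):
        # Keep columns ordered by timestamp, as they would be within a wide row.
        bisect.insort(self._rows[row_key(source, timestamp)], (timestamp, message))

    def query(self, source, start, end):
        """Return (timestamp, message) pairs with start <= timestamp < end."""
        results = []
        bucket = start - start % BUCKET_SECONDS
        while bucket < end:
            columns = self._rows.get((source, bucket), [])
            # (start,) sorts before any (start, message), so the slice
            # includes timestamp == start and excludes timestamp == end.
            lo = bisect.bisect_left(columns, (start,))
            hi = bisect.bisect_left(columns, (end,))
            results.extend(columns[lo:hi])
            bucket += BUCKET_SECONDS
        return results
```

With this layout, a query for one source over a two-hour window reads at most three bucket rows, regardless of how much data other sources have written.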

 

Now, you asked what we built on top of Cassandra. Once we have this data available, we wanted people to be able to understand it and to access it easily. As we know, SQL has been around for a long time. What we tried to do is hide the complexity of the NoSQL tables, the Cassandra tables themselves, and provide an SQL-like interface to access this complex data. That is the value that we have provided. The language definition and details are provided at OpenContrail.org.
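As a rough illustration of what an SQL-like layer over such data can look like, the sketch below parses a deliberately tiny query form and turns it into a filtered scan. This is not the OpenContrail query language, whose actual definition is at OpenContrail.org. The grammar, the `parse` and `execute` functions, and the `logs` table are all hypothetical, and plain Python lists stand in for the underlying tables.

```python
import re

# Hypothetical grammar, for illustration only:
#   SELECT <col>[, <col>...] FROM <table> WHERE time >= <int> AND time < <int>
QUERY_RE = re.compile(
    r"^\s*SELECT\s+(?P<cols>[\w\s,]+?)\s+FROM\s+(?P<table>\w+)"
    r"\s+WHERE\s+time\s*>=\s*(?P<start>\d+)\s+AND\s+time\s*<\s*(?P<end>\d+)\s*$",
    re.IGNORECASE,
)


def parse(query):
    """Turn the query text into a small plan dictionary."""
    match = QUERY_RE.match(query)
    if match is None:
        raise ValueError("unsupported query: %r" % query)
    return {
        "columns": [c.strip() for c in match.group("cols").split(",")],
        "table": match.group("table"),
        "start": int(match.group("start")),
        "end": int(match.group("end")),
    }


def execute(query, tables):
    """Run a query against `tables`, a dict of name -> list of row dicts.

    Every row dict is assumed to carry an integer 'time' key.
    """
    plan = parse(query)
    return [
        {col: row[col] for col in plan["columns"]}
        for row in tables[plan["table"]]
        if plan["start"] <= row["time"] < plan["end"]
    ]
```

The caller writes a familiar SELECT statement and never sees how rows are keyed or stored, which is the kind of hiding described above.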

 

That’s awesome. Gentlemen, it sounds like you’ve done a lot with Cassandra, and I guess my last question will be: what’s your favorite part about interacting with Cassandra and the community so far?

Ankur:  The great thing was that we could come to DataStax and have conversations about getting support if we were to run into trouble. There is an open source community as well as commercial entities that can support this. That matters because running Cassandra is not core to our business, but providing a service on top of it is. We don’t want to become experts in Cassandra, but rather rely on companies like DataStax to provide that service. Additionally, the documentation, presentations, and use cases from other users were widely available, so the learning curve for Cassandra was much easier.
