February 5th, 2013

By 

 

Listen To This Podcast

 

“When we figured out that we needed a generic messaging infrastructure the first thing we did was look at off the shelf offerings for messaging, but then decided that none of them could meet all our requirements.”

-Boris Wolf , Senior Software Engineer at Comcast

Boris Wolf Senior Software Engineer at Comcast

Welcome to this Cassandra Community Podcast. I am delighted to have with me Boris Wolf of Comcast Innovation Center. 

My first question is I understand you come from a relational database background. How has it been transitioning to NoSQL?

It’s been a very smooth transition for me personally and my team for a couple of reasons:

  1. We were lucky to have some in-house expertise. We could rely on some folks who could answer questions and give us a basic idea of how it works. 
  2. My experience was that once I actually got into it I found it easier and faster to develop Cassandra applications than working with relational databases.

 

At the Comcast Silicon Valley Innovation Center there are some exciting projects focused on the future of television. Can you talk a little about what you are working on there as it relates to Cassandra?

For the past several months I have spent most of my time working on an infrastructure project called the CMB – the Comcast Message Bus. It’s a generic message queue – publish and subscribe infrastructure and we implemented it based on a Cassandra backend. 

 

What are some of the applications that are taking advantage of the Comcast Message Bus?

We have a new setup box that is entirely cloud based, the X1 setup box, that allows you to run apps on it, and one of those apps is a sporting app that allows you get more detailed background information in real-time about live sporting events, so we use this message bus as a backend to the sporting app to send around notifications of live sporting events as they happen.

 

Lots of database options out there; why did you choose Cassandra for this?

When we figured out that we needed a generic messaging infrastructure the first thing we did was look at off the shelf offerings for messaging, but then decided that none of them could meet all our requirements. These were the three requirements we were looking for:

  1. Linear, or near-linear horizontal scalability
  2. Availability and robustness; what I mean by that is cross datacenter availability. Say one datacenter goes down you seamlessly fail over to another datacenter
  3. The third feature is what we call Active Active  – what I mean by that is that say you have one message queue and you send to that message queue from datacenter A and then you have clients receiving messages from the message queue in datacenter B, and also the other way around. So it needs to be able to read and write to two datacenters at the same time. 

 

These requirements are really hard to find in off the shelf offerings, so we decided to build our own. We chose Cassandra because it naturally gives us the availability and cross datacenter replication and we didn’t see that with any other solutions that we evaluated at the time.

 

How is Cassandra implemented at Comcast; what does your environment look like? 

We are quite far along with the project, and we have open-sourced it at github. But as far as users we are just rolling it out, so we have that one deployment I just talked about, the sports app. That one spans two datacenters of our own, and then has the infrastructure to run the message queue on top of the Cassandra ring. We are also experimenting with third party cloud based providers, and will look to roll it out there as we gain more users. 

Thank you, and before we go, any tips to pass along to someone looking to get started with Cassandra?

If you are coming from the relational database world like I did. If you are completely new to the NoSQL world then it is important to understand how to design your data model in Cassandra. For example having a denormalized data model that contains a lot of redundant data might not be a bad thing, in fact it might be a good thing. 

Then you need to understand how Cassandra works internally because that will allow you to come up with a design that works well. For instance things like tombstones, compaction or issues with wide rows. If you are aware of things like this you will be more successful designing systems that will scale well with Cassandra straight away. 

 

Thank you Boris for joining us today to talk about your Apache Cassandra use case.

For those of you interested in the Comcast Message Bus here is the Github link,

https://github.com/Comcast/cmb