Noah Gibbs: Lead Rubyist at OnLive
TL;DR: OnLive allows running desktop applications remotely, especially games! Control information such as your screen taps is sent to OnLive, and compressed, low-latency video is sent back to you. It's the only way to play Crysis on your Android tablet, for instance.
OnLive needed a system to record user actions within their interface (“the overlay”). The team originally planned on a Kafka/Cassandra hybrid system, with Kafka for data pipelining and multiplexing and Cassandra as a back-end storage system. But Kafka didn’t meet their replication and failover requirements.
In the end, Cassandra supports the entire system, because it let the OnLive team record user actions with less complexity and greater compatibility. OnLive uses Cassandra primarily for replication, synchronization and failover while gearing up to move more of its bulk data to it. The current deployment is two clusters of three nodes each in two data centers, plus an additional utility node in each running Cassandra-related utility applications.
What does OnLive do, and what is your role there?
OnLive allows running desktop applications remotely, especially games! Control information such as your screen taps is sent to us, and compressed, low-latency video is sent back to you. We're the only way to play Crysis on your Android tablet, for instance. Despite sounding pretty sci-fi, it works quite well!
I’m the head of the (very small) analytics group, and my title is “Lead Rubyist.” Primarily, I’m adapting a neglected legacy analytics system to newer tools, and in general getting it into better shape in every way. Cassandra is a significant part of this process.
How are you using Apache Cassandra (what version of Cassandra)?
We’ve had a few Cassandra prototypes for earlier efforts. But our production Cassandra system is an analytics back-end system based on the DataStax Community Edition of Cassandra 2.0 — recently upgraded from 1.1.
At this point, we use Cassandra to record user actions within our interface (“the overlay”). So when the user is interacting with an OnLive interface, as opposed to a game from a regular game publisher, we’re recording user actions to Cassandra.
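The interview doesn't describe the overlay schema, so the following is only a sketch of one common way to lay out this kind of event data in Cassandra. The table name `overlay_actions`, its columns, and the `OverlayAction` class are hypothetical. Partitioning by user and UTC day keeps each partition bounded and makes a whole day of activity easy to read or archive in one go.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical table: one partition per user per UTC day, newest events first.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS overlay_actions (
    user_id    text,
    day_bucket text,
    event_time timestamp,
    action     text,
    detail     text,
    PRIMARY KEY ((user_id, day_bucket), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);
"""

# Prepared-statement text; '?' is the CQL bind marker.
INSERT_ACTION = (
    "INSERT INTO overlay_actions (user_id, day_bucket, event_time, action, detail) "
    "VALUES (?, ?, ?, ?, ?)"
)


@dataclass(frozen=True)
class OverlayAction:
    """One hypothetical user action recorded from the overlay."""

    user_id: str
    event_time: datetime  # must be timezone-aware
    action: str
    detail: str = ""

    def bind_values(self):
        """Return values in the column order used by INSERT_ACTION."""
        if self.event_time.tzinfo is None:
            raise ValueError("event_time must be timezone-aware")
        utc_time = self.event_time.astimezone(timezone.utc)
        day_bucket = utc_time.strftime("%Y-%m-%d")  # partition key component
        return (self.user_id, day_bucket, utc_time, self.action, self.detail)
```

With the DataStax Python driver, the values could be bound with `session.execute(session.prepare(INSERT_ACTION), action.bind_values())`. A production schema might cluster on a `timeuuid` rather than a `timestamp`, so two actions recorded in the same millisecond don't overwrite each other.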
What was the motivation for using Cassandra and what other technologies was it evaluated against?
We originally planned on a Kafka/Cassandra hybrid system, with Kafka for data pipelining and multiplexing and Cassandra as a back-end storage system. But Kafka didn’t meet our replication and failover requirements. Now that system is all Cassandra, though we have a number of different and/or legacy systems that use other technologies.
For more information you can look at our blog post,
Can you share some insight on what your deployment looks like?
We’re small right now — two clusters of three nodes each in two datacenters, plus an additional utility node in each running Cassandra-related utility applications. The data size is currently quite small. We’re using Cassandra primarily for replication, synchronization and failover while gearing up to move more of our bulk data to it. So for us it’s about regions and reliability, not volume or insert speed.
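In Cassandra, cross-datacenter replication like this is configured per keyspace with `NetworkTopologyStrategy`. The sketch below assumes the two three-node groups are datacenters within one Cassandra cluster. The keyspace name `analytics`, the datacenter names `dc_west` and `dc_east`, and a replication factor of 3 per datacenter are all hypothetical. Real datacenter names must match the names the cluster's snitch reports.

```python
def create_keyspace_cql(keyspace, replication_by_dc):
    """Build a CREATE KEYSPACE statement that replicates to each listed datacenter."""
    if not replication_by_dc:
        raise ValueError("at least one datacenter is required")
    # Sort by datacenter name so the generated statement is deterministic.
    options = ", ".join(
        f"'{dc}': {rf}" for dc, rf in sorted(replication_by_dc.items())
    )
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {options}}};"
    )


def local_quorum(replication_factor):
    """Replicas in the local datacenter that must respond at LOCAL_QUORUM."""
    return replication_factor // 2 + 1
```

With three replicas per datacenter, `LOCAL_QUORUM` reads and writes need two of them. Each datacenter can therefore keep serving with one node down, and the other datacenter still holds its own full set of replicas if a whole region fails.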
It’s not clear how large the data in Cassandra will be, even in the indefinite future. Our largest data is very time-dependent, and can often be archived to slower, cheaper storage after a delay.
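Because the largest data is time-dependent, one way to keep later archiving cheap is to bucket rows by day and move whole buckets once they pass a retention window. The helper below is hypothetical and not part of OnLive's system. The ISO date strings and the 30-day window in the test are assumptions. If old data should simply disappear rather than move to cheaper storage, CQL's per-write `USING TTL` is the built-in alternative.

```python
from datetime import date, timedelta


def archivable_days(day_buckets, today, retention_days):
    """Return day buckets ('YYYY-MM-DD') that fall before the retention window."""
    cutoff = today - timedelta(days=retention_days)
    # Compare as dates rather than strings so malformed buckets raise ValueError.
    return sorted(d for d in day_buckets if date.fromisoformat(d) < cutoff)
```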
What advice do you have for those just getting started with Cassandra?
A lot of older information is about the Thrift API. Where possible, switch to CQL3 and the native binary protocol. I've used both significantly, and I'm very impressed with the new stuff.
You can find a lot of great videos from previous Cassandra summits. While it’s hard to tell which are the best talks, you should definitely look around. The odds that somebody has talked on a topic of interest to you are very high.
What’s your experience with the Apache Cassandra community?
I've had great luck with the community, first at Ooyala and then here at OnLive. They're helpful folks and generally happy to answer questions. I've also met a number of them in person at Cassandra Summits.
Anything else that you’d like to add?
Cassandra is a small, clean, focused codebase, especially if you’re used to Big Data monstrosities like the Hadoop code. When you have a specific question, don’t be afraid to dive in and check directly!
Our team also recently released a Cassandra migration tool based on CQL and Erb. It accepts CQL migrations, or CQL preprocessed with Ruby (Erb) for configurable migrations — useful for things like replication factor which may vary between development and production environments.
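The tool's own interface isn't shown here, so the following is a Python analog of the idea rather than cassandra_migrate itself. `string.Template` stands in for Erb, and the environment settings, keyspace names and migration filenames are hypothetical. Migrations are applied in filename order, and any that were already applied are skipped.

```python
from string import Template

# Hypothetical per-environment settings; only the keyspace and replication differ.
ENVIRONMENTS = {
    "development": {
        "keyspace": "analytics_dev",
        "replication": "{'class': 'SimpleStrategy', 'replication_factor': 1}",
    },
    "production": {
        "keyspace": "analytics",
        "replication": "{'class': 'NetworkTopologyStrategy', 'dc_west': 3, 'dc_east': 3}",
    },
}

# Hypothetical migration files; the numeric prefix determines application order.
MIGRATIONS = {
    "002_create_overlay_actions.cql": (
        "CREATE TABLE IF NOT EXISTS $keyspace.overlay_actions "
        "(user_id text, event_time timestamp, action text, "
        "PRIMARY KEY (user_id, event_time));"
    ),
    "001_create_keyspace.cql": (
        "CREATE KEYSPACE IF NOT EXISTS $keyspace WITH replication = $replication;"
    ),
}


def render_migrations(migrations, env, applied=frozenset()):
    """Return (name, cql) pairs in filename order, skipping already-applied ones."""
    settings = ENVIRONMENTS[env]  # raises KeyError for an unknown environment
    return [
        (name, Template(migrations[name]).substitute(settings))
        for name in sorted(migrations)
        if name not in applied
    ]
```

Keeping the replication settings out of the migration text lets the same `001_create_keyspace.cql` create a single-replica keyspace on a development machine and a multi-datacenter one in production.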
If you're interested, you can check it out on our GitHub: http://github.com/onlive/cassandra_migrate