Bad Juju Games is best known for our technology suite that we provide to video game developers, which collects a significant amount of data from video games, ingests it, and provides it back to developers for use in-game or on their websites. You may have seen our work floating around – sites like the World Tekken Federation for Namco (case study), or Call of Duty: Elite for Activision. I’m the co-Founder / CTO.
Our main product, GOOP (Gaming Optimized Online Platform), provides a wide range of cross-platform gaming features: everything from cloud-saved player profiles, thousands of permutations of leaderboards, and very granular real-time usage analytics to wagering and tournament systems for games on iOS, Android, PC, Xbox, PlayStation, Wii, etc. Cassandra is our primary datastore for almost all data flowing into the system, storing and aggregating thousands of data points for millions of players.
We actually started investigating Cassandra around early 2010, with our first production launch using Cassandra 0.6 to support Namco Bandai’s title “Ace Combat: Assault Horizon”. At that time, we investigated a number of tools, including sharded Redis implementations (which we continue to use in very specific situations), MongoDB (which we ended up abandoning due to data durability concerns, some of which have since been addressed, but which were REALLY significant in 2010), Riak, and Cassandra, and, of course, we balanced all of the options against our existing partitioned, replicated MySQL. While we liked Riak as well, we eventually selected Cassandra due to community support and involvement, and the rapid pace at which it was adding features.
The bottom line is we were looking for a solution that scaled well. We were already using dozens of MySQL servers, each doing tens of thousands of queries per second, with many terabytes of data. Looking at our upcoming growth curve, we wanted an option that would handle that level of write load gracefully. We had been building fairly complex replicated and sharded RDBMS clusters for some time, and we were looking for a solution that would grow with us, yet handle significantly more write bandwidth and better tolerate single-node failures. Cassandra was fairly young at that time, but it fit our needs quite well.
As previously mentioned, our very first production deployment was Cassandra 0.6, supporting Namco’s ACAH title. That cluster ended up at approximately 10 nodes supporting over a million Xbox 360 and PS3 players, with about 600GB of data per node. Our typical Xbox/PlayStation titles tend to fall in that range (12-24 dedicated nodes, 500GB+ data per node, using AWS i2 instances).
Since then we’ve had clusters on 1.0 and 1.2. In early 2014, we announced the first version of our API available to the public, which also runs on our first production 2.0 cluster (currently 2.0.10) and is our first cluster to use CQL tables rather than the classic Thrift column families.
The two biggest factors and benefits of Cassandra for us are being able to better handle node failures without impacting the games, and simply being able to handle very high write volume without having to overthink the database layer. Cassandra frees us from a world of slave-promotion scenarios and intricate manual partitioning / rebalancing that we had to consider in the RDBMS world. Beyond that, we really take advantage of cross-datacenter replication (it’s not uncommon for us to have clusters that live both at AWS and in our on-site DC).
The community support is incredible. #Cassandra on Freenode is fairly active, and most people there tend to be more than willing to help out (it’s also quite helpful that DataStax has a few guys idling in there that can chime in for clarification when needed). The Cassandra meetups and Summits are active with a wide mix of experience levels, and it’s great that even the larger companies using Cassandra tend to have engineers willing to share their experiences with newcomers as they start using Cassandra.
Whether you’re developing for Cassandra or operating it, the concepts aren’t incredibly difficult, but they can be significantly different from what you’re used to if you’re coming from a relational database world. Don’t rely simply on docs and intuition; read about it, and then ask someone about it. There are features that encourage poor design choices (like secondary indexes), features with limitations you should understand (like counters, which we use extensively), and a lot of operational considerations where others have valuable experience to share that will save you headaches later (it seemed like there were a few talks at Cassandra Summit 2014 where ops teams realized that running multi-datacenter over a VPN wasn’t as easy as they expected due to bandwidth usage). Beyond that, be sure to track as many metrics as you can, and keep an eye on them over time to make sure everything’s healthy as your cluster grows.
If you’re looking at various “NoSQL” options, take your time. It’s a fun time to be writing all this distributed “big data” backed software, because Cassandra, Riak, et al. enable a lot of things that were REALLY difficult before. But take the time to do the research; don’t just choose the one that’s easiest to set up, or the one that has the easiest-to-use interface for your programming language of choice – make sure whatever you choose is going to serve you well in production, not just in development.
Thanks for giving me an opportunity to talk about how we use Cassandra!