May 2nd, 2014

“Two Hulu Engineers Explain Their Approach to Scalability” was created by Pete Soderling, as part of Hakka Labs’ Cassandra Week.


Matt Jurik’s talk about the architecture of Hulu’s video progress service piqued my interest in Hulu’s approach to scalability. In a Planet Cassandra interview, Matt’s teammate Andreas Rangel (Senior Software Engineer, Hulu) explains how they’ve scaled their data model. Here’s what we learned:


1. Hulu’s keyspace is a beast:

    • Primary C* cluster runs version 1.2.12


    • 16 nodes split between 2 datacenters


    • Our watch history keyspace contains several billion CQL3 rows with approximately 1TB of data per datacenter


    • Individual nodes are 12-core machines with 48GB RAM using multiple SSDs in a RAID5 configuration


2. At any given time, there are hundreds of engineers logged in to the #Cassandra IRC channel


3. Now’s the time to start learning about Cassandra’s internals:

“…Having a high-level understanding of some of the internals such as how deletions are implemented, how secondary indices operate, and when to use the row cache can go a long way in designing a strong application built atop Cassandra.” – Andreas Rangel


Check out these #CassandraWeek resources to start learning: