Andres Rangel Senior Software Engineer at Hulu

Hulu is an online video service that offers a selection of hit TV shows, clips, movies and more on the free, ad-supported Hulu.com service and on the subscription service Hulu Plus. One of the top video streaming sites in the U.S., Hulu today has over 5 million subscribers and approximately 30 million unique viewers per month.

I am a Senior Software Engineer on the core services team, and Matt Jurik is the engineer in charge of our Apache Cassandra cluster; we build scalable, highly available systems to support the website and mobile devices.

Linear scale, minimal maintenance

Cassandra offers good performance, near-linear scalability for our data model, and geo-replication, all with minimal maintenance requirements. We evaluated HBase and Riak, but ultimately decided that Cassandra satisfied our needs best.

Apache Cassandra at Hulu

We are currently using Apache Cassandra for several services here at Hulu. One particular service stores subscriber watch history for real-time access by other internal services; we use Cassandra to handle persistence and multi-datacenter replication. All updates are written both to a caching tier and directly to Cassandra. This has given us great flexibility with our caching tier while keeping a reliable persistence layer to fall back on.
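The write path described above can be sketched roughly as follows. This is a minimal illustration, not Hulu's implementation: `InMemoryStore` and `WatchHistoryService` are hypothetical names, both stores are plain in-memory stand-ins for the caching tier and for Cassandra, and the choice to write to the persistent store before the cache is an assumption made for the example.

```python
from typing import Dict, List, Optional


class InMemoryStore:
    """Hypothetical stand-in for either the caching tier or Cassandra."""

    def __init__(self) -> None:
        self._data: Dict[str, List[str]] = {}

    def append(self, key: str, value: str) -> None:
        self._data.setdefault(key, []).append(value)

    def get(self, key: str) -> Optional[List[str]]:
        # Return a copy so callers cannot mutate the stored list.
        return list(self._data[key]) if key in self._data else None

    def clear(self) -> None:
        self._data.clear()


class WatchHistoryService:
    """Writes every update to both stores; reads prefer the cache."""

    def __init__(self, cache: InMemoryStore, persistent: InMemoryStore) -> None:
        self.cache = cache
        self.persistent = persistent

    def record(self, user_id: str, video_id: str) -> None:
        # Assumption: persist first, so the cache never holds an update
        # that the durable layer is missing.
        self.persistent.append(user_id, video_id)
        self.cache.append(user_id, video_id)

    def history(self, user_id: str) -> List[str]:
        cached = self.cache.get(user_id)
        if cached is not None:
            return cached
        # Cache miss: fall back on the persistence layer and repopulate.
        stored = self.persistent.get(user_id) or []
        for video_id in stored:
            self.cache.append(user_id, video_id)
        return stored
```

Because every update also lands in the persistent store, the cache in this sketch can be cleared or replaced at any time and reads still return the full history, which is the kind of flexibility the paragraph above refers to.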

Our primary cluster is running version 1.2.12 and consists of 16 nodes split between 2 datacenters. Our watch history keyspace contains several billion CQL3 rows, with approximately 1TB of data per datacenter. The individual nodes are 12-core machines with 48GB RAM and multiple SSDs in a RAID5 configuration.

Words of wisdom

It’s important to analyze how you are going to query your data. Spending time designing your schema around your query pattern can save a lot of hassle debugging performance issues while also ensuring that you can scale easily. Additionally, a high-level understanding of some of the internals, such as how deletions are implemented, how secondary indices operate, and when to use the row cache, can go a long way in designing a strong application built atop Cassandra.
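To make the "schema around your query pattern" advice concrete, here is a toy Python model of a table designed for a single query: "the most recent N items for one user". It is only a sketch under stated assumptions. The `WatchHistoryTable` class, its column names, and the query itself are hypothetical and are not Hulu's actual schema; the dictionary key plays the role of a partition key and the sorted list plays the role of a clustering order.

```python
from bisect import insort
from collections import defaultdict
from typing import DefaultDict, List, Tuple


class WatchHistoryTable:
    """Toy model: partitioned by user_id, clustered by watch time (newest first)."""

    def __init__(self) -> None:
        # One sorted list per partition, keyed by the (hypothetical) user_id.
        self._partitions: DefaultDict[str, List[Tuple[int, str]]] = defaultdict(list)

    def insert(self, user_id: str, watched_at: int, video_id: str) -> None:
        # Store the negated timestamp so the newest row sorts first,
        # mirroring a descending clustering order.
        insort(self._partitions[user_id], (-watched_at, video_id))

    def latest(self, user_id: str, limit: int) -> List[Tuple[int, str]]:
        # The target query touches exactly one partition and reads a
        # contiguous prefix of it; no scan across users is needed.
        rows = self._partitions.get(user_id, [])
        return [(-neg_ts, video_id) for neg_ts, video_id in rows[:limit]]


table = WatchHistoryTable()
table.insert("u1", 100, "a")
table.insert("u1", 300, "c")
table.insert("u1", 200, "b")
recent = table.latest("u1", 2)  # the two newest rows for user "u1"
```

The design choice this illustrates is that the layout is picked to serve the read: a query that does not start from `user_id` (for example, "everyone who watched video X") would need a different table rather than a scan of this one.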

The community is fantastic. The #cassandra IRC channel is a lively bunch; folks are always willing to help out and offer advice.

How Hulu Scales Services to Support 400 Million Plays: Cassandra, Redis, and SSD-Based Hardware