July 10th, 2013

 Rick Branson, Infrastructure Engineer at Instagram: “Adopt a technology by understanding what it’s best at and letting it do that first, then expand…”



Cassandra is a critical part of Instagram’s large-scale site infrastructure, which supports more than 100 million active users. The team recently switched from Redis to Cassandra, and this talk is a practical deep dive into data models, systems architecture, and the challenges encountered during implementation.


The Good with Redis

  • It’s easy to prototype
    • You don’t need to worry about how fast you put data in and take data out because it’s in memory.


The Bad with Redis

  • Redis is an in-memory datastore and memory is expensive
    • If you store data in it that you aren’t reading all the time, it falls apart for those use cases
  • In-memory storage degrades poorly
    • This is a bad cliff: you will hit a wall, and getting out of that hole is nearly impossible
  • Flat namespace
    • You don’t know what’s in there
  • Heap fragmentation
  • Single-threaded


Why We Initially Chose Cassandra

  • Centralized logging with online reads
  • We have a high skew of writes to reads (1,000:1)
    • The absolute ideal use case for Cassandra
  • Ever-growing data set
  • Needed durability
  • Very high availability


How It Expanded

  • Initial use case cluster was 3 nodes, now 12
  • No-downtime upgrade to Cassandra 1.2
  • Adopted Cassandra for storing inbox notifications: 23K writes per second and 16K reads per second on a separate 12-node EC2 cluster
  • Logging cluster stores ~20 billion records (1.2TB)
  • Notification cluster stores ~10 billion records (550GB)
  • 99.9999% availability since we started using Cassandra
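
To put the figures above in perspective, here is a rough back-of-the-envelope sketch in Python. It is not from the talk. It assumes decimal units (1 TB = 10^12 bytes), a 365-day year, and an even spread of writes across the 12 nodes, and it ignores replication since the talk summary does not say whether the sizes include replicas. The function names are made up for illustration.

```python
# Back-of-the-envelope arithmetic on the cluster figures quoted above.
# Assumptions (not stated in the talk summary): decimal units, a 365-day
# year, and writes spread evenly across nodes with replication ignored.

def avg_bytes_per_record(total_bytes: float, records: float) -> float:
    """Average stored size of one record, in bytes."""
    return total_bytes / records


def downtime_seconds_per_year(availability_pct: float) -> float:
    """Seconds of downtime per year implied by an availability percentage."""
    seconds_per_year = 365 * 24 * 60 * 60
    return seconds_per_year * (1 - availability_pct / 100)


# Logging cluster: ~20 billion records in ~1.2 TB
logging_avg = avg_bytes_per_record(1.2e12, 20e9)

# Notification cluster: ~10 billion records in ~550 GB
notification_avg = avg_bytes_per_record(550e9, 10e9)

# 23K writes per second on a 12-node cluster
writes_per_node = 23_000 / 12

# 99.9999% availability
downtime = downtime_seconds_per_year(99.9999)

print(f"logging: {logging_avg:.0f} bytes/record")
print(f"notifications: {notification_avg:.0f} bytes/record")
print(f"writes per node: {writes_per_node:.0f}/s")
print(f"allowed downtime: {downtime:.1f} s/year")
```

Under those assumptions, records in both clusters average roughly 55 to 60 bytes each, each notification node absorbs a bit under 2,000 writes per second, and six nines of availability leaves room for only about half a minute of downtime per year.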


Instagram Fun Fact

~10% of transactions on Instagram are “undos”: unlikes, deleted comments, deleted pictures, etc.


To learn how Instagram implemented Apache Cassandra, check out Rick Branson’s presentation from Cassandra Summit 2013 and the accompanying slide deck below: