**Using Apache Cassandra for Key-Value Storage at Instagram**
Instagram, one of the world’s largest social media platforms, uses Apache Cassandra extensively for key-value storage. To provide a reliable and responsive experience to millions of users, the company maintains a five-nines (99.999%) reliability SLA, which leaves very little room for failed requests. The Cassandra team at Instagram found that read latency was a concern because of garbage collection (GC) pauses in the JVM: P99 read latency ranged from 25ms to 60ms, depending on client traffic.
To address this, the team created Rocksandra, a C++ storage engine for Cassandra based on RocksDB, an open-source, high-performance embedded key-value database. It was designed to replace Cassandra's existing Java-based LSM tree storage engine, whose components (memtable, compaction, the read/write path, and so on) created many objects on the Java heap and placed considerable overhead on the JVM.
Developing Rocksandra, a C++ Storage Engine Based on RocksDB
Building the new storage engine on RocksDB presented three main challenges. First, Cassandra did not have a pluggable storage engine architecture, so the team defined a new storage engine API through which the new engine could be injected into the relevant code paths inside Cassandra. Second, Cassandra supports rich data types and table schemas, while RocksDB provides a purely key-value interface, so the team defined encoding/decoding algorithms to represent Cassandra’s data model within RocksDB’s data structures. Lastly, the existing streaming implementation was tightly coupled to the internals of the existing storage engine, so the team had to decouple the two and introduce an abstraction layer between them.
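To make the first two challenges more concrete, here is a minimal sketch of what a pluggable engine interface and a key encoding could look like. It is not Rocksandra's actual API or on-disk format: the `StorageEngine` interface, the `encode_key`/`decode_key` helpers, and the length-prefixed layout are all assumptions made for illustration, and a plain dictionary stands in for RocksDB. The sketch assumes a string partition key and a signed 64-bit integer clustering value.

```python
import struct
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Tuple

SIGN_BIAS = 1 << 63  # shifts signed 64-bit ints into the unsigned range


def partition_prefix(partition_key: str) -> bytes:
    """Length-prefixed partition key, so "user:1" is never a prefix of "user:10"."""
    pk = partition_key.encode("utf-8")
    return struct.pack(">H", len(pk)) + pk


def encode_key(partition_key: str, clustering: int) -> bytes:
    """Flatten (partition key, clustering value) into one sortable byte string."""
    # Big-endian layout plus the sign bias makes byte-wise order match integer order.
    return partition_prefix(partition_key) + struct.pack(">Q", clustering + SIGN_BIAS)


def decode_key(key: bytes) -> Tuple[str, int]:
    """Inverse of encode_key."""
    (pk_len,) = struct.unpack_from(">H", key, 0)
    partition_key = key[2:2 + pk_len].decode("utf-8")
    (biased,) = struct.unpack_from(">Q", key, 2 + pk_len)
    return partition_key, biased - SIGN_BIAS


class StorageEngine(ABC):
    """Hypothetical pluggable-engine interface; not Cassandra's real API."""

    @abstractmethod
    def put(self, partition_key: str, clustering: int, value: bytes) -> None: ...

    @abstractmethod
    def get(self, partition_key: str, clustering: int) -> Optional[bytes]: ...

    @abstractmethod
    def scan_partition(self, partition_key: str) -> List[Tuple[int, bytes]]: ...


class InMemoryEngine(StorageEngine):
    """Stand-in for a RocksDB-backed engine: a dict whose keys are sorted on scan."""

    def __init__(self) -> None:
        self._kv: Dict[bytes, bytes] = {}

    def put(self, partition_key, clustering, value):
        self._kv[encode_key(partition_key, clustering)] = value

    def get(self, partition_key, clustering):
        return self._kv.get(encode_key(partition_key, clustering))

    def scan_partition(self, partition_key):
        # All rows of one partition share a prefix, so a prefix scan over
        # sorted keys returns them in clustering order.
        prefix = partition_prefix(partition_key)
        keys = sorted(k for k in self._kv if k.startswith(prefix))
        return [(decode_key(k)[1], self._kv[k]) for k in keys]


engine = InMemoryEngine()
engine.put("user:1", 5, b"photo-b")
engine.put("user:1", -3, b"photo-a")
engine.put("user:10", 0, b"other-user")
rows = engine.scan_partition("user:1")
print(rows)
```

The design point is that a sorted key-value store only compares raw bytes, so any ordering the data model promises (here, rows sorted by clustering value within a partition) has to be built into the key encoding itself.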
Significant Reduction in P99 Read Latency and GC Stalls
After about a year of development and testing, the team rolled out the new storage engine to several production Cassandra clusters at Instagram. The results were impressive: P99 read latency dropped from 60ms to 20ms (a 3X improvement), and GC stalls dropped from 2.5% to 0.3%, a reduction of roughly 8X, or close to an order of magnitude.
The team also tested Rocksandra in a public cloud environment, setting up a Cassandra cluster on AWS using three i3.8xlarge EC2 instances. They pre-loaded 250M 6KB rows into the database (about 1.5TB of raw row data) and configured 128 readers and 128 writers in NDBench. They ran several different workloads and measured average, P99, and P999 latencies for reads and writes; Rocksandra delivered much lower and more consistent tail latency for both.
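For readers reproducing such a benchmark, P99 and P999 are percentiles of the per-request latency samples. The sketch below uses the nearest-rank method on synthetic data; the `percentile` helper and the sample values are invented for illustration, and NDBench's own reporting may use a different estimator.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted list
    return ordered[rank - 1]


# Synthetic latencies in ms: mostly fast requests plus a small slow tail.
latencies_ms = [2.0] * 980 + [20.0] * 15 + [60.0] * 5

avg = sum(latencies_ms) / len(latencies_ms)
p99 = percentile(latencies_ms, 99)
p999 = percentile(latencies_ms, 99.9)
print(f"avg={avg:.2f}ms p99={p99}ms p999={p999}ms")
```

In this synthetic data the average stays close to the fast path while P99 and P999 are dominated by the slow tail, which is why tail percentiles rather than averages are the figures to watch when GC pauses are the suspected cause.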
The Instagram team has open-sourced the Rocksandra code base and benchmark framework on GitHub so that others can try it in their own environments. They are actively working on support for more Cassandra features, such as secondary indexes and repair, and on a pluggable storage engine architecture so that they can contribute their work back to the Apache Cassandra community.
In Instagram's production clusters and benchmarks, Rocksandra has reduced Apache Cassandra's tail read latency and GC stalls, making it more predictable for key-value storage. If the pluggable storage engine work is accepted upstream, it should benefit other large-scale Cassandra deployments as well.