Company: Instagram
Industry: Social Media and Networking
Functional Use Case: Data Store

This use case is based on an article originally published here.

**Using Apache Cassandra for Key-Value Storage at Instagram**

Instagram, one of the world’s largest social media platforms, uses Apache Cassandra extensively for key-value storage. To provide a reliable and responsive experience to millions of users, the team maintains a five-nines (99.999%) reliability SLA, meaning the request failure rate must stay below 0.001%. The Cassandra team at Instagram found that read latency was a concern because of JVM garbage collector (GC) stalls: P99 read latency ranged from 25ms to 60ms, depending on client traffic.

To address this, the team created Rocksandra, a C++ storage engine based on RocksDB, an open-source, high-performance embedded database for key-value data. The new engine was designed to replace Cassandra’s existing Java-based, LSM tree-based storage engine, whose components (memtable, compaction, the read/write path, and so on) create many objects on the Java heap and therefore add significant overhead to the JVM.

Developing Rocksandra, a C++ Storage Engine Based on RocksDB

Implementing the new storage engine on RocksDB presented three main challenges. First, Cassandra did not have a pluggable storage engine architecture, so the team defined a new storage engine API to inject the new engine into the relevant code paths inside Cassandra. Second, Cassandra supports rich data types and table schemas, while RocksDB provides a purely key-value interface, so the team defined encoding/decoding algorithms to represent Cassandra’s data model within RocksDB’s data structures. Third, the existing streaming implementation depended on internal details of the current storage engine, so the team had to decouple the two and introduce an abstraction layer.
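The first two challenges can be illustrated with a minimal sketch. It is not Rocksandra's actual API or encoding: the `StorageEngine` interface, the `DictEngine` class (a plain dict standing in for RocksDB), and the length-prefixed layout are all assumptions made for illustration. The idea shown is that a row identified by a partition key and clustering values is flattened into one key-value pair of byte strings, and the engine is reached only through a narrow interface.

```python
import struct
from abc import ABC, abstractmethod


def _pack(chunks):
    """Concatenate byte strings, each preceded by a 4-byte big-endian length."""
    return b"".join(struct.pack(">I", len(c)) + c for c in chunks)


def _unpack(blob):
    """Inverse of _pack: split a blob back into its byte strings."""
    chunks, offset = [], 0
    while offset < len(blob):
        (length,) = struct.unpack_from(">I", blob, offset)
        offset += 4
        chunks.append(blob[offset:offset + length])
        offset += length
    return chunks


def encode_row(partition_key, clustering, columns):
    """Map one row to a (key, value) pair of byte strings (hypothetical layout)."""
    key = _pack([partition_key, *clustering])
    flat = []
    for name in sorted(columns):
        flat += [name.encode("utf-8"), columns[name]]
    return key, _pack(flat)


def decode_row(key, value):
    """Recover (partition_key, clustering, columns) from a stored pair."""
    partition_key, *clustering = _unpack(key)
    flat = _unpack(value)
    columns = {flat[i].decode("utf-8"): flat[i + 1] for i in range(0, len(flat), 2)}
    return partition_key, tuple(clustering), columns


class StorageEngine(ABC):
    """Hypothetical pluggable-engine interface (not Cassandra's real API)."""

    @abstractmethod
    def put(self, partition_key, clustering, columns): ...

    @abstractmethod
    def get(self, partition_key, clustering): ...


class DictEngine(StorageEngine):
    """Stand-in for a key-value store such as RocksDB, backed by a dict."""

    def __init__(self):
        self._kv = {}

    def put(self, partition_key, clustering, columns):
        key, value = encode_row(partition_key, clustering, columns)
        self._kv[key] = value

    def get(self, partition_key, clustering):
        key = _pack([partition_key, *clustering])
        value = self._kv.get(key)
        return None if value is None else decode_row(key, value)[2]
```

Length-prefixed keys round-trip cleanly but do not sort in clustering order byte-for-byte, so an engine that serves range scans would need an order-preserving key encoding instead.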

Significant Reduction in P99 Read Latency and GC Stalls

After about a year of development and testing, the team rolled out the new storage engine to several production Cassandra clusters at Instagram. The results were impressive: P99 read latency dropped from 60ms to 20ms, and GC stalls dropped from 2.5% to 0.3%, roughly an 8X reduction (described as 10X in the original post).

The team also tested Rocksandra in a public cloud environment, setting up a Cassandra cluster on AWS using three i3.8xlarge EC2 instances. They pre-loaded 250M 6KB rows into the database and configured 128 readers and 128 writers in NDBench. They ran different workloads and measured the average, P99, and P999 read and write latencies. Rocksandra delivered much lower and more consistent tail latencies for both reads and writes.
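For readers reproducing this kind of measurement, the following is a minimal sketch of how average, P99, and P999 figures can be derived from raw latency samples. It uses the nearest-rank percentile definition, which is an assumption; NDBench may compute its percentiles differently, and the function names here are hypothetical.

```python
def percentile(samples, numerator, denominator):
    """Nearest-rank percentile; the fraction is numerator/denominator (99/100 for P99)."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # ceil(n * numerator / denominator) using integer arithmetic only,
    # which avoids floating-point rounding surprises for fractions like 99.9%.
    rank = -(-len(ordered) * numerator // denominator)
    return ordered[max(rank, 1) - 1]


def summarize(latencies_ms):
    """Return the average, P99, and P999 of a list of latencies in milliseconds."""
    return {
        "avg": sum(latencies_ms) / len(latencies_ms),
        "p99": percentile(latencies_ms, 99, 100),
        "p999": percentile(latencies_ms, 999, 1000),
    }


if __name__ == "__main__":
    # Synthetic samples: 1 ms, 2 ms, ..., 1000 ms.
    print(summarize(list(range(1, 1001))))
    # prints {'avg': 500.5, 'p99': 990, 'p999': 999}
```

Tail percentiles are the figures that matter for an SLA like the one described earlier, because an average can look healthy while a small share of requests stalls behind GC pauses.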

The Instagram team has open-sourced the Rocksandra code base and benchmark framework on GitHub for others to try out in their own environments. They are actively working on support for more Cassandra features, such as secondary indexes and repair, and on a pluggable storage engine architecture so that they can contribute their work back to the Apache Cassandra community.

Rocksandra has reduced the tail read latency of Apache Cassandra at Instagram, making it more predictable and reliable for key-value storage. If the pluggable storage engine work is accepted upstream, other large-scale Cassandra deployments stand to benefit as well.

Original post by Dikang Gu, an infrastructure engineer at Instagram.

Stack Includes: RocksDB, C++, Apache Cassandra
