May 15th, 2014


As more and more businesses move from enterprise IT solutions to web-scale cloud solutions to meet growing customer needs, they need to be innovative and find ways for their applications and infrastructure to scale rapidly and remain highly available.

High availability is an important requirement for any online business. The key to success is architecting around failure: expecting the infrastructure to fail and still remaining highly available when it does. One such effort here at Netflix was the Active-Active (A-A) implementation, through which we provided region resiliency. This presentation gives a brief overview of the Active-Active implementation and how it leveraged Cassandra’s architecture in the backend to achieve its goal. It covers our journey through A-A from Cassandra’s perspective, including the data validation we did to prove the backend would work without impacting the customer experience. It also covers the problems we ran into, such as long repair times and gc_grace settings, the lessons we learned, and what we would do differently next time.


About Roopa Tangirala

Engineering Manager at Netflix with over 12 years of experience as a Senior Cloud Data Architect and Database Administrator, working extensively on data modeling, performance tuning, and guiding best practices for various persistence stores, whether RDBMSs like Oracle or NoSQL stores like Cassandra.

Join the South Bay Cassandra Users.