Yelp’s infrastructure for Cassandra had been deployed on AWS EC2 and ASG (Autoscaling Group) for several years. Yelp’s platform as a service (PaaSTA) had been transitioning from Mesos to Kubernetes, providing an opportunity to explore the orchestration of Cassandra on Kubernetes.
The team’s objectives were to migrate Yelp’s Cassandra clusters from EC2-based management to Kubernetes-based orchestration and leverage the Kubernetes operator for Cassandra to capture database and infrastructure-specific requirements and operationalize learnings. There was a requirement to maintain the integrity and performance of the clusters during the migration process.
Solution and Results
Yelp developed a custom Cassandra operator that runs on Kubernetes and Yelp PaaSTA, with one operator per production region. The operator manages Cassandra clusters through Custom Resources and Statefulsets. Key responsibilities of the Cassandra operator include reconciling runtime state with cluster specification, scaling clusters safely, lifecycle management, coordinating with operators in other regions, and managing TLS credentials.
Yelp successfully migrated most of its production Cassandra clusters to Kubernetes, with the remaining migrations in progress. The transition to Kubernetes-based orchestration has reduced operational toil and improved the overall management of Cassandra clusters at Yelp. The custom Cassandra operator has enabled efficient and safe management of the clusters, ensuring they meet the desired specifications and requirements.
Future Plans
The team had plans to explore zero-downtime migration strategies, anti-affinity for availability, CI/CD pipeline for Cassandra deployments, and fleet management with Clusterman. They also hoped to continue refining and enhancing the Cassandra operator for better management and performance of the clusters.