The Problem:
BlaBlaCar, a long-distance carpooling platform, faced a challenge with their existing database system. They needed a solution that could handle a large amount of data, provide high availability, and ensure data consistency across multiple data centers. The existing system was not able to meet these requirements, leading to performance issues and potential data loss.
The Solution:
BlaBlaCar chose Apache Cassandra as their database solution. Cassandra is a highly scalable, distributed database system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. It offers robust support for clusters spanning multiple datacenters, with asynchronous masterless replication allowing low latency operations for all clients.
How They Use Cassandra:
BlaBlaCar uses Cassandra in a multi-datacenter setup, with each datacenter having multiple racks. Each rack contains multiple nodes, and each node is a Raspberry Pi running Cassandra. This setup allows them to simulate real-world scenarios and test the resilience and performance of their system.
They use a Python script to interact with the Cassandra cluster. The script allows users to select the Write Consistency Level, connects to the Cassandra cluster, creates a keyspace and table if they do not exist, and writes a new entry to the table every 5 seconds. If the SELECT button is pressed for at least 5 seconds, a new Write Consistency Level can be set.
BlaBlaCar also uses the Cassandra Python driver to display which Cassandra coordinator node has been chosen for the write. This helps them understand the load balancing and data distribution across the cluster.
The setup also includes emergency stop switches to simulate power loss in a rack or a datacenter, providing a way to test the resilience of their Cassandra setup.
In summary, BlaBlaCar uses Cassandra to ensure high availability, data consistency, and scalability across multiple datacenters. The use of Cassandra has allowed them to handle large amounts of data efficiently and reliably.