September 12th, 2013

With the release of Apache Cassandra 2.0, Jonathan Ellis (@Spyced), Apache Cassandra Chair and DataStax Co-Founder, looks back at the five years of progress since Cassandra was released as open source. Here is the original Cassandra paper from LADIS 2009, together with the new features and improvements that have been added since.



Cassandra is a distributed storage system for managing very large amounts of structured data spread out across many commodity servers, while providing highly available service with no single point of failure. Cassandra aims to run on top of an infrastructure of hundreds of nodes (possibly spread across different data centers). At this scale, small and large components fail continuously. The way Cassandra manages the persistent state in the face of these failures drives the reliability and scalability of the software systems relying on this service. While in many ways Cassandra resembles a database and shares many design and implementation strategies with one, Cassandra does not support a full relational data model; instead, it provides clients with a simple data model that supports dynamic control over data layout and format. The Cassandra system was designed to run on cheap commodity hardware and handle high write throughput while not sacrificing read efficiency.


Visit this link to view the entire paper: