Praveen Kumar, Engineering Manager at Equinix
Equinix is a leading interconnection platform and colocation provider with about 100 data centers across the globe.
I lead the “Emerging Software & Platform” team, which is also responsible for the Equinix Big Data Platform along with a few other innovation areas.
Real-time with Cassandra
Equinix has about 100 data centers across the globe, and we monitor network and data center infrastructure for purposes including troubleshooting and customer billing. These are high-velocity streams of time-series data that have to be processed and stored in real time for a number of use cases. Before Cassandra we were using many use-case-specific instances built on RDBMS and RRDs. These were not only cumbersome to maintain but also had scalability issues that constrained us on both storage and compute, and there were a number of use cases we could not implement.
We use Cassandra to store real-time streams of data from network and data center infrastructure, most of it time-series data. This data feeds a number of computations built for use cases spanning network and infrastructure monitoring, anomaly detection, billing and customer presentment, among many others.
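To make the time-series storage concrete, here is a minimal sketch of one common Cassandra modelling pattern: bucketing each device's readings by day so that no single partition grows without bound. It is an illustration only, not Equinix's actual schema. The table name `metrics`, its columns, the day-sized bucket and the helper names are all assumptions, and the CQL assumes CQL3 as available in Cassandra 2.0.

```python
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Tuple

# Hypothetical CQL3 table (an assumption, not taken from the interview):
# one partition per device per day, rows clustered newest-first.
CREATE_TABLE = """
CREATE TABLE metrics (
    device_id text,
    day       text,
    ts        timestamp,
    value     double,
    PRIMARY KEY ((device_id, day), ts)
) WITH CLUSTERING ORDER BY (ts DESC);
"""

Reading = Tuple[str, datetime, float]  # (device_id, timestamp, value)


def partition_key(device_id: str, ts: datetime) -> Tuple[str, str]:
    """Return (device_id, day bucket in UTC) for a timezone-aware timestamp."""
    day = ts.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return (device_id, day)


def group_by_partition(
    readings: Iterable[Reading],
) -> Dict[Tuple[str, str], List[Tuple[datetime, float]]]:
    """Group readings the way the table above would lay them out on disk."""
    partitions: Dict[Tuple[str, str], List[Tuple[datetime, float]]] = {}
    for device_id, ts, value in readings:
        key = partition_key(device_id, ts)
        partitions.setdefault(key, []).append((ts, value))
    # Mirror CLUSTERING ORDER BY (ts DESC): newest reading first.
    for rows in partitions.values():
        rows.sort(key=lambda row: row[0], reverse=True)
    return partitions
```

With this layout, a query such as "the latest readings for device X today" touches a single partition, which is the kind of access pattern this style of model is designed around.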
HBase vs Cassandra
We evaluated a number of technologies, and the two finalists were the Hadoop ecosystem (including HBase) and Cassandra. We developed a TCO matrix for both options and Cassandra won by a big margin. Multi data center deployment with Cassandra is very simple, and the benefits of its ring topology (no single point of failure) made this a relatively easy decision.
Top reasons for choosing Cassandra
1. Flexible schema
2. Scalable data storage; multi data center deployment also gives us disaster recovery
3. Real-time and distributed indexing using Cassandra and Solr with DataStax Enterprise
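As an illustration of the multi data center point above, the sketch below builds the CQL statement that asks Cassandra to keep replicas in more than one data center using `NetworkTopologyStrategy`. The keyspace name `telemetry`, the data center names `DC1` and `DC2`, and the replication factor of 3 are hypothetical; real data center names must match whatever the cluster's snitch reports. The helper `create_keyspace_cql` is only a string builder written for this example, not part of any driver.

```python
from typing import Dict


def create_keyspace_cql(keyspace: str, replicas_per_dc: Dict[str, int]) -> str:
    """Build a CREATE KEYSPACE statement replicating to each named data center."""
    if not replicas_per_dc:
        raise ValueError("at least one data center is required")
    # Sort for a stable, readable statement: 'DC1': 3, 'DC2': 3, ...
    options = ", ".join(
        f"'{dc}': {rf}" for dc, rf in sorted(replicas_per_dc.items())
    )
    return (
        f"CREATE KEYSPACE {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {options}}};"
    )


# Hypothetical names and replication factors, for illustration only.
print(create_keyspace_cql("telemetry", {"DC1": 3, "DC2": 3}))
```

Because each data center holds its own full set of replicas, losing one site still leaves a complete copy of the data in the other, which is what makes this setup usable for disaster recovery.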
We are using Cassandra 1.1 as part of DataStax Enterprise 3.0, and we are in the process of upgrading to Apache Cassandra 2.0.6 as part of DataStax Enterprise 4.0.3. Our deployment: 2 data centers, 36 nodes.
If you are starting with Cassandra, you must unlearn RDBMS habits to get the best out of it. It’s very important to understand Cassandra data modelling practices to avoid pitfalls.