Hitendra Pratap Singh, Cassandra Software Engineer at Recruiting.com
"We investigated several NoSQL solutions including Redis, MongoDB and Cassandra. We landed on Cassandra for its great track record of scalability, performance, reliability and availability of support, as well as its ease of integration with our API."

Recruiting.com provides next generation software and technology to help organizations succeed by recruiting and hiring the right people. Our technology solutions enable employers to find and effectively recruit talent through our cloud-based candidate management software and network of leading local, diversity, and niche job boards, including Jobing.com.

My role at Recruiting.com is Software Engineer. I work on Cassandra-based features across design, development, testing, support, and maintenance.

From SQL to Apache Cassandra

Recruiting.com uses Apache Cassandra to power real-time, high-throughput applications in our Candidate Relationship Manager. Use cases include tracking millions of events per day, which feed the analytics and messaging systems we provide to our clients.

Before we decided on Apache Cassandra, we knew that we needed a highly scalable solution to deliver real-time analytics and messaging to our clients. Our SQL server wasn’t keeping up with our growing tracking demands, and we found ourselves architecting around areas where SQL isn’t great. We investigated several NoSQL solutions, including Redis, MongoDB and Cassandra. We landed on Apache Cassandra for its great track record of scalability, performance, reliability and availability of support, as well as its ease of integration with our API.

We have a 6-node cluster in our own data center running Apache Cassandra 1.0.9; we’re in the process of upgrading to version 1.2.10 and an 8-node cluster.

Monitoring Apache Cassandra with SPM

We started using SPM Performance Monitoring and Reporting from Sematext for Apache Solr and were impressed with the amount of real-time stats we could analyze using SPM. We expected the same level of detail for Cassandra and decided to go with SPM there as well. Some of the benefits we’ve seen from SPM include the alert notification system, a graphical interface that makes the data easy to analyze, detailed JVM stats, and the ability to create our own custom metrics.

We also utilize SPM for monitoring our deployments of Apache Solr and Memcached servers.

On the “overview” screen shown below, you can check out some Cassandra metrics, as well as various OS metrics. You can drill down into specific Cassandra metrics by clicking on one of the tabs along the left side; these metrics include Compactions, Bloom Filter (space used, false positives ratio), Write Requests (rate, count, latency), Pending Read Operations (read requests, read repair tasks, compactions), and more.

[Screenshot: SPM “overview” screen for cassandra-app]

Conclusion

The advice we have for new Apache Cassandra users is to pay extra attention to schema design by imagining all the possible ways the data will be queried. Changing Cassandra schemas is much more involved once you have live data.
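
As one illustration of designing from the query backward, here is a minimal Python sketch. It assumes a hypothetical read pattern, "all events for one client on one day, newest first", and derives the table key from it. The table name, column names, and the CQL text are illustrative assumptions, not our production schema; the small in-memory class only mimics how rows would be grouped and ordered.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical CQL 3 table shaped around one query:
# "all events for one client on one day, newest first".
# Names are illustrative only.
EVENTS_BY_CLIENT_DAY_CQL = """
CREATE TABLE events_by_client_day (
    client_id  text,
    day        text,
    event_time timestamp,
    event_type text,
    PRIMARY KEY ((client_id, day), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);
"""


def partition_key(client_id, event_time):
    """Bucket an event by client and by the calendar day of event_time."""
    return (client_id, event_time.strftime("%Y-%m-%d"))


class EventsByClientDay:
    """In-memory stand-in for the table above (not a Cassandra client)."""

    def __init__(self):
        self._partitions = defaultdict(list)

    def insert(self, client_id, event_time, event_type):
        key = partition_key(client_id, event_time)
        self._partitions[key].append((event_time, event_type))

    def events_for(self, client_id, day):
        """The query the key was designed for: one partition, newest first."""
        rows = self._partitions.get((client_id, day), [])
        return sorted(rows, key=lambda row: row[0], reverse=True)


table = EventsByClientDay()
table.insert("acme", datetime(2013, 10, 1, 9, 0), "job_view")
table.insert("acme", datetime(2013, 10, 1, 17, 30), "application")
table.insert("acme", datetime(2013, 10, 2, 8, 15), "job_view")
print([event_type for _, event_type in table.events_for("acme", "2013-10-01")])
```

A different question, such as "all events of one type across all clients", is not answered by this key without scanning every partition, which is why it pays to list the queries before settling on the schema.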

You may be disappointed with Apache Cassandra if you think it’s a solution for all your database needs. But if you are looking to solve specific problems with scalability, reliability, performance, time series data, and data access speed (especially if you have a lot of writes), then Apache Cassandra is the perfect NoSQL database. Cassandra also has a great community.

LinkedIn
  • Alain Rodriguez

    About the 1.0.9 -> 1.2.10 upgrade: take care to go through all the needed steps (1.0.last, 1.1.last, 1.2.10), or to bring the whole cluster down… Also, why not 1.2.19 or even 2.0.12? Those versions fix a lot of issues and also bring very cool improvements. I remember us having a big outage due to a bug fixed in 1.2.12, for example…
