August 1st, 2013

By 

 

Abhinav Ajgaonkar: CTO at CrowdRiff

Brady Gentile: Community Manager at DataStax

 

Brady: Today we have Abhinav Ajgaonkar, CTO of CrowdRiff. Hi Abhinav – to start things off, what does CrowdRiff do?

 

Abhinav: CrowdRiff’s main product is our social scoring platform. We help brands analyze their communities by scoring and segmenting their audience on social networks like Twitter, Facebook and Instagram.

 

Brady: So, how does Cassandra fit into the mix of what you’re doing at CrowdRiff?

 

Abhinav: We gather the activities of users across all social networks and push them into a distributed graph database called Titan. Titan supports three storage backends: Apache HBase, Oracle Berkeley DB and Apache Cassandra. We’ve chosen to go with option number 3.

 

Brady: Why did you pick Apache Cassandra and were there other technologies that you evaluated it against or that you migrated from?

 

Abhinav: We are currently in the process of moving away from MongoDB. As we were planning the current iteration of the product, we realized that a graph database would be an excellent fit for our data model. We tested Neo4j, Titan and a couple of other graph databases. With Titan, we tried both HBase and Cassandra and concluded that Cassandra was the best fit for our use case. 

 

Brady: How has the transition been for you, going from MongoDB to Cassandra?

 

Abhinav: It’s been painless. “Excellent” would be another way to describe it.

 

Brady: We are beginning to see several MongoDB to Cassandra migrations in the community. Is there any advice that you might give someone who’s trying to do their own migration from Mongo to Cassandra at the moment?

 

Abhinav: The only thing that took some getting used to was the inability to nest data as deeply as you could with Mongo. That being said, if you’re nesting four levels deep, it’s probably not a good idea and you may want to rethink your data model.  The main difference from an Ops perspective is the sheer stability of a Cassandra cluster versus other systems that have run in a “single master” setup. Another huge bonus for us was the ability to add nodes to the cluster and have the storage, throughput and replication scale linearly.
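
To illustrate the nesting point above, here is a minimal sketch of one way to flatten a deeply nested document into a single level of column-like keys before storing it in a wide-row store. It is not CrowdRiff's code: the `flatten` helper, the `_` separator and the sample `user` document are all hypothetical, chosen only for illustration.

```python
from typing import Any, Dict


def flatten(doc: Dict[str, Any], prefix: str = "", sep: str = "_") -> Dict[str, Any]:
    """Collapse a nested dict into a single level of column-like keys."""
    flat: Dict[str, Any] = {}
    for key, value in doc.items():
        # Build the flattened name from the path of keys walked so far.
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested documents instead of storing them as-is.
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


# Hypothetical nested document, similar in shape to what a document store might hold.
user = {
    "handle": "example_user",
    "profile": {"location": {"city": "Toronto", "country": "CA"}},
    "scores": {"twitter": 71, "instagram": 64},
}

row = flatten(user)
print(row)
```

Each key in `row` could then map to one column of a row keyed by the user. Whether flattening like this or splitting the data across several tables is the better fit depends on how the data is queried.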

 

Brady: What does your deployment look like?  

 

Abhinav: We’ve started with a 3-node cluster, each node with a single 8-core Intel Haswell CPU, 32GB of RAM and 2x2TB spinning disks in RAID 1. This setup is able to serve upwards of eighteen thousand reads per second without breaking a sweat.

 

Brady: Do you have any experience with the Apache Cassandra community, the Apache mailing list or Planet Cassandra or meetup groups?

 

Abhinav: I’ll be attending the inaugural meet-up scheduled by the Toronto Cassandra Users Group in the first week of August. Other than that, I haven’t had a reason to post on the mailing list, because the documentation and blog posts I’ve come across have been amazing. I haven’t had any questions that I couldn’t answer by reading the DataStax documentation.

 

Brady: Excellent. It’s really good to hear that the docs have been working out well for you, and exciting that you’ll be at the first Toronto Cassandra Users Group meet-up.

 

Abhinav: Looking forward to it.  

 

Brady: In the future maybe CrowdRiff could come and present to the group and share their experience as well.  

 

Abhinav: Absolutely.

 

Brady: Abhinav, thanks so much for joining us and is there anything else that you’d like to add?

 

Abhinav: Initially, I’m not sure why, but I was under the impression that Cassandra would be really difficult to set up and run. Turns out, it’s quite the opposite. If you decide to give it a shot and dive in, I think you’ll be pleasantly surprised.

 

Brady: Good advice.  All right, thanks so much, Abhinav.