This article is one in a series of quick-hit interviews with companies using Apache Cassandra and/or DataStax Enterprise (DSE) for key parts of their business. For this interview, we talked with Shawn Smith, a software engineer fellow at Bazaarvoice.
Shawn, thanks for talking with us today. Can you give us a brief overview of what Bazaarvoice is all about?
Bazaarvoice collects user-generated content, such as reviews, questions and answers, and stories, for various retailers and brands, and then we analyze that information and serve it back to our clients. We’ve been doing this for about seven years now, and our customers include very well-known companies such as those shown on our website.
What does your IT infrastructure look like?
We use a wide variety of technologies for our platform. On the database side, we have used MySQL and Solr and are beginning to migrate much of that to Cassandra and ElasticSearch.
Is all of that managed on premises or in the cloud?
We do have a data center in Dallas, but nearly all of our new development is being carried out on Amazon.
And do you make use of either multiple data centers or cloud availability zones?
Absolutely. Our databases span multiple cloud availability zones.
What brought you to Cassandra?
We started out by using MySQL in the classic master/slave horizontal scale-out way. We found it just impossible to scale and grow write capacity with MySQL. So we started looking for something that was cloud friendly, which was very important to us. We needed something where, if any one machine goes down, it’s not a big deal, meaning our systems aren’t affected and the database can recover without human intervention.
Next, we needed a database that allowed for easy capacity expansion (especially write capacity) by simply adding new machines online. Having multiple data center support was also a very big deal, especially being able to write to multiple data centers at the same time. We didn’t want master/slave data centers but peer-to-peer data centers.
These things were key to why we chose Cassandra.
Did you evaluate Cassandra against any other NoSQL databases like MongoDB or HBase?
We experimented with Mongo and we do use Hadoop for our analytics, but when we did our architecture comparisons with HBase and Mongo, and wrote a number of development prototypes, we became convinced that Cassandra was the right way to go.
The multi-data center support, the masterless architecture, ease of administration, the lack of a single point of failure, and constant availability in the cloud were the things that were key for us.
So what is Cassandra used for? Do you use it alongside RDBMSs or other databases?
Cassandra is what we use for our primary datastore, but we’re big enough that other databases are also used for other things.
We use Cassandra for two main classes of data. One use case is that we take lots of product feeds from our customers and we maintain a big master catalog of all our customers’ products, names, categories and brands. So all of our customer metadata is maintained in Cassandra.
The second use case for Cassandra is one where we store all of the user-generated content from all our customers’ sites. Whenever users submit something on a customer’s website, that’s fed into Cassandra, with feeds coming into different data centers. That’s all analyzed and then returned to our customers.
What advice would you give to those just starting out with Cassandra?
The biggest thing to get your head around is the data model differences. You need to think about how you’re going to read the data. We pretty much do all of our writing in Cassandra and then replicate that data over to ElasticSearch for various search and read operations. So for us, getting the schema right in Cassandra was very important.
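As an editorial illustration of "think about how you're going to read the data", here is a minimal Python sketch of a query-first layout. It is not Bazaarvoice's schema: the class `ReviewsByProduct`, its methods, and the field names are all hypothetical, and a plain in-memory dictionary stands in for a Cassandra table. The idea it shows is that rows are grouped by the value you will look them up by (a partition key, here a product ID) and kept sorted within that group (a clustering key, here a submission timestamp), so the expected read is a single ordered slice.

```python
from bisect import insort
from collections import defaultdict


class ReviewsByProduct:
    """Toy stand-in for a table partitioned by product_id and
    clustered by submitted_at. All names are hypothetical."""

    def __init__(self):
        # partition key -> rows kept sorted by (submitted_at, review_id)
        self._partitions = defaultdict(list)

    def write(self, product_id, submitted_at, review_id, text):
        # Insert into the product's partition, preserving sort order.
        insort(self._partitions[product_id], (submitted_at, review_id, text))

    def latest(self, product_id, limit=10):
        # The read the layout was designed for: newest rows for one product.
        rows = self._partitions.get(product_id, [])
        return list(reversed(rows[max(0, len(rows) - limit):]))


table = ReviewsByProduct()
table.write("product-1", 1001, "review-a", "Works well")
table.write("product-1", 1005, "review-b", "Stopped working")
table.write("product-2", 1003, "review-c", "Great value")

for submitted_at, review_id, text in table.latest("product-1"):
    print(submitted_at, review_id, text)
```

A different read pattern, such as "all reviews by one user", would call for a second structure keyed by user rather than a scan of this one, which is the main shift from relational modeling.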
For more information about Bazaarvoice, visit: http://www.bazaarvoice.com/