Ben Knear Software Engineer at AddThis
"Cassandra gives us a very high percentage of reliability. That’s very important to us and lets us provide the best quality tools for our AddThis users."

One of the goals of AddThis is making the open web more personal. We do that through tools that publishers can put on their sites to drive engagement, whether that’s helping users stay involved by finding new content or getting them connected with social services like Twitter, Facebook, and other platforms. AddThis really just helps keep people more connected with that site.

I use it on my own personal blog; it looks great. As an end user as well, it’s so seamless and easy for me to use. I’ve seen from my AddThis Dashboard how those tools have brought people to my site and kept them around. I think it’s funny that whenever I searched for AddThis, I would see “sharing tools”. One of our coolest capabilities is recommended content. Though my site may not know you, and I don’t have the infrastructure to do any of that decision-making, AddThis can provide that for my site by analyzing the content that I have and knowing what you may be interested in.

A database to match

I’ve been with AddThis for over two years now, and we had been using Cassandra long before that.

What we really needed was extremely strong reliability across multiple data centers and fast transactions so we could retrieve the data really quickly. Additionally, the amount of data was going to be variable: we didn’t know whether the pieces would be small or large, but we did know that they were going to be very discrete. There weren’t a lot of foreign-key-type relationships that we needed to be concerned about.

That is why, and where, Cassandra fits in so perfectly. Personally, I’d done only a very small amount of work with it before I started on it. It made it really easy to spin up our application to run across two data centers, different boxes, different localities. Cassandra really fit the mold we needed and has worked out great ever since.

For our team’s particular use case we are storing very specific pieces of data that deal with some modeling type things, and some analytics. There are other pieces within the company that use it in different ways, too, to help support the many internal and external AddThis applications.

We use Dropwizard, which helps create a self-contained deployment package. There is a lot of buzz right now about containers. In a way I see Dropwizard doing that; it is self-contained and reaches out where it needs to based on its configuration. We can change our configuration on the fly depending on where we are deploying our instances, and we can do rolling deployments.

For example, when one instance, one node in a cluster, has an issue, we can take it down and restart it with something different, switching traffic between the nodes, which helps us maintain 100% availability.
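The rolling pattern described here can be illustrated with a small Python sketch. It is only a simulation under stated assumptions, not AddThis's deployment tooling: the node names and the `restart` callback are hypothetical, and "draining" a node is modeled as removing it from the set of nodes that serve traffic.

```python
def rolling_restart(nodes, restart):
    """Restart nodes one at a time and report the fewest nodes serving at any moment.

    `restart` is a hypothetical callback that redeploys a single node.
    """
    if len(nodes) < 2:
        raise ValueError("a rolling restart needs at least two nodes to stay available")

    serving = set(nodes)
    min_serving = len(serving)
    for node in nodes:
        serving.discard(node)                    # switch traffic away from this node
        min_serving = min(min_serving, len(serving))
        restart(node)                            # bring it back with the new build or config
        serving.add(node)                        # return it to service before the next one
    return min_serving


restarted = []
min_up = rolling_restart(["node-a", "node-b", "node-c"], restarted.append)
print(restarted, min_up)
```

Because only one node is ever out of rotation, the remaining nodes keep answering requests for the whole deployment.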

Coming from a relational mindset

I came from several years of experience using relational databases. One important thing to plan for when switching to Big Data solutions from relational databases is the loss of relational modeling. You have to be very intentional and methodical about how you’re architecting the data model so you create indices exactly where you need them.

I actually had experience using HBase and Accumulo, which are scanner-based tools. The first library I tried was Astyanax by Netflix. Even though I built a whole application using it, I quickly ran into little issues: I had problems with compound keys and with adjusting the table definition once the table was built.

That’s when I moved over to using the DataStax libraries. It was really easy to get started: getting a little single-node instance of Cassandra running and placing a whole application on top of it.

It was interesting to learn more about data modeling within Cassandra, which is quite different from doing it in MySQL. Again, you need to know up front how you want to pull this data out throughout the application.
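To make the "know how you want to pull the data" point concrete, here is a minimal Python sketch. It is a toy stand-in for a Cassandra table with a compound primary key, not AddThis's actual schema: the table name `events_by_site` and the columns `site_id`, `event_time`, and `payload` are hypothetical. The partition key selects one partition, and the clustering key orders the rows inside it, so the only cheap read is the one the table was designed for.

```python
from collections import defaultdict

# Hypothetical CQL for the table this sketch imitates (names are illustrative):
#   CREATE TABLE events_by_site (
#       site_id text, event_time int, payload text,
#       PRIMARY KEY ((site_id), event_time)
#   );


class EventsBySite:
    """Toy model: rows grouped by partition key, ordered by clustering key."""

    def __init__(self):
        # partition key -> {clustering key -> payload}
        self._partitions = defaultdict(dict)

    def insert(self, site_id, event_time, payload):
        # Writing the same primary key twice overwrites, like a Cassandra upsert.
        self._partitions[site_id][event_time] = payload

    def read_range(self, site_id, start, end):
        # The query the table was modeled for: one partition, a slice of ordered rows.
        rows = self._partitions.get(site_id, {})
        return [(t, rows[t]) for t in sorted(rows) if start <= t <= end]


table = EventsBySite()
table.insert("blog.example", 30, "share")
table.insert("blog.example", 10, "view")
table.insert("other.example", 20, "view")
rows = table.read_range("blog.example", 0, 100)
print(rows)
```

A question such as "all events with payload `view` across every site" has no key to follow in this layout. In a relational database you would add an index or a join; here you would typically design a second table keyed for that query.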

The documentation for the DataStax libraries is really good. Every time I would search for CQL Shell material, I would always get taken to DataStax, which is great. The documentation is always clean and clear.

Why they love Cassandra

When we were thinking about what type of data store we wanted, the reliability and the fast throughput that we can get from Cassandra were big factors. Consider the number of requests we get in a day: even when we get 10,000 requests a second, they are coming through, and Cassandra gives us a very high percentage of reliability. That’s very important to us and lets us provide the best quality tools for our AddThis users.

Internally, because we already have the infrastructure, we just piggyback on that, which is always nice. Where we might have issues with MySQL keeping things in sync between multiple clusters using a slave and a master, with Cassandra we can have multiple endpoints using the same store of data across multiple clusters. That’s been our biggest reason for using Cassandra.
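In Cassandra, sharing one store of data across data centers is typically configured per keyspace with the `NetworkTopologyStrategy` replication class. The Python sketch below only builds the CQL statement as a string; the keyspace name, data center names, and replica counts are assumptions made for illustration, since the actual AddThis settings are not described here.

```python
def keyspace_cql(name, replicas_per_dc):
    """Build a CREATE KEYSPACE statement that replicates data to each listed data center.

    `replicas_per_dc` maps a data center name to the number of replicas kept there.
    """
    parts = ["'class': 'NetworkTopologyStrategy'"]
    parts += [f"'{dc}': {count}" for dc, count in replicas_per_dc.items()]
    options = ", ".join(parts)
    return f"CREATE KEYSPACE IF NOT EXISTS {name} WITH replication = {{{options}}};"


# Hypothetical two-data-center layout with three replicas in each.
statement = keyspace_cql("demo", {"dc_east": 3, "dc_west": 3})
print(statement)
```

With a keyspace defined this way, every data center holds its own replicas and any node can accept reads and writes, which is the contrast with a single master and its replicas.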
