June 5th, 2013

By

Rick Branson: Infrastructure Software Engineer at Instagram

Matt Pfeil: Co-Founder at DataStax

Matt: Hi, my name is Matt Pfeil, and I’m here with Rick Branson, Infrastructure Software Engineer at Instagram. Rick, how are you doing today?

Rick: I’m good, Matt. How are you?

Matt: I’m doing well, better now that I’m hearing your voice.

Rick: Well, it’s always a pleasure.

Matt: Always. For everyone who doesn’t know, Rick and I have a long history: we worked together at DataStax before he left to join Instagram. So Rick, you guys use Cassandra over there at Instagram; why don’t you tell the audience about your Cassandra use case?

Rick: Sure. We’ve been using Cassandra for 7 or 8 months now. Initially, our deployment was for storing auditing information related to security and site integrity. To break that down, it means fighting spam, finding abusive users, and other things like that. It was really a sweet spot for Cassandra. Originally, these features were built on Redis, but the data size was growing too rapidly and keeping it all in memory was not a productive way to go. It was a really high write rate and a really low read rate, which is a spot where Cassandra really pops and shines, so adopting Cassandra in that area ended up being a no-brainer for us. We started out with a 3-node cluster, and that use case has since grown to a 12-node cluster. That was our path for our main application backend stuff.

Recently, we decided to port another use case that is much more critical. We spent time getting everyone on the team up to date with Cassandra, reading documentation, and learning how to operate it effectively. We chose to use Cassandra for what we call the “inboxes,” or the newsfeed part of our app. Basically, it’s a feed of all the activity associated with a given user’s account: you can see if people like your photos or follow you, if your friends have joined Instagram, the comments you’ve received, etc. We decided to move that to Cassandra because it was previously in Redis and we were experiencing memory limitations.

We’ve had a really good experience with the reliability and availability of Cassandra. It’s a much different workload: we’re running on SSDs with Cassandra version 1.2, so we get the latest version with all of the nice bells and whistles, including vnodes, leveled compaction, etc. It was a very successful project, and it only took us a few days to convert everything over.

Some details on our cluster: it’s a 12-node cluster of EC2 hi1.4xlarge instances, and we store around 1.2TB of data across this cluster. At peak, we’re doing around 20,000 writes per second to that specific cluster and around 15,000 reads per second. We’ve been really impressed with how well Cassandra has been able to drop into that role. We also ended up reducing our footprint, so that’s been a really good experience for us. We learned a lot from that first implementation and were able to apply that knowledge to our most recent implementation. Every time someone pulls up their Instagram now, they’re hitting that 12-node Cassandra cluster to pull their data; it’s really exciting.
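
As a rough back-of-the-envelope reading of the cluster figures quoted above, the per-node averages can be worked out as in the sketch below. It assumes the quoted numbers are cluster-wide totals and that data and load are spread evenly across the 12 nodes; neither assumption is stated in the interview, and the function name is purely illustrative.

```python
# Back-of-the-envelope per-node averages for the figures quoted in the interview.
# Assumptions (not from the interview): the totals are cluster-wide and
# load is spread evenly across all nodes.

def per_node_average(cluster_total: float, node_count: int) -> float:
    """Return the average share of a cluster-wide total carried by one node."""
    return cluster_total / node_count


NODES = 12
DATA_GB = 1200          # "around 1.2TB", taken as 1200 GB
PEAK_WRITES = 20_000    # writes per second at peak
PEAK_READS = 15_000     # reads per second at peak

print(f"data per node:   {per_node_average(DATA_GB, NODES):.0f} GB")
print(f"writes per node: {per_node_average(PEAK_WRITES, NODES):.0f} per second")
print(f"reads per node:  {per_node_average(PEAK_READS, NODES):.0f} per second")
```

These are averages only; real per-node load depends on the replication settings and on how evenly keys are distributed.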

Matt: That’s awesome. So I heard you say something interesting about your use case: that you guys originally moved off of Redis, which is obviously a memory-based offering, and onto Cassandra. What was the motivation behind that move?

Rick: For the first use case mentioned above, for our backend, we moved off of a Redis master/slave replication setup; it was just too costly to keep running. We moved from having everything in memory, with very large instances, to just putting everything on disk; when you really don’t need to read that often, having it on disk works fine. Implementing Cassandra cut our costs to the point where we were paying around a quarter of what we were paying before. Not only that, but it also freed us to just throw data at the cluster, because it was much more scalable and we could add nodes whenever needed. Going from an unsharded setup to a sharded setup can be especially painful; you basically get that for free with Cassandra, so you don’t have to go through the painful process of sharding your data.

In the other use case, which we call the “Inbox” use case, the feed was already sharded; it was a 32-node cluster with 16 masters and 16 fail-over replicas and, of course, we had to go through all the sharding of things. We noticed that we were running out of space on these machines and they weren’t really consuming a lot of CPU (Redis can be incredibly efficient with CPU), but obviously when you run out of memory… you run out of memory.
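
To illustrate why manual sharding gets painful as a cluster grows, here is a minimal sketch of naive hash-modulo sharding. The interview does not say how the Redis shards were actually chosen, so the scheme, the key format, and the function names here are all hypothetical.

```python
import hashlib

# Hypothetical illustration: naive "hash mod N" client-side sharding.
# This is NOT described in the interview; it only shows why changing
# the shard count by hand tends to be disruptive.

def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index using a stable hash modulo the shard count."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def fraction_moved(keys, old_count: int, new_count: int) -> float:
    """Return the fraction of keys whose shard changes when the count changes."""
    moved = sum(
        1 for key in keys
        if shard_for(key, old_count) != shard_for(key, new_count)
    )
    return moved / len(keys)


# Made-up keys standing in for per-user inbox identifiers.
keys = [f"user:{i}" for i in range(10_000)]
moved = fraction_moved(keys, 16, 17)
print(f"{moved:.1%} of keys change shards when going from 16 to 17 shards")
```

With a uniform hash, about 16 out of every 17 keys land on a different shard when a 17th shard is added under this scheme, so nearly all the data has to be moved. Cassandra instead assigns token ranges on a ring to nodes, which limits how much data moves when a node is added.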

It was just more cost-effective and easier to operate a Cassandra cluster for this use case, where you don’t need in-memory levels of performance. Durability was a big factor as well, and one that Redis didn’t provide effectively; I’ll be touching more on that in my presentation at Cassandra Summit 2013.

Matt: I believe what you’re doing right now is “teasing” your future presentation to keep people coming to see it. 

Rick: I learned that from you.

Matt: Well, I’m glad I taught you something in life. For everyone out there who hasn’t touched Cassandra themselves yet but is interested in getting started, what advice do you have for them?

Rick: Again, I’m going to be talking a lot more about our adoption strategy and things like that during my presentation at Cassandra Summit 2013.

I would recommend digging into the system and reading all of the Cassandra documentation, especially the material on the DataStax website. The best part of the documentation, I’ve noticed, is that it has a lot of extra information about the internals; really understanding that is important. With any database or datastore you use, you’re really going to need to dig into the documentation in order to use it the way it’s intended. People often get themselves cornered by adopting a solution too quickly or incorrectly and not doing their homework. With a datastore specifically, it’s really important that it’s the most stable and reliable part of your stack.

Matt: I like it. Rick, I want to thank you for your time today, and just to add to your point: come see Rick present about how Instagram uses Cassandra on June 11th and 12th, 2013, in San Francisco at Fort Mason. Also, he and I will be doing a talk on the economics of big data entitled “Deciding Dollars: It’s Actually Actuarial.” Rick, thanks again, and I appreciate your time.

Rick: Absolutely, and people can find me on the IRC channel, trolling away.

Matt: I highly recommend that. And if you do not follow Rick Branson on Twitter as well, you should; the trolling is something to watch!