Eddie Satterly: Chief Evangelist at Splunk
Matt Pfeil: Founder at DataStax
Hello, Planet Cassandra listeners. For today’s Apache Cassandra use case interview, I’m joined by Eddie Satterly, Chief Big Data Evangelist at Splunk. Eddie, how are you doing today?
Doing great. Thanks.
Eddie, Splunk has had a nice integration with Cassandra for a little while now. Why don’t you tell us a little bit about what that’s like?
Sure. There’s a Cassandra connect app on my GitHub account that you can get access to at https://github.com/esatterly/splunk-cassandra. It lets you do a lookup against any of the data that lives in Cassandra column families or keyspaces and integrate it in the UI with data that lives in Splunk, which allows you to do some interesting correlations and visualizations across the data sets. It also gives you a picture of the keyspace and column family configurations, including which columns within a column family are indexed, which helps you write better queries.
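The lookup flow Eddie describes can be sketched as a Splunk external lookup script, which exchanges CSV with Splunk: Splunk sends rows with the lookup field populated, and the script fills in the output fields. This is a minimal sketch, not the actual app's code; the field names are hypothetical, and the in-memory dict stands in for a real Cassandra read (the real app queries a live column family).

```python
import csv
import io

def cassandra_lookup(key):
    # Placeholder for the actual Cassandra read: the real app would
    # fetch a row from a column family by key via a Cassandra client.
    # Hypothetical static data stands in for a live cluster here.
    fake_cf = {"user123": {"plan": "gold", "region": "us-east"}}
    return fake_cf.get(key, {})

def run_lookup(in_stream, out_stream, key_field, value_fields):
    # Splunk external lookups are CSV in, CSV out: each input row has
    # the key field set, and we emit the same key plus enriched fields.
    reader = csv.DictReader(in_stream)
    writer = csv.DictWriter(out_stream, fieldnames=[key_field] + value_fields)
    writer.writeheader()
    for row in reader:
        result = cassandra_lookup(row[key_field])
        out = {key_field: row[key_field]}
        for field in value_fields:
            out[field] = result.get(field, "")
        writer.writerow(out)

# Example: enrich a "user" field with columns looked up from Cassandra.
buf_in = io.StringIO("user\nuser123\n")
buf_out = io.StringIO()
run_lookup(buf_in, buf_out, "user", ["plan", "region"])
```

Once wired up as an external lookup in Splunk, a search can then correlate machine data in Splunk with application data in Cassandra in a single result set.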
Basically, it lets you view Splunk data and Cassandra data together, correct?
Awesome. What originally drew you to doing this for Cassandra in the first place?
It’s pretty well known that I’m a big fan. In my role before joining Splunk, I made major headway using Cassandra to do some really interesting things; that’s all public information on the internet if you search for it. Essentially, it just makes sense for a lot of our joint customers to be able to get this data all in one place, pull it together, and search it. A couple of our biggest customers are heavy users of this and have been running it in production. They’re able to pull together data sets that live in Cassandra on the back end of their online services with data that lives in Splunk for their network devices, servers, and application monitoring, bringing it all into one place to get a full picture.
You mentioned that you’re a pretty big fan of Cassandra, and you have been very active with Cassandra for a few years now. What drew you to it in the first place?
We had a lot of problems in my previous role that couldn’t be solved any relational way. A number of our relational databases had grown to need 30 or 40 servers in the farm just to handle the read and write loads, and those servers were $40,000 to $50,000 apiece because they had to serve relational database loads, plus all the licensing costs. We needed to drive that cost down and allow for a master-anywhere model while increasing scale. Scalability was reaching its limits, and we had some public failures based on those relational database systems. We were trying to find a solution that would scale and address all those problems, as well as give us the same or better functionality from a database back-end perspective. Being able to do analytics against it was a major plus.
Cassandra came into play as one of the three options we tested, and it outshone the others by far, so we made the selection. Going that route, we built some things and saw very fast value from the system, just like we had done with Splunk while I was there, so for that reason I’m a fan of both.
Now you work for Splunk, so it works out. What was the most exciting feature for you in the Cassandra 2.0 release?
I’ve been talking to our customers and various people in the community about some things we’re doing. I think the vnodes configuration makes a lot of sense for what’s been going on in most of these folks’ environments: basically being able to spread workloads better. Another key thing is moving a lot of the data off heap, given the pains of Java garbage collection in their world. That has really helped scalability. It’s really helped with some of the complex use cases I’ve been working on with our customers where they’re correlating the data, as well as some of the internal prototype work we play around with.