August 5th, 2013


Michel Pelletier: Software Developer at Trapit

Brady Gentile: Community Manager at DataStax


Brady: Hello Planet Cassandra users; we have Michel Pelletier, Director of Technology at Trapit, with us today. To get things started, Michel, could you tell us a little bit about what Trapit does?


Michel: Trapit is a content curation service for marketers and media companies. Our tools enable our users to discover relevant information and curate engaging content collections on behalf of their audiences across prevailing digital consumption platforms.


Basically, users or administrators can create what we call traps. Traps are persistent topic channels of the most relevant content for a given user. Traps are smart, continually adjusting and improving from user feedback. We make it really simple. Trapit discovers the most relevant new content for a given topic and pushes that information to your web, mobile or social audiences in near real time.


Brady: Excellent. Would you say it’s machine learning?


Michel: Yeah, there’s definitely machine learning in there. We also have a more advanced set of tools available within an application we call Content Curation Center. Content Curation Center is for companies and organizations that want to put their brand or their experience in the context of Internet media. They can use our tools to provide feedback and set rules to generate really compelling content experiences in a highly automated fashion.


Brady: Ah, very cool. How does Cassandra fit into the mix there at Trapit?


Michel: We use Cassandra for what we call the queue store. As articles get pulled into our system (we get about a million a week) we have to trickle that content down through all the different users’ traps. We have a sort of data flow model where articles flow from upstream sources to downstream sources, and they get filtered and classified along the way. It means that a lot of data flows through these cascading waterfalls of document queues, and we store all that data in Cassandra.
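
To make the "cascading waterfalls of document queues" idea concrete, here is a minimal Python sketch. It is an illustration only, not Trapit's implementation: the `QueueNode` class, the topic-based filters, and the sample articles are all hypothetical, and an in-memory `deque` stands in for a queue that would be persisted in Cassandra.

```python
from collections import deque


class QueueNode:
    """One stage in a cascade: filters articles, stores them, passes them on."""

    def __init__(self, name, accepts=lambda article: True):
        self.name = name
        self.accepts = accepts   # hypothetical filter/classifier predicate
        self.queue = deque()     # stand-in for a queue persisted in Cassandra
        self.downstream = []

    def feed(self, node):
        """Connect a downstream node and return it so calls can be chained."""
        self.downstream.append(node)
        return node

    def push(self, article):
        """Store the article if it passes the filter, then cascade it down."""
        if not self.accepts(article):
            return
        self.queue.append(article)
        for node in self.downstream:
            node.push(article)


# Build a three-stage cascade: everything -> tech -> databases.
source = QueueNode("all-articles")
tech = source.feed(QueueNode("tech", lambda a: "tech" in a["topics"]))
databases = tech.feed(QueueNode("databases", lambda a: "databases" in a["topics"]))

for article in [
    {"title": "Cassandra on EC2", "topics": {"tech", "databases"}},
    {"title": "New phone released", "topics": {"tech"}},
    {"title": "Local bake sale", "topics": {"food"}},
]:
    source.push(article)

titles = {n.name: [a["title"] for a in n.queue] for n in (source, tech, databases)}
print(titles)
```

Every accepted article is written once per queue it reaches, which is one way to see why a workload like this is dominated by writes.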


Brady: Excellent. What was your motivation for using Cassandra here, and were there any other technologies that it was evaluated against?


Michel: I didn’t have any exposure to the original motivation because that was before I was at Trapit, but we recently re-architected a lot of the systems, so we actually re-evaluated Cassandra. The old system was entirely Cassandra driven and every piece of data in our system went into Cassandra. When we re-evaluated, we really wanted to focus on Cassandra’s strong features: the ability to sustain a lot of writes, the ability to replicate data around multiple shards, and the ability to survive node failure without causing any downtime. When we re-evaluated Cassandra, we evaluated a few other technologies across the board as well, even stuff like Postgres or Mongo and various NoSQL databases. We looked at Kafka, I believe (which is also an Apache project).


We already had the domain knowledge with Cassandra and it was already serving this particular use case very well for us, so we stuck with it. We’re really quite happy with the performance.


Brady: That’s excellent. Can you share with us some insights on what your Cassandra deployment looks like?


Michel: Yes. We recently migrated this new architecture entirely to AWS. All of our nodes are running in EC2. We have the ability to deploy our system to multiple Amazon regions, so there are several distinct deployments, some for different customers and some for multi-tenancy where multiple customers exist in one system.


It’s kind of hard to just say there’s X number of servers, but to give you an example, our biggest multi-tenancy system is replicated twelve ways on AWS, with four nodes in each of three different availability zones. That basically means any one node in any one availability zone can go down and we’re still servicing queries out of Cassandra.
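
The availability claim above can be checked with a small simulation. This is a sketch under stated assumptions, not a description of Trapit's actual configuration: the zone names are hypothetical, each key is assumed to have one replica in each availability zone, and a read is assumed to succeed if at least one replica is up. The interview gives only the node and zone counts.

```python
ZONES = ["az-a", "az-b", "az-c"]   # hypothetical zone names
NODES_PER_ZONE = 4                 # four nodes per zone, as in the interview

# Twelve nodes in total, each identified by (zone, index).
nodes = [(zone, i) for zone in ZONES for i in range(NODES_PER_ZONE)]


def replicas_for(key_hash):
    """Assumed placement: one replica per zone, chosen from the key's hash."""
    return [
        (zone, (key_hash + offset) % NODES_PER_ZONE)
        for offset, zone in enumerate(ZONES)
    ]


def readable(key_hash, down):
    """A key is readable if at least one of its replicas is still up."""
    return any(node not in down for node in replicas_for(key_hash))


def survives_single_node_failure():
    # range(NODES_PER_ZONE) covers every distinct placement in this model.
    return all(
        readable(k, {node}) for node in nodes for k in range(NODES_PER_ZONE)
    )


def survives_zone_failure():
    return all(
        readable(k, {n for n in nodes if n[0] == zone})
        for zone in ZONES
        for k in range(NODES_PER_ZONE)
    )


print(len(nodes), survives_single_node_failure(), survives_zone_failure())
```

Under these assumptions, losing a single node or even an entire zone still leaves at least two replicas of every key reachable.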


Brady: Highly available, right?


Michel: Yeah we do love that aspect of it very much.


Brady: What is your favorite part of Apache Cassandra?


Michel: That I don’t have to think about it very hard: it’s simple and deals with the replication issues by itself. There is some thinking around deleting data; that tends to be where we have to apply a little maintenance occasionally. Otherwise, we pound it with a whole lot of writes. This particular use case for Cassandra is high write and low read. We’re inserting documents in queues all day long, but the user might come along once a day and pull an individual queue back out again. We needed something that could sustain a whole bunch of writes without us having to worry about how to architect that ourselves.


Brady: Very good. For future versions of Apache Cassandra, what’s one thing that maybe you would want to see?


Michel: I can’t think of any one thing specifically; the current way that we use Cassandra fits very well with it. Previously when our entire system was in Cassandra we had issues with consistency, but we removed those consistency issues by putting that data in a relational database where we felt it belonged; it was a little more appropriate. Now all our consistency issues are gone. The eventual consistency that Cassandra provides works perfectly well for our queue store. I really can’t think of any one particular deficiency that I could point out. I’m sure you guys have got some great features in the pipe. There’s nothing currently burning us.


Brady: Cool. What’s your experience with the Apache Cassandra community? I know before the interview you had mentioned that you’re using a Python client driver. Could you shed some light on that?


Michel: We use a driver called Pycassa, and it’s pretty straightforward. It provides some classes and methods around, I believe, the Thrift library, and this is where it’s starting to get deeper down below my expertise; we interact largely with the Python library. We have some system admin guys that work with the command line tools. Otherwise, all our developers pretty much use the Python library to create an object that reflects a column family, and there are some simple methods for getting keys, multiple values, that kind of stuff. We work specifically around the feature set that the library provides for us, which may only be a subset of what Cassandra actually provides, but it suits our case perfectly.
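
The access pattern described here, an object that reflects a column family with simple methods for inserting and fetching rows, can be sketched without a running cluster. The class below is a hypothetical in-memory stand-in written for illustration; it is not Pycassa's API, and the row keys and column names are invented. It only mirrors the general shape of the workflow: insert columns under a row key, get one row, or get several rows at once.

```python
class InMemoryColumnFamily:
    """Hypothetical stand-in for a column family object (not Pycassa itself)."""

    def __init__(self, name):
        self.name = name
        self._rows = {}  # row key -> {column name: value}

    def insert(self, key, columns):
        """Add or overwrite columns in the row stored under `key`."""
        self._rows.setdefault(key, {}).update(columns)

    def get(self, key):
        """Return a copy of one row; raises KeyError if the row is missing."""
        return dict(self._rows[key])

    def multiget(self, keys):
        """Return copies of all requested rows that exist, keyed by row key."""
        return {k: dict(self._rows[k]) for k in keys if k in self._rows}


# One row per queue, one column per queued article (an assumed layout).
queues = InMemoryColumnFamily("queues")
queues.insert("trap:42", {"2013-08-05T10:00": "article-1"})
queues.insert("trap:42", {"2013-08-05T10:05": "article-2"})
queues.insert("trap:7", {"2013-08-05T09:30": "article-3"})

one_queue = queues.get("trap:42")
several = queues.multiget(["trap:42", "trap:7", "trap:999"])
print(one_queue)
print(sorted(several))
```

Keeping application code behind a small surface like this is consistent with what Michel describes: developers work with a narrow subset of features rather than the full range Cassandra offers.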


Brady: Great. Is there anything else that you’d like to add?


Michel: No. Like I said, we re-evaluated Cassandra and we stuck with it, and I think that says a lot about how well it serves our use case. We have a mix of big data and small data problems, and Cassandra’s been really good for us on the big data side, definitely. We’re going to continue to use it. We don’t have any plans to migrate away from it and it works really well for us.


Brady: Awesome. Glad to hear you guys have had a good experience with Cassandra, and it looks like you’re moving forward very smoothly. I think that’s all the questions I have for you today, Michel. Thanks so much for joining us. I wish you the best of luck at Trapit.


Michel: No problem, thank you.