August 16th, 2013

Paolo Estrella: Software Engineer at nToklo 

Christian Hasker: Editor at Planet Cassandra, A DataStax Community Service


Christian: Welcome, everybody, to this five-minute interview. I am delighted to have with me today Paolo from nToklo. Paolo, why don’t you tell us a little bit about yourself, your background and also what nToklo does?


Paolo: Sure. My name is Paolo Estrella and I head up technology for nToklo. Previously I’d had a little bit of Cassandra and Hadoop experience, but at nToklo, where Cassandra is used extensively, I’ve developed a lot of knowledge around it.


At nToklo, we are in the business of recommendations. We are essentially a recommendations engine. What we do is we store lots and lots of user behavior in Cassandra which we analyze in order to produce user-based forecasts, where we try to determine what the user is likely to do next.


Christian: Recommendations are a hot topic among retailers as a way to drive more revenue. So instead of a company having to build its own recommendation engine, it can come to nToklo and plug it into its website?


Paolo: Yes, precisely. We give them a way to provide their data. It’s either via a plugin or API. Once we have the data we can compute our forecasts. We just launched our public SaaS platform, which allows anyone to register and post their data. At the moment we are focusing on retail and specifically, but not exclusively, Magento customers. We have a plugin for Magento, which allows retailers to send data to our web servers and into our Cassandra cluster.


Christian: Brilliant, and you mentioned at the beginning, Paolo, that you had a little bit of experience with Cassandra and Hadoop. What was your background before that?


Paolo: I was working in media and had a relational database background, working with a MySQL backend. We spent a lot of time and effort getting MySQL to scale out the way we needed it to. We had to shard data across several MySQL instances and ended up writing a bunch of libraries that did this distribution for us. With Cassandra we found that it could just give us this capability automatically.
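
Editor's note: to illustrate the kind of manual sharding Paolo describes, here is a minimal sketch of hash-based shard routing. It is not nToklo's or his previous employer's library; the shard names, the key format and the `shard_for` helper are all hypothetical. It also shows why this approach is painful to scale: with simple modulo routing, adding one instance changes the home of most keys. Cassandra avoids this by partitioning data across nodes with consistent hashing.

```python
import hashlib

# Hypothetical shard names; the interview does not say how many
# MySQL instances were actually used.
SHARDS = ["mysql-0", "mysql-1", "mysql-2", "mysql-3"]


def shard_for(key, shards=SHARDS):
    """Route a key to a shard by hashing it and taking the result modulo the shard count."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


def moved_keys(keys, old_shards, new_shards):
    """Count the keys whose shard changes when the shard list changes."""
    return sum(shard_for(k, old_shards) != shard_for(k, new_shards) for k in keys)


if __name__ == "__main__":
    keys = [f"user-{i}" for i in range(1000)]  # hypothetical user IDs
    grown = SHARDS + ["mysql-4"]
    print(shard_for("user-42"))
    print(moved_keys(keys, SHARDS, grown), "of", len(keys), "keys change shard")
```

With modulo routing, growing from four to five shards relocates most keys, so every resize implies a large data migration that the application's own libraries have to manage.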


Christian: Yours is a perfect use case for Cassandra: recommendation engines, lots of time series data. I was wondering, how was the decision made to go with Cassandra at nToklo?


Paolo: nToklo has been focused on Cassandra ever since it went from an idea to a prototype. We started using Cassandra 0.8 and now we are on 1.2. The decision to go with Cassandra came largely out of what we had been reading about its write performance.


Christian: Great, and can you talk a little bit about what your deployment looks like? How many nodes? How much data? Are you in your own data center or in the cloud?


Paolo: Yes, sure. We’ve got a 5-node Cassandra cluster, which we deploy in our own data centers, and we’ve also coupled that with Hadoop to do batch processing. We are in a single data center right now, with around half a terabyte of data.


Christian: What are some of the things that you have learned along the way that would be beneficial to other people as they get started with Cassandra?


Paolo: Spend some time understanding eventual consistency, and playing around with consistency levels.
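
Editor's note: as a starting point for experimenting with consistency levels, the sketch below works through the usual replica-overlap rule: a read is guaranteed to touch at least one replica that acknowledged the latest write when the write and read replica counts together exceed the replication factor. The replication factor of 3 is an assumption, since the interview does not state nToklo's setting, and the helper names are made up for illustration.

```python
def quorum(replication_factor):
    """A quorum is a majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def replicas_required(level, replication_factor):
    """Replicas that must respond for a few common consistency levels."""
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    level = level.upper()
    if level in fixed:
        return fixed[level]
    if level == "QUORUM":
        return quorum(replication_factor)
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {level}")


def read_overlaps_write(write_level, read_level, replication_factor):
    """True when write replicas + read replicas > RF, so the two sets must overlap."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return w + r > replication_factor


if __name__ == "__main__":
    rf = 3  # assumed replication factor, not taken from the interview
    for w, r in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
        print(w, r, read_overlaps_write(w, r, rf))
```

Under the assumed replication factor of 3, writing and reading at `ONE` does not guarantee overlap, so a read may briefly return stale data, while `QUORUM` for both does, at the cost of waiting for two replicas on each operation.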


Christian: Thanks, and what has your experience been like with the Cassandra community?


Paolo: In terms of community, I feel that the community around Cassandra makes it a strong candidate for anyone who is just discovering what it can do for their application. We’ve found it very useful to participate in the various user groups here in London. Shared experiences are abundant now. There are so many stories out there that will help you when you get started with using Cassandra for your application.


Christian: Great, and we hope to see you at the Cassandra EU Summit in London in October.


Paolo: Thanks.