August 9th, 2013


“We needed a simple replication mode that worked out of the box, with no additional processes or tasks for replicating the data. We saw Cassandra as the best solution for that.”

-Jose Alvarez Muguerza, Big Data Architect at Globant


 

 

Welcome to another five minute interview and I am delighted to have with me Jose Alvarez Muguerza, Big Data Architect at Globant. Jose, could you tell us a little bit about what Globant does?

Sure. First of all, it’s a pleasure for me. Our slogan at Globant is “We create innovative software products that are built for global audiences”, and indeed we do. As an outsourcing company, we deal with some of the most interesting use cases in industries such as gaming, data science, and digital marketing, with customers such as EA, Coca-Cola, and BBVA, and also in social media with platforms such as LinkedIn. None of our projects involve ticket resolution or maintenance work; developers consider those activities boring and unchallenging, so we try to avoid those kinds of projects.

 

A little bit about your background, Jose; as a big data architect, what systems had you been using previously and then if you could talk about “why Cassandra”?

As a big data architect I have worked in the industry for more than a decade. Naturally, I started out working with relational databases such as Oracle and MS-SQL. In recent years, because of customer requirements, we started to build our experience in NoSQL, and about three years ago I started to research a very interesting project called Cassandra.

 

When you started your research, was there a specific pain that you were trying to solve, or was it more “Hey what’s this new thing, I want to learn about it”?

At the time we had hit a limit in MySQL and needed a distributed database. I had also dealt a lot with the replication issues that come with common relational databases, so we needed a simple replication mode that worked out of the box, with no additional processes or tasks for replicating the data. We saw Cassandra as the best solution for that: its topology has no master/slave, so we decided it was the best candidate to research further.

 

That’s a real sweet spot for Cassandra. We see lots of users out there doing multi-data centre replication with Cassandra. Are you replicating across different data centres as well, Jose?

Yes. In fact, Globant is primarily focused on delivering software consultancy services, not on working on our own product. We use Cassandra to improve our customers’ experience, so each case depends on the customer’s business. In my area at Globant we suggest technical solutions and high-performance architectures to our customers. As an independent vendor, our goal is to research each case in depth in order to suggest the NoSQL and cloud infrastructure that best fits our customers’ needs. In more cases than you could imagine, we include Cassandra in the solution.

 

Agreed – do your research. It’s really important to understand that as we move from a relational database world, where it was really one size fits all, it’s important to pick the right tool for the right job. Can you talk a little bit about what a typical deployment looks like for a customer?

For one of our customers we are using Cassandra together with Aurelius Titan, the distributed graph database. We are just starting this project and it is not yet deployed. It has almost 10 billion points in the graph database, and we use Cassandra as the storage layer. As you may know, Titan and Cassandra can work together: Titan can plug into several kinds of storage, such as HBase or Cassandra. We selected Cassandra based on its eventual consistency, among other features.

 

For more information about Titan, check out this Cassandra Summit 2013 video with Aurelius’ CTO, Matthias Broecheler presenting on Distributed Graph Computing with Titan and Faunus on Cassandra.

 

Tell us about something that you’ve learned about Cassandra as you got into it that would be beneficial for other people that have just started.

 

Sure. When we started researching this area, Cassandra and NoSQL, one of the issues was how to migrate the data that we have, or that our customer has, from a relational database into a non-relational database, and how you need to open your mind and set aside the rules we learnt at university about consistency and normalization. All the blogs, books, and papers you can read talk about what NoSQL is, what consistency is, what the CAP theorem is, but none of them talk about how to do it. That was one of the most complicated issues that we had to tackle.

 

My advice is to really think about the queries. In our previous paradigm, with a relational database, you start from the database layer: you normalize your data and then write your queries based on how your data is distributed across your tables. Now the paradigm is the opposite: you need to think first about your queries, your reports, your business domain, and how they need the data. After that you create, in this case, your column families and your keys based on how your queries will need the data. This is one of the key points that I always suggest to our customers on any migration…
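Editor's note: the query-first approach Jose describes can be illustrated with a minimal sketch. It is not taken from any Globant project. The customer and order data, the two queries, and every name in it are hypothetical, and plain Python dictionaries stand in for column families so that it runs without a Cassandra cluster. The idea shown is only this: pick the queries first, then build one denormalized structure per query, keyed by whatever that query looks up.

```python
from collections import defaultdict

# Hypothetical normalized rows, as they might come out of a relational database.
CUSTOMERS = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Luis"},
]
ORDERS = [
    {"order_id": 100, "customer_id": 1, "day": "2013-08-01", "total": 25.0},
    {"order_id": 101, "customer_id": 2, "day": "2013-08-01", "total": 40.0},
    {"order_id": 102, "customer_id": 1, "day": "2013-08-02", "total": 15.0},
]


def orders_by_customer(customers, orders):
    """Structure for the query 'all orders for a customer'. Row key: customer_id."""
    names = {c["customer_id"]: c["name"] for c in customers}
    table = defaultdict(dict)
    for order in orders:
        # The customer name is copied into every entry on purpose (denormalized),
        # so reading a customer's orders needs no join.
        table[order["customer_id"]][order["order_id"]] = {
            "customer_name": names[order["customer_id"]],
            "day": order["day"],
            "total": order["total"],
        }
    return dict(table)


def orders_by_day(customers, orders):
    """Structure for the query 'all orders placed on a day'. Row key: day."""
    names = {c["customer_id"]: c["name"] for c in customers}
    table = defaultdict(dict)
    for order in orders:
        # The same order data is stored a second time, keyed for the second query.
        table[order["day"]][order["order_id"]] = {
            "customer_name": names[order["customer_id"]],
            "total": order["total"],
        }
    return dict(table)


BY_CUSTOMER = orders_by_customer(CUSTOMERS, ORDERS)
BY_DAY = orders_by_day(CUSTOMERS, ORDERS)

# Each query is now a single lookup by row key.
print(sorted(BY_CUSTOMER[1]))
print(sorted(BY_DAY["2013-08-01"]))
```

The trade-off in this sketch is that every order is written twice, once per query, which is the opposite of what normalization would recommend. In a real migration the equivalent decision would be made when choosing column families and keys.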

 

Yes, really spend that time upfront thinking about your data model before you get started, and really break away from the way that we’ve been taught with relational databases. That’s a great piece of advice, Jose. Thank you for taking the time to talk with us today and to share your insights.

 

Thanks for having me, it was a pleasure.
