September 3rd, 2013

By 

Gabriel Madagati: Founder and CEO of s1mbi0se technology ventures

Christian Hasker: Editor at Planet Cassandra, A DataStax Community Service

 

TL;DR: S1mbi0se is a big data company focusing on collecting, processing and extracting insights from huge amounts of online and offline data. It creates platform services and apps that extract insights about people from this data.

 

S1mbi0se realized that they needed a platform that could support hundreds of terabytes of data, one that was stable and backed by a strong community to support them in any event.

 

They have two types of data: The first is unstructured data, which is any kind of data that they collect. The second type of data is structured data, which they process and, after the processing, store in Cassandra.

 

They deploy everything on AWS (Amazon Web Services) and have around 40 to 50 terabytes of data stored in the structured database.

 

 

Hello everyone. Today I’m joined by Gabriel Madagati, Founder and CEO of s1mbi0se technology ventures, based in São Paulo, Brazil. Let’s get started, Gabriel; can you tell us a little bit about what S1mbi0se does?

Sure. First of all, thanks so much for the opportunity. S1mbi0se is a big data company focusing on collecting, processing and extracting insights from huge amounts of online and offline data. We do this in order to understand customers’ (or users’) behavior, intent and interest.

 

To give brands, advertisers and publishers a better understanding of their users or customers, S1mbi0se creates platform services and apps that extract insights about people from this data. We find out what people do, what they don’t do, what they want to do and how they want to do it. Our customers basically use this data to put a brand or company in contact with these people at the best moment, in the most relevant way possible.

 

Why Cassandra at S1mbi0se? What is the use case for Cassandra?

When we started the company, we realized that we needed a platform that could support a huge amount of data; we are talking about hundreds of terabytes of data and, of course, a relational database was not the best choice. 

 

We needed a platform that was stable enough, with a strong community to support us in any event. We created some benchmarks with our other technologies. After analyzing this information based on performance, support and cost/benefit, we decided to use Cassandra. Now we have a large cluster of Cassandra nodes storing all of our data.

 

We have basically two types of data. The first is unstructured data, which is any kind of data that we collect; it could be social graph data, other buyers’ CRM databases or transactional data, and everything is stored on top of Cassandra.

 

The second type of data is structured data; we process all of the collected data and, after the processing, we store the results in Cassandra. From there, it is also used by Solr.

 

I understand you decided that relational databases would not be cost-effective and were not really designed to do what you needed; did you look at any other NoSQL databases alongside Cassandra?

Sure. We looked at MongoDB, HBase, Hadoop, Riak and many others. We basically looked through most of the advertised NoSQL databases and considered many options. But again, we really put our trust in Cassandra. 

 

You mentioned a little bit about the community. Can you talk about your experience with the community?

The most surprising thing to me as a manager, not so much on the technical side but from a management standpoint, was the feedback and support we received from the Cassandra community. The support of the community made me change my mind and, still today, S1mbi0se is a really strong supporter of the open source side of things, even investing in open source projects because of the support we receive from the community. When I reached out on the community mailing list, even the committers helped us out, especially Aaron Morton.

 

Aaron Morton is really great.

Yes, he is; he’s helped us out so much and I can’t even imagine being able to get this kind of support from Oracle. Even if I were paying, I would not get this level of support from them. This is what surprised me the most: you are using something for free from the community and everybody is trying to make the product the best possible. The entire community is filled with people who have their own lives and jobs, but they still support you in the best possible way. It is truly amazing.

 

Unfortunately, we don’t have a strong Cassandra community in Brazil yet. Every time we talk about our platform, we try to advertise Cassandra because the more people who know about it, the better it is for us.

 

Earlier you mentioned the total amount of data. If you could talk about how much data is being stored in Cassandra and what the environment looks like, that would be fantastic.

We currently deploy everything on AWS (Amazon Web Services). We have around 40 to 50 terabytes of data stored in the structured database. Even though it is NoSQL, we still call it a structured database in Cassandra. We have over 40 terabytes of data and everything is stable, especially running on AWS. If, say, a new customer brings in new transactional data, we simply add new nodes and it’s really easy. We also use some other open source tools created by Netflix that help us to manage and monitor Cassandra in the best possible way.

 

Great. Is there anything that I haven’t asked you, which you would like to add?

Let’s keep on talking about big data in Brazil; it’s a very hot topic here, but most people don’t really have a notion of what big data truly means. It’s important that big data be used at other companies and that more people are educated on the use of open source technology, to better leverage big data and connect the dots.

 

Unfortunately, there aren’t many resources here in Brazil at the moment. We are trying to solve this problem by offering our technology. I believe that this will create great awareness and a great opportunity for Brazilian companies to actually understand how they can leverage their data, how they can connect the dots, bring more profits, and bring more insights without spending millions in software or hardware.

 

It’s not just in Brazil, but in the States as well. Big data is a buzzword and many people don’t really understand what it means and how to unlock it. Gabriel, thank you so much; we really appreciate you taking the time today.