September 27th, 2013

Donald Leonhard-MacDonald: CTO of Social Artisan

Michael Vogiatzis: Real-Time Analytics Engineer at Social Artisan

Matt Pfeil: Co-Founder at DataStax

TL;DR: Social Artisan is a platform that provides lead identification and content analysis for brands, in order to increase their engagement with customers and generate more sales. They pull data from the web, as well as social data from external providers, process that data using the Storm framework, and use Cassandra as their primary storage. What they’re actually doing is building a graph of interactions between customers and the items that are being shared. Social Artisan runs in its own data center on three machines; each machine has 24 nodes.

Hello, Planet Cassandra, this is Matt Pfeil, and today I am joined by Donald Leonhard-MacDonald and Michael Vogiatzis. Donald is the CTO of Social Artisan, and Michael is the Real-Time Analytics Engineer there. Gentlemen, thanks for your time today. To start things off, can you tell all of our listeners what Social Artisan does?

Social Artisan is a platform that provides lead identification and content analysis for brands, in order to increase their engagement with customers and generate more sales. We also provide social media analytics, such as who is talking about a brand over an extended period of time, down to individual articles, which we break down by location, gender, etc.

Awesome.  Can you give an example of a customer use case?

Yes. For instance, we tested with British Airways, which has a lot of its own content. Much of that content comes from British Airways High Life and their business magazines.

What we do is track who’s sharing articles from these magazines online. British Airways is able to see who’s sharing their content, see which of these users are influential, and then interact with them, using that information as a tool for social and brand marketing.

That makes sense. You’re mapping out who the influencers are for a given product and letting a company be more efficient in communicating with those influencers.

Absolutely, and we also use that to allow people to develop more complex analyses. In fact, a lot of people develop ideas about which topics are being shared most, and which keywords are most helpful for SEO and its influence.

It sounds like you’ve had some success, given that you’re working with major brands such as British Airways. Out of curiosity, how are you using Cassandra?

We use Cassandra when we pull data from the web, and we also get social data from external providers. We process that data using the Storm framework and use Cassandra as our primary storage. We have multiple instances for storing different metrics, and different key formats for different time periods. Then we need to present it to the end users. We also use Cassandra indirectly through Titan DB, which is a tool that we use to move our system data onto a graph. By the time we started using Titan DB we were already using Cassandra, but luckily Titan DB works on top of an existing Cassandra installation, so we went on using it without a problem.
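As a rough illustration of "different key formats for different time periods", one common pattern is to write each event into several time buckets, one per granularity. The sketch below is not Social Artisan's schema: the key layout, the granularities, and the function names are all assumptions, and a plain in-memory counter stands in for a Cassandra table.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical key formats, one per time granularity (not the actual schema).
KEY_FORMATS = {
    "hour": "%Y-%m-%dT%H",
    "day": "%Y-%m-%d",
    "month": "%Y-%m",
}


def bucket_keys(metric, ts):
    """Return one row key per granularity for a metric observed at ts."""
    return [f"{metric}:{name}:{ts.strftime(fmt)}" for name, fmt in KEY_FORMATS.items()]


def record(store, metric, ts, amount=1):
    """Increment every time bucket that the event falls into."""
    for key in bucket_keys(metric, ts):
        store[key] += amount


# An in-memory Counter stands in for a Cassandra counter table.
store = Counter()
record(store, "shares", datetime(2013, 9, 27, 10, 15, tzinfo=timezone.utc))
record(store, "shares", datetime(2013, 9, 27, 18, 40, tzinfo=timezone.utc))
```

With keys laid out this way, a query for a daily or monthly total becomes a single key lookup rather than a scan over raw events, at the cost of several writes per event.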

How are you using the graph functionality of TitanDB?  Is it to traverse the customers for a given product or company?  Can you elaborate on that more?

What we’re actually doing is building a graph of interactions between customers and the items that are being shared. What we’re trying to do is build up a data graph that gives the probability of articles with different keywords being shared by different users. We use Titan DB to help build these, using statistics to match articles to users’ interests and to find the users who are much more likely to share a given article.

Also, it helps to build up relationships between users who are more likely to communicate with each other out of shared interest. It’s really nice, and actually exciting, that we could use Titan DB on top of Cassandra, because we trust the two working together.
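To make the idea of an interaction graph concrete, here is a minimal sketch in plain Python. It does not use the Titan DB API. Users are linked to keywords through the articles they share, and the likelihood of a share is estimated as a relative frequency with add-one smoothing. The data model, the class and method names, and the estimator are all assumptions chosen for illustration, not the approach described in the interview.

```python
from collections import defaultdict


class InteractionGraph:
    """Toy user-to-keyword graph built from share events (illustrative only)."""

    def __init__(self):
        # Edge weights: user -> keyword -> number of shares involving that keyword.
        self.user_keyword_counts = defaultdict(lambda: defaultdict(int))
        self.user_totals = defaultdict(int)
        self.keywords = set()

    def add_share(self, user, article_keywords):
        """Record that a user shared an article tagged with the given keywords."""
        for kw in article_keywords:
            self.user_keyword_counts[user][kw] += 1
            self.user_totals[user] += 1
            self.keywords.add(kw)

    def keyword_affinity(self, user, keyword):
        """Add-one smoothed fraction of the user's shares that involve keyword."""
        counts = self.user_keyword_counts.get(user, {})
        vocab = len(self.keywords) or 1
        return (counts.get(keyword, 0) + 1) / (self.user_totals.get(user, 0) + vocab)

    def likely_sharers(self, article_keywords, users):
        """Rank users by mean affinity for the article's keywords, highest first."""
        def score(user):
            total = sum(self.keyword_affinity(user, kw) for kw in article_keywords)
            return total / len(article_keywords)
        return sorted(users, key=score, reverse=True)


graph = InteractionGraph()
graph.add_share("alice", ["travel", "luxury"])
graph.add_share("alice", ["travel"])
graph.add_share("bob", ["finance"])
ranked = graph.likely_sharers(["travel"], ["bob", "alice"])
```

In a graph database the same structure would be stored as vertices for users and keywords with weighted edges between them, so that ranking becomes a traversal rather than a dictionary lookup.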

What was your motivation for using Cassandra originally and what other technologies did it either replace or get evaluated against?

We had a troubled path towards Cassandra.  We started off using MongoDB as the backend, and that was what we used for everything — so that was really a dumping ground for all our data.

MongoDB wasn’t up to the scale we needed and it didn’t work as well as we would have liked. So, we moved on to HBase, but we found that HBase didn’t play well with our setup and required us to use Hadoop. We only used Storm in the backend, and we were actually using Nutch as well. We then started looking at other solutions and we saw Cassandra.

It was a godsend to find Cassandra because it took away all the horribleness of HBase.  It was just a nice, smooth, and clean API. And actually with the new DataStax Java Driver we’re very happy because it’s just simple; this is how it should be. Working with Cassandra has been amazing.

That’s really good news to hear.  The community’s come a really long way over the years in terms of making the product easier to use while still harnessing all of the power, performance, and availability that’s in the core architecture.  Out of curiosity, what can you share about your infrastructure?  Are you guys running in the cloud?  How many machines are you running?

We’re running in a data center, with three machines. Each machine has 24 nodes. Obviously, when Cassandra 1.2 came out with vnodes we were really over the moon, because that was exactly what we needed. We were able to leverage more of the power of our servers, so that was really exciting for us. We’re really looking forward to 2.0, with all the things like triggers and compactions. But for the back end, we’re really glad we chose Cassandra. Cassandra is easily handling the data work. We’re going to 2.0, but we haven’t made the transition yet.

Gentlemen, I want to thank you for your time. Is there anything else you’d like to add?

I’d like to say that the Cassandra community was one of the reasons we continued using Cassandra. I really like attending Cassandra meetups and meeting with other Cassandra engineers to discuss use cases and how to improve our system. I’m thankful for that.

I would just like to say that we heard a talk from Patrick McFadin in London and it was really amazing. It was one of the things that continues to energize us to keep using Cassandra.