May 10th, 2013

Titan Graph Database


Matt Pfeil: Co-Founder of DataStax

Matthias Broecheler: CTO of Aurelius


Matt: My name is Matt Pfeil and I’m the co-founder of DataStax. Today I’m joined by Matthias Broecheler. Matthias is the CTO of Aurelius, the company behind the Titan graph database. Matthias, thanks for joining us today.


Matt: So Titan is a graph database. What exactly is a graph database?


Matthias: A graph database is a database that focuses on data sets that are graph structured: data sets where you have the notion of adjacency in your data. When things are near each other or associated with one another, the database focuses on making those retrieval operations very efficient, so you can easily traverse graph-structured and network-structured data sets.
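
[Editor's illustration: to make the notion of adjacency concrete, here is a minimal Python sketch of a traversal over an in-memory adjacency list. It is not Titan's API; the `graph` data and the `within_hops` function are hypothetical names chosen for this example.]

```python
from collections import deque

# Hypothetical toy graph: each vertex maps to the vertices adjacent to it.
graph = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": [],
}


def within_hops(adjacency, start, max_hops):
    """Return the set of vertices reachable from start in at most max_hops edges."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        vertex, hops = frontier.popleft()
        if hops == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in adjacency.get(vertex, []):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, hops + 1))
    seen.discard(start)  # report only the other vertices
    return seen
```

Calling `within_hops(graph, "alice", 2)` follows adjacency two steps out from "alice". A graph database aims to make each of those neighbor lookups cheap even when the graph does not fit in memory.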


Matt: Awesome. What are some common use cases that it excels at?


Matthias: The common use cases are obviously in social network analysis and social media analysis: anything from things like graph search, which was just recently pioneered, to doing any kind of analytics in that space. We also see a lot of use cases in financial transaction analysis or biological network analysis. It’s a really broad spectrum across many domains, but they all have one thing in common: they all look at data sets that are highly connected and highly graph structured.


Matt: That makes sense. What’s the history between Titan and Cassandra?


Matthias: Titan actually started off a couple of years ago when Cassandra was still very young; the idea behind Titan was to build the first distributed and scalable graph database. We were looking at persistence solutions that would allow us to scale beyond a single machine, and that’s when Cassandra was just starting to take off and it looked like a really nice project to build on top of. Titan is built on top of Cassandra and all of the persistence of Titan is handled by Cassandra. It sits right on top of Cassandra and intelligently manages the token range to smartly partition the graph across the multiple machines that comprise the Cassandra cluster.


Matt: That makes sense. You gain all of the benefits of a graph database via Titan with all of the high availability and scalability characteristics of Cassandra as the data store.


Matthias: Exactly, and we don’t have to worry about things like replication, backup, and snapshots because all of that stuff is handled by Cassandra. We really just focus on: “How do you distribute a graph?”, “How do you represent a graph efficiently in a big table model?”, “How do you do things like edge compression and other things that are very graph specific in order to make the database fast?” And, lastly, “How do you build intelligent index structures so that the graph traversals, which are the core of any graph database, are as fast as possible?”
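
[Editor's illustration: the interview does not describe Titan's actual storage layout, so the following Python sketch only shows one generic way a graph could be laid out in a big-table-style store: one row per vertex, with each edge stored as a sortable column. The delta encoding is a simple stand-in for the kind of edge compression mentioned above. All function names are hypothetical.]

```python
def to_rows(edges):
    """Group (source, label, target) edges into one row per source vertex.

    Each row maps a column key (label, target) to an empty value,
    loosely mimicking a wide-row store where the column name carries the data.
    """
    rows = {}
    for source, label, target in edges:
        rows.setdefault(source, {})[(label, target)] = b""
    return rows


def neighbors(rows, source, label):
    """Read a single row and return the targets under one edge label, sorted."""
    row = rows.get(source, {})
    return sorted(target for (edge_label, target) in row if edge_label == label)


def delta_encode(sorted_ids):
    """Replace sorted ids with the gaps between them, which are smaller numbers."""
    previous = 0
    gaps = []
    for vertex_id in sorted_ids:
        gaps.append(vertex_id - previous)
        previous = vertex_id
    return gaps
```

In this sketch, finding a vertex's neighbors touches exactly one row, and keeping the columns sorted means nearby ids produce small gaps that take fewer bytes to store.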


Matt: That’s awesome. How would someone get started with Titan?


Matthias: Titan was open sourced a little less than a year ago and since then we’ve started growing the community. We have a GitHub page and very extensive Wiki documentation so you can go to our Wiki page and download Titan (it’s in a zip file). You just download, unzip and you can get started locally. It comes with an example so you can just jump right in and start cranking away at the database. Follow the Wiki and follow the examples; I think that’s the easiest way to get started.


Matt: Matthias, I want to thank you very much for joining us today. Is there anything else that you would like to share with the Cassandra community at large?


Matthias: Keep doing what you’re doing. We’re having a great time building on top of Cassandra and basically all the new features that are coming out will eventually make their way into Titan and make Titan better.


Matt: Awesome. Thanks again.