Brady Gentile: Community Manager at DataStax
Lusana Ali: Founder at Strmur
I’m here with Lusana Ali in Sydney, Australia. Lusana is building an application called Strmur (Streamer) that utilizes Apache Cassandra. Lusana, it is great to have you on the podcast series. What exactly does Strmur do?
Strmur is a social network that basically allows you to find, share and aggregate web content through content streams; we wanted to give users the ability to filter the kinds of content they would see.
We realized from using a lot of social networks that they are very much focused on people, gossip and following others.
On Facebook, you have this feed that gets bombarded with random posts, but a lot of the time you want to be able to filter what you see based on your interests. Or someone might put up a post that gets buried in my feed, and two days later they ask, “Did you check this out?” and I can only reply “no”. Then I have to scroll down for five minutes before I can find the post again.
Strmur is basically trying to solve that problem, where it allows the users to subscribe to content streams based on their interests and be able to filter the content they see. For content creators, it helps to target their audiences much better.
Excellent. How are you using Cassandra at Strmur?
We use Cassandra because it allows you to manage time-series data very effectively. Upon insertion, Cassandra automatically sorts the columns by the comparator you have configured. Having that ability makes organizing the content streams very easy, since items in a content stream are shown in reverse chronological order.
This was very useful to us because we didn’t have to do any sorting ourselves, and there is no pressure on the data layer to sort at query time. It makes Cassandra a really appropriate choice when you are working with time-series data.
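To make the “sorted on insertion” idea concrete, here is a minimal in-memory sketch, assuming one content stream maps to one wide row whose columns are ordered newest first (as with a reversed time-based comparator). The `StreamRow` class and its methods are hypothetical and do not reflect the Cassandra or pycassa API. They only show why reads need no sorting once writes keep the order.

```python
import bisect


class StreamRow:
    """Hypothetical model of one content stream's wide row (not a Cassandra API).

    Columns are kept sorted at insertion time, newest first, mimicking a
    reversed time-based comparator. Timestamps are assumed to be unique.
    """

    def __init__(self) -> None:
        # Sort keys are negated timestamps, so ascending order == newest first.
        self._keys = []
        self._values = []

    def insert(self, timestamp: int, item: str) -> None:
        key = -timestamp
        # Find the position that keeps the row ordered, then insert there.
        idx = bisect.bisect_left(self._keys, key)
        self._keys.insert(idx, key)
        self._values.insert(idx, item)

    def latest(self, count: int) -> list:
        # Reading the newest items is a plain slice: no sorting at query time.
        return self._values[:count]


row = StreamRow()
row.insert(100, "post-a")
row.insert(300, "post-c")
row.insert(200, "post-b")
print(row.latest(2))
```

Even though the items arrive out of order, `latest(2)` returns `["post-c", "post-b"]`, because the cost of ordering was paid once on each write rather than on every read.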
Have you always used Cassandra, or did you switch to Cassandra from another database?
We initially chose MongoDB, but we decided to switch to Cassandra when we realized it was a much better fit for our requirements. We really liked that we could scale Cassandra horizontally without any disruption. Cassandra uses something called consistent hashing, which ensures that when you add nodes to the cluster, only a small portion of your data has to move, so that is a huge, huge plus for us. From the CAP theorem, the two properties that were really important for Strmur were high availability and partition tolerance. Cassandra offers both, so that is good for us. We also looked at HBase, but the ease of getting a Cassandra cluster up and running was too attractive.
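The consistent-hashing point can be illustrated with a simplified ring. This is a sketch under stated assumptions, not Cassandra’s implementation: each node gets a single token, there is no replication or virtual nodes, and keys are hashed with MD5 purely for illustration. The `Ring` class and node names are hypothetical.

```python
import bisect
import hashlib


def token(key: str) -> int:
    # Hash a key (or node name) onto the ring; MD5 is used here for illustration only.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Hypothetical consistent-hashing ring: one token per node, no replication."""

    def __init__(self, nodes):
        self._ring = sorted((token(n), n) for n in nodes)

    def add_node(self, node: str) -> None:
        bisect.insort(self._ring, (token(node), node))

    def owner(self, key: str) -> str:
        # A key belongs to the first node whose token is >= the key's token,
        # wrapping around to the start of the ring past the highest token.
        tokens = [t for t, _ in self._ring]
        idx = bisect.bisect_left(tokens, token(key))
        return self._ring[idx % len(self._ring)][1]


keys = [f"stream:{i}" for i in range(1000)]
ring = Ring(["node-a", "node-b", "node-c"])
before = {k: ring.owner(k) for k in keys}
ring.add_node("node-d")
after = {k: ring.owner(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved")
```

Adding `node-d` only takes over the arc of the ring between its predecessor’s token and its own, so every key that moves now lives on `node-d` and all other keys stay where they were. Naive `hash(key) % node_count` placement, by contrast, would reassign most keys when the node count changes.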
I know you are based in Sydney, Australia, and we just started a community meet-up there. Do you also have some experience with the virtual community?
I joined the Cassandra meet-up in Sydney, and I am really looking forward to seeing how it goes over the next few years. As for the virtual community, I don’t really have much experience with it. I am still relatively new to Cassandra, but so far it has been great; you can find a lot of tutorials online and documentation through DataStax.com. DataStax has provided a lot of the documentation that we’ve used. It’s definitely a very open and supportive community, and it’s nice having access to shared resources. I also wanted to say a special thanks to Tyler Hobbs from DataStax for providing the PyCassa client library, which is invaluable; he has also helped me with many problems through the Google group.
Is there anything that you learned while using Cassandra that maybe, in hindsight, you would have done differently?
I had a lot of trouble with the schema design aspect of Cassandra. With a traditional SQL database, you can start creating your schema as soon as you know your entities and their fields. With Cassandra, you really need to think about what kinds of queries you are going to make beforehand and have a good understanding of how to denormalize accordingly. In hindsight, I probably would have done a little more planning before jumping in and building my schemas.
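The “queries first, then denormalize” lesson can be sketched in plain Python. Assume two hypothetical queries for an app like Strmur: “latest items in a stream” and “latest items shared by a user”. In a query-first layout, each query gets its own table partitioned by the key it filters on, and every write is duplicated into each of them. The `DenormalizedStore` class and its method names are illustrative only, not Strmur’s actual schema or any Cassandra API.

```python
class DenormalizedStore:
    """Hypothetical query-first layout: one 'table' per query, all written on every insert."""

    def __init__(self) -> None:
        # Query 1: "latest items in stream X"      -> partitioned by stream
        self.items_by_stream = {}
        # Query 2: "latest items shared by user Y" -> partitioned by user
        self.items_by_user = {}

    def share(self, user: str, stream: str, timestamp: int, url: str) -> None:
        item = (timestamp, url)
        # Denormalize: the same item is copied into every query's table.
        for table, partition_key in (
            (self.items_by_stream, stream),
            (self.items_by_user, user),
        ):
            rows = table.setdefault(partition_key, [])
            rows.append(item)
            # Keep each partition newest first, as a clustering order would.
            rows.sort(reverse=True)

    def latest_in_stream(self, stream: str, count: int) -> list:
        # Each read touches a single partition; no joins are needed.
        return [url for _, url in self.items_by_stream.get(stream, [])[:count]]

    def latest_by_user(self, user: str, count: int) -> list:
        return [url for _, url in self.items_by_user.get(user, [])[:count]]


store = DenormalizedStore()
store.share("alice", "python", 1, "https://example.com/a")
store.share("bob", "python", 2, "https://example.com/b")
store.share("alice", "django", 3, "https://example.com/c")
print(store.latest_in_stream("python", 10))
print(store.latest_by_user("alice", 10))
```

The trade-off is extra writes and duplicated data in exchange for reads that hit exactly one partition. This is why the queries need to be known up front: adding a new query later usually means adding and backfilling a new table.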
Lusana, thank you so much for joining us today and I wish you the best of luck with Strmur.
Check out Lusana’s related blog post: Cassandra ORM for Django