September 20th, 2013

Fabien Rousseau: Software Engineer at Yakaz

Christian Hasker: Editor at Planet Cassandra, A DataStax Community Service


Bonjour everybody.  I am very happy to have with me today Fabien Rousseau from Yakaz in Paris.  Fabien, why don’t you just start off by telling us what Yakaz does?

Yakaz is a local social network. We aim to connect people living in the same area so they can communicate easily and in real time online. It is a free platform to discover local life, post ads, or recommend things you like. Yakaz has over 10 million unique visitors worldwide every month.


Excellent. What is your role at Yakaz, Fabien?

I am a Software Engineer. I develop our storage layer, which relies on Apache Cassandra.  We are using Cassandra on our backend to store 200 million+ images.  These images are continuously renewed every two months (we use the TTL feature to expire images).


We also store our users, messages (public and private), posts, and their connections.  Users’ feeds, which contain social activity, are also stored in Cassandra.
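As an illustration of the TTL-based expiry Fabien describes, here is a minimal sketch, not Yakaz's actual code. It assumes "two months" means roughly 60 days, and the `images` table with its `image_id` and `content` columns is hypothetical. It only builds the CQL3 statement text; a real application would prepare and execute it through a Cassandra driver.

```python
from datetime import timedelta

# Assumption: "two months" is approximated as 60 days.
IMAGE_TTL_SECONDS = int(timedelta(days=60).total_seconds())


def build_image_insert(ttl_seconds: int = IMAGE_TTL_SECONDS) -> str:
    """Return a CQL3 INSERT with a TTL for a hypothetical `images` table.

    The `?` markers are bind placeholders for a prepared statement.
    """
    if ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be positive")
    return (
        "INSERT INTO images (image_id, content) "
        "VALUES (?, ?) "
        f"USING TTL {ttl_seconds}"
    )


if __name__ == "__main__":
    print(build_image_insert())
```

With `USING TTL`, Cassandra expires the written columns on its own once the given number of seconds has elapsed, so no separate cleanup job is needed to remove old images.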


Can you tell us how the decision to choose Cassandra came about?

The important parts of Cassandra were the fact that it’s distributed and that it is very easy to add capacity: thanks to its horizontal scalability, we just have to add more nodes.  We also like the high availability, because you can trade off consistency against availability.  In fact, we have been using Cassandra since version 0.7.  At that time, I wasn’t the one making major decisions, but I came to the view that Cassandra was the best fit for our use case.
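The consistency and availability trade-off mentioned above can be made concrete with a little arithmetic. The sketch below is a generic illustration and does not reflect Yakaz's settings: the interview does not state their replication factor or consistency levels, so the value of 3 used here is an assumption. The rule it encodes is the usual one for replicated stores: a read is guaranteed to overlap the latest acknowledged write when the number of replicas written plus the number of replicas read exceeds the replication factor.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that forms a majority (Cassandra's QUORUM)."""
    return replication_factor // 2 + 1


def reads_overlap_writes(replication_factor: int,
                         write_replicas: int,
                         read_replicas: int) -> bool:
    """True when every read set must share a replica with every write set."""
    return write_replicas + read_replicas > replication_factor


if __name__ == "__main__":
    rf = 3  # assumed replication factor, for illustration only
    q = quorum(rf)
    # Writing and reading at ONE: tolerates more node failures,
    # but a read may miss the most recent write.
    print("ONE/ONE overlap:", reads_overlap_writes(rf, 1, 1))
    # Writing and reading at QUORUM: reads overlap writes,
    # but a majority of replicas must be reachable.
    print("QUORUM/QUORUM overlap:", reads_overlap_writes(rf, q, q))
```

Lower consistency levels keep requests succeeding while more nodes are down, at the cost of possibly stale reads; higher levels give the opposite balance.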


Great.  It sounds like a good use case for Cassandra.  You mentioned availability; are you running in more than one data center? 

We are running in only one physical data center.


As far as the environment goes, what does that look like?

We have a cluster of 8 Linux servers in a physical data center. Each server has 64GB of RAM and more than 2TB of SATA disk storage. We have over 2TB of data stored in Cassandra.


Our storage layer is developed in Java and embeds Cassandra directly. Because we embed Cassandra, we shut nodes down one after another when deploying a new version of our storage layer, so each deployment also tests our resilience to a node being unavailable.


One of the things we always want to do is pass along advice to other people who are either already using Cassandra or looking to get started.  What’s something you’ve learnt along the way that you think is worth passing along to other people?

We embedded Cassandra directly in our storage layer, but Cassandra is now more mature than it was a few years back.  If I needed to start a new project from scratch, I would use CQL3, which is easier to use and very fast.  That’s one piece of advice I would give to people who would like to start using Cassandra.


That’s really good advice.  My last question: It sounds like you’re involved in the Cassandra community locally there in Paris.  Could you talk about how active the local community is?

I presented at two meet-ups last year in Paris.  As for the online community, there is also a lot of interesting discussion on the Apache Cassandra Users mailing list, and there are a few very active people.  I’m also looking forward to going to Cassandra Summit EU next month.