TL;DR: Libon is the global communication suite of the Orange group, a major European telecom company. Libon offers every feature you would expect from a comprehensive communication app (chat, file exchange, location sharing, etc.) and is available on Android, iPhone and desktop.
Libon handles massive amounts of data from user devices, providing a complete backup of user data across devices so the conversation you start on your iPhone can be resumed on the HTML5 page at home. Libon originally stored this data in Oracle, but began having trouble scaling.
Looking at alternatives, Libon considered refactoring their Oracle model and benchmarked Cassandra against MongoDB. Oracle fell by the wayside, as Libon felt that scaling with Oracle would quickly become costly. In the end, Cassandra was the “perfect fit” because of its ability to scale easily, its operational simplicity, and its match with their requirements.
What does Libon do and what is your role there?
Libon is the global communication suite of the Orange group, one of Europe's major telecom companies. Libon offers every feature you would expect from a comprehensive communication app: chat, file exchange, location sharing, personal voicemail, text-to-speech and speech-to-text transcription, to name a few. The application is available on Android, iPhone and desktop with an HTML5 interface.
My role on Libon is to work with the back-end API team, especially on Cassandra development. I help the team design effective data models to improve performance in production, and also to leverage powerful Cassandra features like TTL to their full extent. I also spend time with the operations team fine-tuning the servers and investigating issues.
How are you using Apache Cassandra?
We are currently on Cassandra 1.2.6 with Thrift, using Hector. Migrating to 2.0 with CQL3 is in the pipeline for 2014. We use Cassandra where it is strongest: handling massive data from user devices. Libon provides a complete backup of user data across devices, so the conversation you start with a buddy on your iPhone can be resumed on the HTML5 page at home. This can add up to a large volume of data to store. Of course we take data privacy seriously and let end users control their data retention.
Most of the design revolves around timelines, so Cassandra is a perfect fit. Every event created by a device (smartphone or HTML5 client) is uploaded and ingested into Cassandra into different tables, each dedicated to a particular usage (latest events, unread events, etc.).
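The timeline pattern described above can be sketched with a small in-memory simulation. This is plain Python standing in for Cassandra, and the class, field and user names are illustrative assumptions, not Libon's actual schema: each user gets a partition of time-ordered events, and reads return the newest entries first, just as a descending clustering order would.

```python
import bisect

class Timeline:
    """Toy stand-in for a Cassandra timeline table: one partition per
    user, rows kept sorted by timestamp (the clustering key)."""

    def __init__(self):
        self.partitions = {}  # user_id -> list of (timestamp, event), ascending

    def ingest(self, user_id, timestamp, event):
        # Insert the row in timestamp order within the user's partition.
        partition = self.partitions.setdefault(user_id, [])
        bisect.insort(partition, (timestamp, event))

    def latest(self, user_id, limit=10):
        # Newest first, like ORDER BY timestamp DESC LIMIT n in CQL.
        return list(reversed(self.partitions.get(user_id, [])[-limit:]))

timeline = Timeline()
timeline.ingest("alice", 1, "msg-1")
timeline.ingest("alice", 3, "msg-3")
timeline.ingest("alice", 2, "msg-2")
print(timeline.latest("alice", 2))  # [(3, 'msg-3'), (2, 'msg-2')]
```

In real Cassandra the sort order is maintained by the clustering key at write time, so the "latest events" read is a cheap sequential scan of one partition.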
One of the coolest features we use a lot in the project is TTL. It provides a natural thresholding mechanism for a simple anti-fraud system (a maximum number of SMS messages sent per time window) and also a natural auto-cleaning process for short-term or disposable data. We love it so much that we are broadening Cassandra usage to new projects.
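The TTL-based anti-fraud idea can be simulated in a few lines of Python. This is only a sketch: in Cassandra the expiry happens server-side via per-column TTL, while here it is emulated with expiry timestamps; the limit, the 60-second window and all names are made-up assumptions, not Libon's settings.

```python
import time

class SmsRateLimiter:
    """Simulates one TTL'd column per sent SMS: a send is allowed only
    while fewer than max_per_window columns are still alive."""

    def __init__(self, max_per_window, window_seconds, clock=time.time):
        self.max_per_window = max_per_window
        self.window_seconds = window_seconds
        self.clock = clock
        self.sent = {}  # user_id -> list of expiry timestamps

    def try_send(self, user_id):
        now = self.clock()
        # Expired entries vanish, just as TTL'd columns do in Cassandra.
        live = [expiry for expiry in self.sent.get(user_id, []) if expiry > now]
        if len(live) >= self.max_per_window:
            self.sent[user_id] = live
            return False
        live.append(now + self.window_seconds)
        self.sent[user_id] = live
        return True

limiter = SmsRateLimiter(max_per_window=3, window_seconds=60)
print([limiter.try_send("bob") for _ in range(4)])  # [True, True, True, False]
```

The appeal of the real TTL version is that there is no cleanup job at all: the "counter" resets itself as old columns expire.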
We also plan to move user contact data from Oracle to Cassandra for a performance boost. This is the next big step, along with the data migration from Thrift to CQL3.
To encourage broad adoption of Cassandra in the team and gain productivity, I developed Achilles and introduced it on the project. It offers a higher layer of abstraction over the Java driver core with complete object mapping and persistence. Overall, it simplifies our team's daily life.
What was the motivation for using Cassandra and what other technologies was it evaluated against?
The data was originally stored in Oracle in a way that does not scale at all. Before I joined the project, Cassandra was benchmarked against MongoDB. We chose Cassandra because of its ability to scale easily and its operational simplicity (one single process to monitor!), but also because it met our requirements (massive data displayed as timelines).
Refactoring the Oracle data model was also an option, but we felt that if we wanted to scale with Oracle, we would need to invest a lot of money in dedicated hardware and licences, even before considering development and fine-tuning efforts, which would probably make it not worth it.
To be honest, scaling massively with Cassandra also requires fine-tuning, but the overall investment is not on the same level as with Oracle.
Can you share some insight on what your deployment looks like?
We went to production with Cassandra a few months ago with a cluster of 5 nodes in a single data center, holding 100 GB of data (start small). The data volume has been growing very fast recently, and we plan to add more nodes in the short term to cope with the new traffic, but also for upcoming analytics requirements.
What advice do you have for those just getting started with Cassandra?
My first piece of advice is always the same with Cassandra: think about USAGE first, before anything else. The usage pattern (how you ingest data, how you search for data, etc.) should drive the development. For me it is even more important than data modeling, which only comes second.
My second piece of advice, gained from real experience, is to denormalize as soon as possible. Late denormalization is always painful. Although the denormalization mindset does not come naturally to people from the SQL world, it is all but mandatory with Cassandra, and you will get nowhere without it.
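As a toy illustration of write-time denormalization (plain Python with illustrative table and user names, not a real schema), the same event is written into several query-specific structures up front, so that each read pattern later becomes a single cheap lookup instead of a join:

```python
# Three "tables", one per read pattern, mimicking denormalized
# Cassandra tables (names are hypothetical examples).
latest_events = {}   # user_id -> most recent event
unread_events = {}   # user_id -> events not yet read
all_events = {}      # user_id -> full history

def ingest_event(user_id, event):
    # Denormalize at write time: one write per read pattern.
    latest_events[user_id] = event
    unread_events.setdefault(user_id, []).append(event)
    all_events.setdefault(user_id, []).append(event)

ingest_event("carol", "msg-1")
ingest_event("carol", "msg-2")
print(latest_events["carol"])       # msg-2
print(unread_events["carol"])       # ['msg-1', 'msg-2']
```

Writes are cheap in Cassandra, so paying a few extra writes per event to make every read a single partition lookup is the usual trade-off.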
The last thing I want to emphasize is data modeling. There are a lot of interesting ideas to explore with Cassandra data structures, and new design patterns are discovered every day. Skill-wise, there is plenty of good material online (blogs, Cassandra webinars, SlideShare, the mailing list, etc.) to help you understand the basics of data modeling.
What’s your experience with the Apache Cassandra community?
The community is great and I got a lot of help from it. I especially want to thank Sylvain Lebresne and Michaël Figuière for answering my technical questions to demystify Cassandra. So thank you guys!
I also try to give back to the community with Achilles. There was a lightning talk introducing Achilles at the recent Cassandra Summit EU in London. If the framework can help people be more productive, that would be great.
Anything else that you’d like to add?
Well, I’ve said a lot, so last but not least: I’m looking forward to the great new features coming in Cassandra, especially the announced custom types in CQL3! There is potential for powerful data modeling there.