Teads is an innovative ad server. Its goal is to connect advertisers and publishers and to provide the technology that allows video ads to be broadcast in an “out-stream” way, meaning no surrounding video content is needed to play a video ad. Alain is the main data scientist and architect, in charge of storing the company's tracking data. This tracking data is then used to provide real-time statistics that help decide, millions of times every day, which ad is the best one to broadcast.
We use Cassandra in 3 distinct ways:
• We use a lot of counters to provide real-time statistics on the number of people exposed to any ad, any website, and more.
• We store raw data so that we can (someday) provide more detailed statistics, crossing more dimensions, computed in batch with Apache Hadoop.
• We store the data our algorithm needs to choose the best ad to display according to a specific set of rules.
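To illustrate the first use case, here is a minimal in-memory sketch of how a single tracking event can fan out into one counter increment per dimension. The dimension names, the per-day bucketing, and the `increment_counters` helper are hypothetical; a Python `Counter` stands in for Cassandra counter columns, and none of this is Teads' actual schema.

```python
from collections import Counter

# Hypothetical dimensions: the text only says counters cover "any ad, any website, and more".
DIMENSIONS = ("ad_id", "website")


def increment_counters(counters: Counter, event: dict, day: str) -> None:
    """Apply one tracking event as a +1 on one counter per (dimension, value, day) key."""
    for dim in DIMENSIONS:
        counters[(dim, event[dim], day)] += 1


counters = Counter()
events = [
    {"ad_id": "ad-1", "website": "site-a"},
    {"ad_id": "ad-1", "website": "site-b"},
    {"ad_id": "ad-2", "website": "site-a"},
]
for e in events:
    increment_counters(counters, e, "2014-06-01")

print(counters[("ad_id", "ad-1", "2014-06-01")])    # 2
print(counters[("website", "site-a", "2014-06-01")])  # 2
```

The appeal of counters for this kind of workload is that the client only sends increments; it never has to read a total, modify it, and write it back.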
We started using Cassandra 0.8.0 about two years ago, have upgraded through each major release since, and are now running Cassandra 1.2.11.
We liked Cassandra’s main characteristics:
• No single point of failure (We have SLAs to meet, and any downtime is really expensive.)
• Horizontal scaling (Using AWS, this is very easy and efficient)
• Write efficiency (We track a lot, so our use case fits well.)
• Presence of counters
• Peer-to-peer clustering, with no master/slave roles.
We had no time back then to run benchmarks to help us choose the right technology, so we decided after a lot of reading on the web. We chose Cassandra over HBase, mainly because our use case involves a lot of writes.
We now have an AWS multi-region (US and EU) cluster of i2.2xlarge nodes running 1.2.18, each node holding 300 GB of data.
We use a replication factor of 3 and perform both reads and writes at consistency level QUORUM.
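The quorum arithmetic behind this setting can be checked with a few lines of Python. Cassandra computes QUORUM as the sum of the replication factors across all data centers, divided by two (rounded down), plus one. The helper names below are hypothetical, and since the text does not say whether the factor of 3 applies per region or to the whole cluster, the sketch shows both cases.

```python
def quorum(replication_factors: list[int]) -> int:
    """QUORUM = (sum of replication factors across data centers) // 2 + 1."""
    return sum(replication_factors) // 2 + 1


def tolerated_failures(replication_factors: list[int]) -> int:
    """Replicas that can be down while QUORUM requests still succeed."""
    return sum(replication_factors) - quorum(replication_factors)


def reads_overlap_writes(total_replicas: int, r: int, w: int) -> bool:
    """True when every read set must intersect every write set (R + W > N)."""
    return r + w > total_replicas


# Case 1: RF 3 for the whole cluster.
print(quorum([3]), tolerated_failures([3]))  # 2 1
print(reads_overlap_writes(3, quorum([3]), quorum([3])))  # True

# Case 2 (assumption): RF 3 in each of the two regions, 6 replicas in total.
print(quorum([3, 3]), tolerated_failures([3, 3]))  # 4 2
```

Because R + W exceeds the total replica count when both sides use QUORUM, every read touches at least one replica that acknowledged the latest write. In the two-region case, losing an entire region (3 replicas) exceeds the 2 failures a cluster-wide QUORUM can tolerate.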
On the operational side, which is a very important part of running Cassandra, I think it is mandatory to understand a bit of Cassandra's internals. You need to understand how things work under the hood to be efficient. Cassandra needs a good configuration, and this configuration highly depends on your use case. You can’t just copy what other people do; it won’t necessarily work well for you.
So take the time to understand how this beautiful tool works, or you will regret it later.
The Cassandra community might be one of my favorite things about Cassandra. The community is active all the time and ready to help through multiple channels (IRC, mailing lists, GitHub…).
Numbers can sometimes be more explicit than words: according to my Grokbase Cassandra user profile, I sent 274 emails to ask or answer questions. I am among the top 10 users of the mailing list. I almost always got answers to my questions, and I have helped a lot of people.
Well, as you may have understood, the community is at the center of my Cassandra usage, and I think it should be this way for any user.