November 19th, 2013

By Jakob Bak, CTO and Co-Founder at Adform

 

TL;DR: Adform is a technology platform used by thousands of media agencies, advertisers and publishers for planning, buying, executing and tracking online display advertising.

 

Adform needed to look beyond traditional databases to meet their need for ultra-fast key-value pair lookups on a massive scale. They evaluated Cassandra against three or four other NoSQL alternatives and chose it based on good test results and its high, growing adoption.

 

The advertising platform company primarily uses Cassandra as the main underlying repository for their real-time-trading engine as well as their dynamic creative engine. Their real-time-trading cluster processes a few hundred thousand requests per second, the majority of which are auction-based requests.

 

At Adform, Cassandra has been deployed as a single cluster in production for more than 3 years. It currently runs on 32 physical blade servers distributed over several data centres.

 

What does Adform do?

Adform is a technology platform used by thousands of media agencies, advertisers and publishers for planning, buying, executing and tracking online display advertising. Our platform has a unique combination of rich media (richer advertising formats than ordinary banners, such as video), real-time-trading and traditional ad-serving technologies. We are experiencing very high growth and have just reached 270 employees, almost half of whom work in product development, where Apache Cassandra is a key component.

 

How are you using Apache Cassandra?

We primarily use Cassandra as the main underlying repository for our real-time-trading engine as well as our dynamic creative engine. Our real-time-trading cluster processes a few hundred thousand requests per second. The majority are auction-based requests, for which we use Cassandra as part of calculating bids on behalf of several thousand bidding strategies and campaigns. Our dynamic creative engine uses much of the same data to determine which product or creative design is the best to display to each individual viewer.

 

What was the motivation for using Cassandra and what other technologies was it evaluated against?

It was obvious that we needed to look beyond traditional databases to meet our need for ultra-fast key-value pair lookups on a massive scale. We evaluated Cassandra against something like 3-4 other NoSQL alternatives and decided on Cassandra based on the good test results, but also on the high adoption it seemed to have, while some alternatives looked more risky as a long-term choice. Several years later, the picture seems to be more or less the same. We still test alternatives regularly and have also implemented some for very specific purposes where they might perform best.

 

That’s excellent. Can you share some insight on what your deployment looks like?  

We have had Cassandra deployed as a single cluster in production for more than 3 years. Currently it is running on 32 physical blade servers distributed over several data centres. Each server has six cores (which we are in the process of upgrading), 128 GB of RAM and 2x 300 GB 15k rpm HDDs. Our cluster is divided into 4 virtual DCs of 8 nodes each, with every node holding over 100 GB of data.
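
As a quick sanity check on those figures, the arithmetic can be sketched as below. This is only an illustration: the constant and function names are made up for the example, and it assumes one Cassandra node per server and counts raw disk capacity with no mirroring, neither of which the interview states.

```python
# Figures quoted in the interview; names here are illustrative only.
SERVERS = 32
VIRTUAL_DCS = 4
NODES_PER_DC = 8
RAM_GB_PER_SERVER = 128
DISKS_PER_SERVER = 2
DISK_GB = 300
MIN_DATA_GB_PER_NODE = 100  # "over 100 GB", so this is a lower bound


def cluster_summary():
    """Derive cluster-wide totals from the per-server figures."""
    # Assumption: one Cassandra node per physical server.
    nodes = VIRTUAL_DCS * NODES_PER_DC
    return {
        "nodes": nodes,
        "matches_server_count": nodes == SERVERS,
        "total_ram_gb": nodes * RAM_GB_PER_SERVER,
        # Assumption: raw capacity, ignoring any RAID mirroring.
        "raw_disk_gb": nodes * DISKS_PER_SERVER * DISK_GB,
        "min_data_gb": nodes * MIN_DATA_GB_PER_NODE,
    }


if __name__ == "__main__":
    for name, value in cluster_summary().items():
        print(f"{name}: {value}")
```

Under these assumptions, 4 virtual DCs of 8 nodes account for all 32 servers, and the cluster holds at least 3,200 GB of data in total.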

 

What would you like to see out of Apache Cassandra in future versions?

Our data model is completely dynamic and CQL does not allow dynamic/wide rows, so we hope Cassandra will keep its pre-CQL design forever, without compromising on performance while adding new features. We had issues with counter columns and moved to simple expiring columns, so it would be great if counters were improved in the future.
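
The interview does not say how expiring columns stand in for counters. One plausible reading, sketched here purely as an assumption, is to write one short-lived column per event and count the columns that are still alive. The in-memory `ExpiringRow` class below is hypothetical and does not use any Cassandra client API; it only mimics the time-to-live behaviour of expiring columns.

```python
import time


class ExpiringRow:
    """In-memory stand-in for one wide row whose columns carry a TTL.

    Hypothetical illustration only: this is not Adform's implementation
    and it does not talk to Cassandra.
    """

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._columns = {}  # column name -> expiry timestamp

    def add_event(self, column_name, ttl_seconds):
        # Each event becomes its own column that expires after ttl_seconds.
        self._columns[column_name] = self._clock() + ttl_seconds

    def live_count(self):
        # Drop expired columns, then count what is left. The count of live
        # columns plays the role a counter column would otherwise play.
        now = self._clock()
        self._columns = {
            name: expiry for name, expiry in self._columns.items() if expiry > now
        }
        return len(self._columns)
```

The trade-off in this sketch is that the count is a sliding window over recent events rather than an all-time total, and reading it means scanning the row instead of reading a single value.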

 

One of the factors behind latency spikes is the Java garbage collector. When it runs frequently, e.g. every 3-5 seconds, we have seen performance issues with response times of 50-150 ms, even peaking at up to a few seconds, which obviously can be problematic for our use cases.

 

What’s your experience with the Apache Cassandra community?

Our Cassandra experience is pretty mature, with over 3 years in production, but we constantly learn new things from the community and highly appreciate this. We monitor the Cassandra mailing lists, sometimes participating in discussions and sharing our experience. We took part in a London conference last year and really enjoyed the opportunity to exchange knowledge with other companies.

 

Anything else that you’d like to add?

Cassandra is a great Big Data storage solution: highly available, easily scalable, with good functionality (especially expiring dynamic/wide columns), and configurable to fulfil specific needs.

 

We’re particularly happy that the Cassandra service has run without interruption or downtime for more than 3 years, with only rare but fixable performance issues, through all minor and major version upgrades (0.7beta1 to 1.2.*) and hardware/OS migrations (3 types of machines, virtual and physical, and 3 OSs).

 

We have also built, and constantly improve, our own Cassandra Thrift C# client to fulfil our specific needs.
