December 2nd, 2013

By 

This blog posting was syndicated from the Zoosk Dev Blog. View the original posting here.

If you are a Zoosk user, then you have most probably experienced persistent notifications as a badge on the blue header bar. Clicking that badge shows your notifications which comes in three flavors; personal, social, and account maintenance oriented.

 

Persistent notifications are a great use case for Cassandra which really shines when storing time series data in wide rows. Our use of Cassandra is almost text book Service Oriented Architecture. We use a single column family with a long row key type, a composite comparator of time and string, and level compaction. There is the capacity for explicit deleting but most notifications eventually scroll off using the TTL (Time To Live) functionality of Cassandra.

 

We wrap all access to Cassandra through a web service cluster that also scribe logs activity for Business Intelligence purposes and publishes operational metrics to Ganglia and alerts Nagios on system status. This service uses Hector to access Cassandra. We use the classic slice query for reading and a mutator for writing (batch mutates for social notifications). Consistency levels, custom serializers, and load balancing policy are important things to handle when writing this kind of service.

 

Even though Zoosk is a web scale property with millions of Daily Active Uniques, our 5 node Cassandra cluster can easily keep up. We do use db class servers with lots of RAM and SSDs. We use a single data center and single rack topology. Even though there are four times more writes than reads, write latency is two orders of magnitude faster than read latency. Our service cluster runs on three web class machines.

 

Here are five lessons learned from this move.

 

Don’t cheap out on hardware. A common mistake is to use machines with lots of hard disk space but minimal CPU and RAM. That is not a good type of machine for Cassandra which works best with lots of CPU cores and lots of RAM. Also, consider using SSDs which makes compaction much easier.

 

Resist the urge to twist knobs without understanding them. Cassandra comes with lots of configuration parameters for easy performance related customization. The default values are very good. If you are considering changing a significant number of those values, then maybe Cassandra isn’t a good fit for your use case.

 

Where should each feature live? Modern application development is very layered. Consider carefully which layer, or layers, that any particular feature should be implemented. Here at Zoosk, various clients (web, touch, mobile, desktop) call an external facing API. That API calls internal services which, in turn, interact with various data storage and message queuing technologies. The external APIs are concerned with the optimal chunking of information for the clients. The internal APIs are concerned with optimal performance and scalability with the underlying data stores. Monolithic architectures are a big no-no.

 

Consider using dynamic consistency levels for graceful degradation. If consistency is important, then you are most likely wanting to use QUORUM reads. Your internal services should be prepared to dynamically switch to ONE consistency level for reads when read latency is getting bad.

 

Definitely schedule regular, periodic, staggered nodetool repair jobs on each node in your cluster.