January 27th, 2014

By Glenn Engstrand, Senior Software Engineer at Zoosk

TL;DR: Zoosk is a leading online dating company that learns as you click in order to pair you with singles you’re likely to be mutually attracted to. With the #1 grossing online dating app in the Apple App Store, Zoosk is a market leader in mobile dating. Available in over 80 countries and translated into 25 languages, Zoosk is a truly global online dating platform.

Zoosk was previously using MySQL for their persistent notifications system and was in search of a database that better fit their write-heavy, time-series use case. Initially the Zoosk team evaluated Riak, but chose Cassandra because they felt that column-oriented wide rows were a better fit for their needs than simple key/value or document-oriented data storage.

Zoosk now deploys a five-node Cassandra cluster for their personal, social, and account notifications, where Cassandra “really shines”.

What does Zoosk do and what is your role there?

Zoosk is a leading online dating company that learns as you click in order to pair you with singles you’re likely to be mutually attracted to. Zoosk’s Behavioral Matchmaking™ technology is constantly learning from the actions of over 25 million searchable members in order to deliver better matches in real time. With the #1 grossing online dating app in the Apple App Store, Zoosk is a market leader in mobile dating. Available in over 80 countries and translated into 25 languages, Zoosk is a truly global online dating platform.

I serve as senior software engineer on the platform team.

How are you using Apache Cassandra at Zoosk?

If you are a Zoosk user, then you have most probably experienced persistent notifications as a badge on the blue header bar. Clicking that badge shows your notifications, which come in three flavors: personal, social, and account maintenance.

Persistent notifications are a great use case for Cassandra, which really shines when storing time-series data in wide rows. Our use of Cassandra is almost textbook Service Oriented Architecture. We use a single column family with a long row key type, a composite comparator of time and string, and leveled compaction. Notifications can be deleted explicitly, but most eventually scroll off through Cassandra's TTL (Time To Live) functionality.
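To make the wide-row layout described above more concrete, here is a minimal in-memory sketch in Python. It is not Cassandra code and not Zoosk's implementation: the class name NotificationStore, its methods, and the field names (user_id, created_at, kind, payload) are all hypothetical, and the assumption that the long row key identifies a user is ours. It only illustrates the idea of one row per key, columns named by a (time, string) composite, and per-column expiry.

```python
class NotificationStore:
    """Hypothetical in-memory stand-in for a single wide-row column family."""

    def __init__(self):
        # row key (a long, assumed here to be a user id)
        #   -> {(created_at, kind): (payload, expires_at)}
        self._rows = {}

    def insert(self, user_id, created_at, kind, payload, ttl_seconds, now):
        # The column name is a composite: time first, then a string.
        # Writing the same column name again overwrites the old value.
        row = self._rows.setdefault(user_id, {})
        row[(created_at, kind)] = (payload, now + ttl_seconds)

    def delete(self, user_id, created_at, kind):
        # Explicit delete; a missing column is silently ignored.
        self._rows.get(user_id, {}).pop((created_at, kind), None)

    def latest(self, user_id, now, limit=10):
        # Return unexpired columns, newest first, mimicking a reversed
        # slice over a row whose columns are ordered by the composite name.
        row = self._rows.get(user_id, {})
        live = [
            (name, payload)
            for name, (payload, expires_at) in row.items()
            if expires_at > now
        ]
        live.sort(key=lambda item: item[0], reverse=True)
        return live[:limit]


store = NotificationStore()
store.insert(42, created_at=100, kind="social", payload="profile view",
             ttl_seconds=50, now=100)
store.insert(42, created_at=120, kind="personal", payload="new message",
             ttl_seconds=500, now=120)
print(store.latest(42, now=140))  # both notifications are still live
print(store.latest(42, now=200))  # the first one has passed its TTL
```

In a real cluster the ordering and expiry are handled by Cassandra itself (the comparator sorts columns on write and expired columns are dropped during compaction), so the sort and filter in latest() exist only to imitate that behavior.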

We use one data center and are running a five-node cluster with Apache Cassandra 1.1.6. To learn more, check out our blog post here:

https://about.zoosk.com/en/engineering-blog/moving-persistent-notifications-from-mysql-to-cassandra/

What was the motivation for using Cassandra and what other technologies was it evaluated against?

The use case (more writes than reads on time-series data) compelled us to move persistent notifications from MySQL to Cassandra. We did evaluate Riak at the same time but chose Cassandra because we felt that column-oriented wide rows were a better fit for our needs than simple key/value or document-oriented data storage.

What advice do you have for those just getting started with Cassandra?

Read the Cassandra Data Modeling Best Practices by eBay’s Edward Capriolo.

What’s your experience with the Apache Cassandra community?

Ben Coverston gave us our training. He was great. I loved your 2013 conference.

Anything else that you’d like to add?

The downside to Hector is that it uses Cassandra’s Thrift interface, and you cannot mix different versions of Thrift. So you can’t write services that use Scribe to log Hector operations.

Feel free to use CQL, but be advised that it is not SQL. Though the two look similar, what you may already know about SQL won’t be applicable to CQL.