Cubie is a multimedia-based messaging app, similar to WhatsApp, but with more support for media such as videos, YouTube, free phone calls and animated stickers. Cubie is popular primarily in South East Asia, but we have 11 million users worldwide. I am a senior software developer at Cubie. Our developers are close to full stack, because we have to develop the mobile apps and the server side, and handle cloud provisioning and deployment.
Generally speaking, the server side of Cubie is a messaging service. We didn’t rely on existing technology, but built the service from scratch. Message delivery between servers is distributed, so the persistent storage behind it needs the same distributed characteristics. To simplify operations, we store our domain model in Cassandra, too, so basically the entire service we offer relies heavily on Cassandra.
Before Cubie, we had already deployed Cassandra in a previous service, flash gaming. There, we transitioned to Cassandra because our old MySQL database could not handle the growing number of requests, so we had to find a better alternative. The alternative had to scale out without pain, and preferably be Java based. At the time we didn’t have many choices. Cassandra was new, at version 0.6, and we elected to evaluate it first.
During the evaluation process we first migrated part of the source code to access Cassandra, then did a small load test. After the load test passed, we deployed Cassandra in production and ran it alongside the legacy MySQL database. The result was quite good, and we switched to Cassandra completely. The whole process took about one month.
We became much more familiar with Cassandra through our gaming service, so when we began building Cubie, we chose Cassandra as the basis of our service. Cubie’s user base is much larger than that of the previous gaming service, and 3 years of real world operations have proven that Cassandra was the right choice for us.
First, Cassandra is open source. This helps a small startup keep costs down (we are a team of 11 people). Also, the NoSQL space is still immature compared to traditional RDBMS, so the ability to diagnose the software ourselves is crucial. When problems occur, we can dig into the source code, report the bug, and even patch it ourselves.
Secondly, Cassandra is easy to manage. Every node has the same role. This is much better than other NoSQL solutions, as it cuts down our operating costs. Adding and removing nodes and upgrading the system are all easy, and we have had no cluster downtime caused by Cassandra. From a technical aspect, Cassandra is a Java based solution and we are a Java shop, so this is a perfect fit for us.
Lastly, Cassandra is backed by the Apache Software Foundation and DataStax; both are highly reliable organizations.
Cassandra is a distributed database, so it is a natural fit for Cubie’s system requirements. Basically, we offload to Cassandra every requirement in our service that needs to scale out. For example, one requirement of a messaging service is discovering the presence of connected mobile phones (to find contacts on the app). We use Cassandra to store this information. There are several scheduled jobs that need to run on different hosts, so we store the jobs in Cassandra, too. Of course, domain models such as ‘Account’ and ‘Friendships’ are all stored in Cassandra.
Our clusters are mainly hosted in EC2 Japan (AWS). We are currently running Cassandra 1.2, deployed to EC2 Japan and Singapore. Application servers access Cassandra in Japan. Our Cassandra nodes in Singapore are occasionally used for real time backup and to support analysis. The node size in Japan is 6, with a replication factor of 3, while Singapore is 2. In high growth periods, we scale Cassandra nodes up to 20.
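As a rough illustration of how per-data-center replication like this can be declared, the sketch below builds a CQL `CREATE KEYSPACE` statement that uses `NetworkTopologyStrategy`. It is not our actual schema: the keyspace name `cubie_sketch` and the data center names `japan` and `singapore` are hypothetical, the real data center names depend on the snitch configuration, and reading Singapore’s "2" as a replication factor is an assumption made only for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class KeyspaceSketch {

    // Builds a CREATE KEYSPACE statement with one replica count per data center.
    // Keyspace and data center names are supplied by the caller; the ones used
    // in main() are hypothetical and not taken from a real deployment.
    static String createKeyspaceCql(String keyspace, Map<String, Integer> replicasPerDc) {
        StringJoiner options = new StringJoiner(", ", "{", "}");
        options.add("'class': 'NetworkTopologyStrategy'");
        for (Map.Entry<String, Integer> dc : replicasPerDc.entrySet()) {
            options.add("'" + dc.getKey() + "': " + dc.getValue());
        }
        return "CREATE KEYSPACE " + keyspace + " WITH replication = " + options + ";";
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the data centers in insertion order.
        Map<String, Integer> replicas = new LinkedHashMap<>();
        replicas.put("japan", 3);      // assumed data center name
        replicas.put("singapore", 2);  // assumed name and assumed replica count
        System.out.println(createKeyspaceCql("cubie_sketch", replicas));
    }
}
```

The generated statement can be run from cqlsh or passed to a driver session. With a setup like this, each row has three replicas in the first data center and two in the second, so the second can serve backup or analysis reads without touching the nodes that serve the application.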
Because we’ve spent 3 years developing Cubie, our code base uses a mix of two drivers: one is the Hector driver, the other is the DataStax Java driver. Most of our column families use legacy compact storage. Only new tables use CQL. We plan to migrate the legacy column families to CQL in the near future. With Cassandra we don’t have to worry about the pressure that comes with a growing user base. The scalability of Cassandra assures us we’ll be able to handle whatever the future holds.
I imagine most experienced users of Cassandra would give the same advice; for example, you should not deploy Cassandra on top of EC2 EBS volumes, or even touch a super column! But those problems are no longer an issue. I think the first challenge for a new Cassandra user is a change of mindset regarding data modeling: data modeling should be based on the queries the application makes, not only on how the data is persisted, as in a traditional RDBMS. Although Cassandra clusters are easier to manage, one should still monitor column family usage. You may need to run compaction and repair regularly, depending on your service usage.
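To make the query-first idea concrete, here is a minimal in-memory sketch in Java. It is an analogy rather than driver code: a table whose primary key is (account, friend) behaves roughly like a map from partition key to a sorted set of clustering keys. The class, the table shape in the comment, and the column names are all hypothetical, chosen around the query "list the friends of one account".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

class FriendsByAccountSketch {

    // Mimics a hypothetical CQL table designed around one query:
    //   CREATE TABLE friends_by_account (
    //     account_id text, friend_id text,
    //     PRIMARY KEY (account_id, friend_id));
    // Outer map key = partition key, sorted set = clustering column order.
    private final Map<String, SortedSet<String>> partitions = new HashMap<>();

    // A friendship is written twice, once under each account, so that either
    // side can list its friends by reading a single partition (denormalization).
    void addFriendship(String a, String b) {
        partitions.computeIfAbsent(a, k -> new TreeSet<>()).add(b);
        partitions.computeIfAbsent(b, k -> new TreeSet<>()).add(a);
    }

    // The query the model was designed for: one partition lookup, no join.
    List<String> friendsOf(String accountId) {
        return new ArrayList<>(
                partitions.getOrDefault(accountId, Collections.emptySortedSet()));
    }

    public static void main(String[] args) {
        FriendsByAccountSketch store = new FriendsByAccountSketch();
        store.addFriendship("alice", "carol");
        store.addFriendship("bob", "alice");
        System.out.println(store.friendsOf("alice"));
    }
}
```

The trade-off is extra writes and duplicated data in exchange for reads that touch a single partition, which is the opposite of normalizing first and joining at query time.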
The mailing list is a really active forum, where we can learn from many real use case experiences, and get solutions. We’ve used it to ask a few questions and we always get helpful and positive replies.
To summarize why Cassandra is great:
Cassandra’s elastic scalability helps us a lot while we are growing rapidly. All we need to do is add more nodes to overcome the increased load. We can shrink cluster size after we fix and optimize our application.
It’s easy to manage. We are a small startup and don’t have a dedicated operations team, so this is an important factor for us.
CQL is great! After we made the transition to CQL, everything became much simpler.
Our next project will be Cassandra based, too! Oh, and download Cubie for free!