At BUX, we are building a mobile only stock market trading app that aims to make it easy and fun for (non-experienced) users to start playing the markets. Users start off with our virtual currency funBUX and can build a portfolio in selected stocks, indices, commodities and currency from the U.S. and Europe. If a user wants to, he or she can upgrade their account to real money (seriousBUX).
All products are traded in real time, with actual market data. For the seriousBUX part of the product we integrate with brokers that provide the actual execution of the trades on the market. On top of the trading activity we have built social features such as an activity feed, News and Battles with friends (i.e. who is the best investor in a certain timeframe)
Currently we are live with an iOS app in The Netherlands and the U.K. Android app is set to be launched in the beginning of 2015 and we are also going to expand to other European countries.
I am the CTO and a Co-Founder of the company and I currently have 3 mobile and 2 backend developers in my team.
I started working with Cassandra in 2010 (version 0.6) at my previous company. Back then we wanted to have a JVM based solution because we knew how to operate Java at scale (we had 500 servers at that time, most of them running Java). We also looked at HBase (as we already has a Hadoop cluster running at that time) but were put off by the amount of moving parts. What made the decision for Cassandra was the fact that it was a single process that was easy to cluster over multiple machines.
At BUX we knew we needed to store a lot of state updates / overwrites in a short amount of time. As the Cassandra writes is on the direct execution path it has to be really fast. Also Cassandra is very easy to operate and is really stable.
We use Cassandra as part of the ElasticActors framework, which is a Persistent Stateful Actor Framework written in Java of which I am the author. In this message passing framework each actor has it’s own state object that is serialized (using Jackson and lz4 compression) into a byte array which is subsequently stored in Cassandra.
The framework uses sharding to scale up. This means that an ElasticActors cluster needs to be initialized with a number of shards (say 1024) and Actors are mapped to a shard, while shards are mapped to nodes using consistent hashing. Within Cassandra each shards has it’s own row, and each actor is mapped to a column.
Because Cassandra is so fast in writing, we can afford to serialize the whole state every time a message is received. Currently we are already doing thousands of writes per second at peak and we are growing twenty to thirty percent every week. I expect this setup to easily scale to fifty to a hundred thousand writes per second.
The interesting thing here is that we do a lot of overwrites. Cassandra’s architecture with the Memtables and SSTables is very suitable for this as the Memtable will absorb a lot of the overwrites and also after a scale out the state of recently active actors is still in the memtable which leads to very fast reads as well. One caveat here is that the HeapAllocator needs to be used (memtable_allocator: HeapAllocator in cassandra.yaml).
We are running Cassandra version 1.2.13 on Debian Linux with Oracle JRE 1.7.0_60. We have 3 nodes in a single datacenter and have a replication factor of 3 and we read and write with Quorum/Quorum.
The hardware is 2x480GB SSD mirrored, on a 12 core machine with 48GB RAM. Tip: when on this hardware, don’t forget to set the scheduler to noop! (http://stackoverflow.com/questions/1009577/selecting-a-linux-i-o-scheduler)
There are have 2 ColumnFamilies, one that stores all the actor state and another one that stores the scheduled messages. I know Cassandra should not be used as a persistent queue, however I didn’t want to add yet another moving part to the setup. Since there are a lot of deletions I have set gc_grace_seconds to 3600 so tombstones are collected every hour.
Cassandra gives us a very reliable persistency layer for our application. We store all the current state of our app in there and when we restart or when there’s a scale out or failover situation we rely on Cassandra to quickly serve up the state to our application clusters.
The community is always very good help, I mainly needed support in 2010 and 2011 when we were scaling up. Together with my colleagues at eBuddy (my previous company) we even submitted some bugfixes to the the Cassandra code. I also lobbied to get the HeapCollector back in after it was taken out in a certain 1.2 version (if I recall correctly). Support from the core committers was always great.
From a development perspective, try setting up a cluster on your development machine (for instance using vagrant and virtualbox) to experience how easy it is. As for data model design, try to stay away from the relational model and store your data as you want to use it.
With CQL I think there is a risk that developers start viewing Cassandra as a relational database and start making the wrong decisions in their data model design. Having said that, I really like Cassandra and I advise everybody dealing with Big Data to take a look at it. Also, it can do way more than the use case that I am using it for now, so check it out!