Philippe Girolami, CTO and Co-founder at Sensorly
I’m the CTO and co-founder of Sensorly.
On the consumer side, Sensorly is a free service that lets people see the coverage and performance of mobile and Wi-Fi networks. The data is crowdsourced through our Android and iOS apps. For example, people all over the world use it to see how the new 4G networks fare or to check whether it’s worth switching carriers. As of today, we have maps for over 400 networks.
On the enterprise side, we help wireless network professionals and mobile carriers develop and improve their business through dedicated products and services. We show them what’s happening in their network as seen by their users’ devices, which is a completely new perspective for them.
Improved service with Cassandra
We use Cassandra 1.2: phones report coverage and performance measurements, and we update our maps to display that information. Thanks to Cassandra, performance measurements are visible the second they are reported, and our coverage maps are updated within the day (usually within the hour). The challenge for us is keeping up with the huge and ever-increasing volume of data that phones send us.
Cassandra was a huge enabler for us; it would have been a lot harder to provide our level of service without it.
We evaluated it against the usual suspects of the time: MongoDB, HBase, Tokyo Cabinet, Project Voldemort and others.
What made us choose Cassandra was 1) ease of use and 2) support for counters.
I’m glad to say it feels like we made the right choice. We’ve never had to break a sweat handling a server crash at night, and we’ve already swapped out servers twice to increase capacity and watched the cluster handle it gracefully. The only time we ran into a blocking issue was when we tried to get the cluster working in an unusual hybrid VLAN environment, and even that was solved.
We currently run our cluster in a single data center in Europe, on dedicated hardware, because we found that to be the most cost-effective option for us. Unlike most Cassandra deployments, we’ve chosen to beef up individual nodes so that the cluster stays as small as possible and our operational overhead stays low. Our servers have 128GB of RAM, and switching to Cassandra 1.2 let us benefit greatly from the filesystem cache.
We store close to half a terabyte of compressed data, and that’s after spending a considerable amount of time optimizing how we store it.
My first piece of advice would be: make sure it’s the right solution for you. I think too many people jump on the NoSQL bandwagon because they read an article. We went NoSQL only after hitting a brick wall with more traditional technologies.
My second piece of advice would be: take your time. We iterated three times before deploying to production. On our first attempt at handling the load, we saturated the I/O on a three-node cluster running SSDs in RAID-0 (yes, that’s 600MB/s of I/O on each server). A few weeks of engineering later, we were down to 30 to 40MB/s. Our latest nodes now have enough memory to cache most of the hot data, and we’re down to 0 to 20MB/s of I/O on each server.
Community mailing list
We’ve interacted with the community mailing list and found it supportive. That said, I do believe you need some tinkerers in-house to benefit from this kind of support.
And we’re hiring! We’re looking for a Platform Engineer in Paris to build and scale our data pipelines. Chances are we’ll be using Cassandra and Spark!