December 17th, 2013




Paolo Crosato: Software Engineer at UbiEst


Today’s Apache Cassandra use case features Paolo of UbiEst. Paolo, thanks for joining us. What does UbiEst do?

UbiEst is the leading Italian provider of location-based services. Our mission is to give our customers peace of mind about the people and valuables they care about. We develop fleet management solutions for companies that need to track their vehicles for security or efficiency. We develop personal security solutions for people who need to look after their loved ones, pets, or valuables. We also provide a wide array of web mapping solutions, such as navigation, store locators and geomarketing.

How are you using Apache Cassandra?

We use Cassandra as a data storage solution for time series data sent by GPS trackers. This allows us to query the data and give real-time reports to our customers. We also run aggregate data analysis and data mining operations daily, and we need to do that with a minimal performance hit on the whole architecture. Deploying Cassandra was very easy compared to other storage technologies, and it blends nicely with our architecture, which still relies on Oracle or PostgreSQL for data that needs strong relationships and normalization.
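As a rough illustration of this kind of time series layout (a sketch only, not UbiEst's actual schema), the Python snippet below mimics in memory how GPS fixes are commonly grouped in Cassandra: one partition per tracker and day, with rows kept in timestamp order. The class `TrackerTimeSeries`, its methods, and the `(tracker_id, day)` bucketing are all assumptions made for this example.

```python
from bisect import insort
from collections import defaultdict
from datetime import datetime, timezone


class TrackerTimeSeries:
    """Hypothetical in-memory stand-in for a time series table.

    Partition key: (tracker_id, day). Within a partition, rows are
    kept sorted by timestamp, like a clustering column would do.
    """

    def __init__(self):
        self._partitions = defaultdict(list)

    def write(self, tracker_id, ts, lat, lon):
        # Route the GPS fix to its (tracker, day) partition and
        # insert it so the partition stays ordered by timestamp.
        key = (tracker_id, ts.date())
        insort(self._partitions[key], (ts, lat, lon))

    def read_day(self, tracker_id, day, start=None, end=None):
        # Read a single partition, optionally restricted to the
        # half-open time range [start, end).
        rows = self._partitions.get((tracker_id, day), [])
        return [
            row for row in rows
            if (start is None or row[0] >= start)
            and (end is None or row[0] < end)
        ]


# Example usage with made-up coordinates.
store = TrackerTimeSeries()
fix_time = datetime(2013, 12, 17, 10, 0, tzinfo=timezone.utc)
store.write("truck-1", fix_time, 45.0, 12.0)
day_rows = store.read_day("truck-1", fix_time.date())
```

Reading one tracker's day touches a single partition, which is what makes this style of query fast, and also why queries in such a model have to follow the partition key.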

What was the motivation for using Cassandra? Did you evaluate against any other options?

We need a read/write solution for our GPS tracker data that is as fast as possible, and Cassandra proved to be the best solution for our needs. We evaluated both relational solutions like Oracle and PostgreSQL, and document-oriented databases like MongoDB. Oracle was discarded because of steep license cost increases and suboptimal performance. MongoDB required too many hardware resources to provide the performance we needed. However, the main reason we chose Cassandra over other technologies was that it perfectly fits our use case; it felt like the best tool for the job.

That’s great. What does your deployment look like?

We have a single cluster configuration with a very small number of nodes. Everything is hosted in the cloud. The current amount of data is about 200 GB. We will reach one billion rows in our main column family by the end of the year, and it will continue growing.

Is there anything you’d like to see in future versions of Apache Cassandra?

We’d like to see better integration with Java and Spring, since we develop a lot of services with them. It would be nice to have more documentation, in particular a complete guide on data modeling with use cases. Up-to-date documentation and guides would help too; we found many online resources and some books, but most of them referred to obsolete versions of Cassandra. We also found a lack of tools for importing and exporting large volumes of data; a command line utility like imp/exp or mongoimport/mongoexport would help a lot.

What’s your experience with the Apache Cassandra community?

We had a great experience with DataStax. They were extremely supportive and helped us a lot, both with data design and cluster deployment. The user mailing lists, especially the Java Driver one, provided great information and insights, so our experience has been very positive.

Anything else that you’d like to add?

We like Cassandra a lot because it delivers what it advertises, i.e. stellar performance with reasonable hardware resources, even if the data model imposes strong constraints on the queries and the query language is not as rich as in other NoSQL solutions like MongoDB. Please keep on improving Cassandra, without mimicking relational or document-oriented features that would hurt performance.