December 19th, 2013

“As for the NoSQL competitors, based on our evaluation we found that Cassandra met our needs, in particular those related to scalability and availability.”

-Marko Asplund, Senior Software Architect at Ixonos




What does Ixonos do and what is your role there?

Ixonos delivers custom software and digital services to clients. We work through the whole software delivery lifecycle from service and user experience design through implementation. Ixonos also provides cloud computing services from our own data center.


I work as a software architect and developer in customer delivery projects. In a typical project I help elaborate requirements and do architecture design as well as development work.


How are you using Apache Cassandra?

We’re using Cassandra in a few projects, in different contexts. They’re mostly greenfield projects, because you can better realize Cassandra’s full potential if the application is designed for the eventual consistency model from the ground up. In one project we use it as the data storage solution for a RESTful user profile service. The data model is open-ended. Transaction volumes are a bigger challenge than data volumes. Availability is also high on the requirements list. We use Apache Solr for indexing and searching the profiles, so that we’re able to run ad-hoc and more complex queries. In another project we’re using Cassandra for storing and retrieving user notifications. We’re also actively exploring ways to leverage Cassandra in new projects.


What was the motivation for using Cassandra and what other technologies was it evaluated against?

Circumstances differ from project to project, so we evaluate Cassandra against a variety of technologies on a per-project basis. Typically it’s evaluated against the most popular relational and NoSQL databases, both open source and proprietary.


For us, the biggest reasons for looking outside the RDBMS world were high availability and scalability requirements. With relational databases the object-relational impedance mismatch is a pretty well understood challenge, so we didn’t see that as a major issue. Load balancing and high-availability clustering can be achieved at the deployment architecture level, to a certain extent, but at some point you need to start relaxing the consistency requirements in order to scale out, which often involves pervasive changes to the application. Good support for open-ended data models was also an important consideration for us. You can model open-ended data with the relational model, but actual RDBMS implementations are not really good at handling that sort of data model with high data and transaction volumes.


As for the NoSQL competitors, based on our evaluation we found that Cassandra met our needs, in particular those related to scalability and availability. As an open-source project, Cassandra is mature and reliable, yet still actively developed by a large community. Cassandra is also widely used and proven in many large-scale deployments. There’s an active developer and user community, and the documentation is pretty good.


Can you share some insight on what your deployment looks like?

We’re running Cassandra on our cloud platform. We started off with a pretty simple deployment architecture with Red Hat Enterprise Linux virtual servers and SAN-based storage (on spinning disks). Our plan was to run performance tests on the environment, analyze performance bottlenecks and revise the deployment architecture to eliminate them. We were expecting to see IO bottlenecks guiding us towards local, possibly SSD-based storage. After running performance tests we noticed that the system was able to handle all the projected load, so now we’re monitoring the performance statistics and plan to increase capacity when that becomes necessary.


What would you like to see out of Apache Cassandra in future versions?

Better support for ad-hoc queries in some form is something we’d like to have in Cassandra, perhaps through Apache Solr integration. There’s quite a lot of documentation about Cassandra, but it doesn’t always keep pace with the fast development cycle. For new users I think it’s important to start by learning the underlying concepts (including what lies beneath CQL) and data modeling. Cassandra can be used to solve a particular problem in several ways, so documenting best practices is important to avoid working against the system.


What’s your experience with the Apache Cassandra community?

We’ve followed and participated in the discussions on the user mailing list and found it to be a very useful and lively forum for sharing experiences as well as seeking support.