Execom is a software engineering company focused on the complete software engineering cycle: from idea, design and production to maintenance of complex software solutions. We primarily work with clients from Western Europe, mainly the Netherlands, Germany and the UK. Although we work primarily as an outsourcing company, we also have several in-house projects.
I am a Senior Software Engineer at Execom, working in dynamic teams on a wide variety of projects across different business domains, programming languages and technologies. Recently I became part of the big data team, focusing on data processing and analysis.
Another of our teams had developed an application for our client that ran on MongoDB. Our project had a similar business case: storing and processing time-sensitive data. For the first proof of concept we used MongoDB because of our in-house knowledge, but when we reached load testing with the projected scenarios, it became apparent that we needed more horsepower. Since I had previous experience with Apache Cassandra, after a short investigation we decided to move to Cassandra.
Our main reason for switching to Cassandra was that we needed a solution offering both performance and scalability. On our initial project we had load tests that required executing billions of inserts, and that was not possible with our setup at the time.
At this point we have two active projects running Cassandra.
The first one stores data collected from in-house developed sensor devices that measure all kinds of different parameters. The data model is mainly time series, since all measurements are time-related. Data is collected over the MQTT protocol and then processed and aggregated in real time. All data is available through a web interface or through platform-specific applications for desktop and mobile.
The second application is more complex in terms of data processing and querying. Our main goal is to provide real-time information about stores and stock items, processing all manual and automatic counting information. Our client has both hardware and software products covering different parts of the item life cycle.
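The post does not show the actual schema, so the following is only a minimal sketch of one common way to model sensor time series in Cassandra: partition rows by sensor and a time bucket (here, one UTC day) and order them by timestamp inside each partition, which keeps partitions bounded in size. It simulates that layout in plain Python; the names `Measurement`, `partition_key` and `group_into_partitions`, the field names and the one-day bucket are all hypothetical choices for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Measurement:
    sensor_id: str   # hypothetical identifier of a sensor device
    parameter: str   # e.g. "temperature"; illustrative only
    ts: datetime     # measurement time, assumed to already be in UTC
    value: float


def partition_key(m: Measurement) -> tuple:
    """Return (sensor_id, day_bucket), mirroring a composite partition key
    such as PRIMARY KEY ((sensor_id, day), ts) in a CQL table definition."""
    return (m.sensor_id, m.ts.strftime("%Y-%m-%d"))


def group_into_partitions(rows):
    """Group rows by partition key and sort each partition by timestamp,
    the way a clustering column orders rows within a partition."""
    partitions = defaultdict(list)
    for m in rows:
        partitions[partition_key(m)].append(m)
    for rows_in_partition in partitions.values():
        rows_in_partition.sort(key=lambda m: m.ts)
    return dict(partitions)


rows = [
    Measurement("s1", "temperature", datetime(2016, 3, 1, 23, 59), 21.5),
    Measurement("s1", "temperature", datetime(2016, 3, 1, 8, 0), 19.0),
    Measurement("s1", "temperature", datetime(2016, 3, 2, 0, 1), 20.0),
]
parts = group_into_partitions(rows)
print(sorted(parts))  # [('s1', '2016-03-01'), ('s1', '2016-03-02')]
```

Bucketing by day is a trade-off: a reading for "the last hour" usually touches one partition, while a month-long query has to read roughly thirty partitions. The bucket size would be tuned to the actual write rate per sensor.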
Since Cassandra is under rapid development, we try to use the latest stable versions of both the database and the DataStax driver. Everything runs on virtual machines in the cloud, which makes it easy to add instances when needed. At this point we have a multi-data center, multi-node deployment. We try to keep our data models as simple as possible; there is no need to complicate things.
Using Cassandra, we have gained benefits in both performance and scalability. This is not just marketing with fine print at the bottom. It really performs; I'm dead serious. Partition tolerance and availability are as "marketed": 99.9%.
The only thing that hurts performance is us developers: bad data models, unnecessarily complicated queries, or any other bad practice that has a performance cost. We try to keep it simple.
Last year we had a few problems for which I couldn't find an answer. This led me to IRC, where I chatted with the DataStax people working on the driver. I asked some hard questions, but we got to the bottom of it.
As for documentation and blogs, there is plenty of information about everything you need. The huge number of conference presentations and slides also helps a lot. I'm really happy with where it's all going and how everyone is trying to contribute. That motivated me to attend conferences and talk about our experience and its technical side from a developer's perspective.
If you are coming from the SQL world like me, try to forget everything you know about relational databases.
I think the main problem lies in understanding the theory behind it, a.k.a. the CAP theorem. Read Google's Bigtable paper and Amazon's Dynamo paper, and try to understand how Cassandra works under the hood. This is one of the crucial steps to being successful.
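One "under the hood" idea from the Dynamo paper that maps directly onto Cassandra is tunable consistency. With replication factor N, if a write is acknowledged by W replicas and a read consults R replicas, the two replica sets must overlap whenever R + W > N. Cassandra's QUORUM level requires floor(N / 2) + 1 replicas. The small sketch below only does this arithmetic. It does not talk to a cluster, and the function names are illustrative.

```python
def quorum(replication_factor: int) -> int:
    """Replicas needed for a quorum: floor(RF / 2) + 1."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def reads_see_latest_write(rf: int, write_replicas: int, read_replicas: int) -> bool:
    """True when every read replica set must overlap every write replica set,
    i.e. the Dynamo-style condition R + W > N."""
    return read_replicas + write_replicas > rf


def tolerated_failures(rf: int, required: int) -> int:
    """How many replicas can be down while `required` of them still respond."""
    return rf - required


rf = 3
print(quorum(rf))                                          # 2
print(reads_see_latest_write(rf, quorum(rf), quorum(rf)))  # True (QUORUM + QUORUM)
print(reads_see_latest_write(rf, 1, 1))                    # False (ONE + ONE)
print(tolerated_failures(rf, quorum(rf)))                  # 1
```

With a replication factor of 3, QUORUM reads and writes survive one replica being down and still overlap. ONE/ONE is faster but gives up that guarantee, which is the CAP trade-off in miniature. In a multi-data center setup, LOCAL_QUORUM applies the same calculation to the replicas in the local data center only.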
I'm really excited about Cassandra and everything related to Big Data. This is a great time to work with Cassandra, and it only looks like it's getting better. Cassandra is changing from an "everybody is talking about it" technology to an "everybody is using it" technology.