LineMetrics is an IoT platform for the B2B market that supports continuous improvement processes based on real-world sensor data. Our customers come from many different fields: small, medium-sized, and large companies in metal manufacturing, plastics manufacturing, energy consulting, and power supply, as well as chains such as banks, retailers, and hotels. LineMetrics integrates seamlessly into the continuous improvement process: creating transparency, defining objectives and measures, and tracking whether they are achieved.
It’s as simple as it gets: an easy-to-use tool for optimizing processes based on sensor data. I am the lead architect responsible for the system architecture and the software stack of the LineMetrics product.
We use Cassandra to store raw sensor data as well as aggregated, time-referenced data. For example, we collect and store sensor data such as energy consumption, temperature, CO2 levels, production output, and machine stoppages. We then transform this data into various time series (e.g. monthly sums, averages, and peaks). All of this data is held in Cassandra.
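To make the roll-up step concrete, here is a minimal Python sketch of turning raw readings into monthly sums, averages, and peaks. It assumes the raw readings of a single datastream arrive as (timestamp, value) pairs; the function name `aggregate_monthly` and the result layout are hypothetical illustrations, not the actual LineMetrics pipeline.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

# Assumption: one raw reading is a (timestamp, value) pair for a single datastream.
Reading = Tuple[datetime, float]


def aggregate_monthly(readings: Iterable[Reading]) -> Dict[Tuple[int, int], Dict[str, float]]:
    """Hypothetical roll-up of raw readings into per-month sum, average, and peak."""
    buckets: Dict[Tuple[int, int], List[float]] = defaultdict(list)
    for ts, value in readings:
        # Bucket each reading by the (year, month) of its timestamp.
        buckets[(ts.year, ts.month)].append(value)

    result: Dict[Tuple[int, int], Dict[str, float]] = {}
    for month, values in sorted(buckets.items()):
        total = sum(values)
        result[month] = {
            "sum": total,
            "avg": total / len(values),
            "peak": max(values),  # "peak" taken here as the maximum value
        }
    return result
```

Each aggregate series produced this way (monthly sums, averages, peaks) can then be stored as its own time series alongside the raw data, so queries read precomputed values instead of recomputing them.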
Previously, our time series data was stored in a MySQL database, but its schema restrictions and the growing data volume forced us to move to another storage system quickly. To be honest, there was not much debate about alternatives. Cassandra simply does the job we need done really, really well.
Continuous real-time analytics on sensor data means working with huge amounts of data. Within our first 12 months, our data store passed 1 billion data points. From the moment we started designing our product, we knew we needed a database well suited to our needs. Most of our data is structurally simple time series data.
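For a rough sense of scale, spreading 1 billion data points evenly over one year gives the average ingest rate below. This is only a long-run average and says nothing about peak write rates.

```latex
\[
\frac{10^{9}\ \text{data points}}{365 \times 24 \times 3600\ \text{s}}
= \frac{10^{9}}{31\,536\,000\ \text{s}}
\approx 31.7\ \text{data points/s}
\]
```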
We quickly discovered that relational databases did not meet the performance goals we had set for our application. That was somewhat expected, but our technical background was mostly in relational databases. So we looked for alternatives, and Cassandra was the obvious choice for us. If I remember correctly, we had read at the time about Facebook using Cassandra for a specific part of their data. Cassandra handles our data exactly the way we need it to.
So far, we are impressed by the scalability. It is intended by design, but it amazes us time and again that no matter how much data is in our data store, there is no noticeable difference in performance. Cassandra's optimization for fast data access is superb: our query times are consistently below 10 ms, no matter what we do.
At the moment, we are running two nodes, with data stored in two geographically separate locations. Our test environments mirror the data structure of our live production system but are filled with automatically generated dummy data.
The most important step at the beginning is to model the data structure to fit your future queries. That is the key to consistently fast queries. In our case, we designed an index with essentially only two levels. The first level identifies which datastream at which aggregation level; the second level specifies the time range (date from/to).
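One plausible way to read this two-level design in Cassandra terms is a partition key of (datastream, aggregation level) with the timestamp as a clustering column, so a from/to query touches one partition and reads one contiguous, time-ordered slice. The in-memory Python sketch below imitates that access pattern; the class and method names, integer timestamps, and the half-open [from, to) interval are all assumptions for illustration, not LineMetrics' actual schema or Cassandra's storage engine.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical level-1 key: (datastream id, aggregation level).
PartitionKey = Tuple[str, str]


class TimeSeriesStore:
    """In-memory sketch of a two-level time series index.

    Level 1 (datastream, aggregation level) selects one partition directly.
    Level 2 keeps each partition's rows sorted by timestamp, so a time-range
    query is a contiguous slice, similar to a Cassandra clustering column.
    """

    def __init__(self) -> None:
        self._timestamps: Dict[PartitionKey, List[int]] = defaultdict(list)
        self._values: Dict[PartitionKey, List[float]] = defaultdict(list)

    def insert(self, stream: str, level: str, ts: int, value: float) -> None:
        key = (stream, level)
        stamps = self._timestamps[key]
        values = self._values[key]
        i = bisect.bisect_left(stamps, ts)
        if i < len(stamps) and stamps[i] == ts:
            # Same partition and timestamp: overwrite, like a Cassandra upsert.
            values[i] = value
        else:
            # Insert at the sorted position to keep the partition time-ordered.
            stamps.insert(i, ts)
            values.insert(i, value)

    def query(self, stream: str, level: str, ts_from: int, ts_to: int) -> List[Tuple[int, float]]:
        """Return rows with ts_from <= ts < ts_to from a single partition."""
        key = (stream, level)
        stamps = self._timestamps.get(key, [])
        values = self._values.get(key, [])
        lo = bisect.bisect_left(stamps, ts_from)
        hi = bisect.bisect_left(stamps, ts_to)
        return list(zip(stamps[lo:hi], values[lo:hi]))
```

The point of the design is that a query never scans other datastreams or aggregation levels: it jumps to one partition, seeks to the start of the time range, and reads forward, which is why query cost depends on the size of the requested range rather than on the total amount of stored data.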
From what we have seen, there is a thriving community all around the world. We are looking forward to joining one live event or another in the future. Because the Runtastic founders are business angels at LineMetrics, most of our knowledge exchange has been with the awesome Runtastic team. They have done great things with Cassandra, and we are happy to learn from them as much as we can.
We are just grateful for what Cassandra and the people behind it are giving us. We will make sure to one day return the favor.