Meeting the Data Demands of IoT: Selecting the Right Database and Supporting Always-On Operation
Powerful IoT applications – and especially those leveraging AI and machine learning – require equally powerful data infrastructures to reliably store and process the massive data volumes intrinsic to always-on operation. Our business is well-acquainted with this challenge.
Bigmate recently built an IoT platform that ingests and processes heterogeneous data at very large scale – we’re routinely displaying 20,000 real-time data points to a single customer. Over the course of development, our technical team considered a range of databases to back the IoT platform, including MySQL, MongoDB, and others. Out of this vetting process, open source Apache Cassandra emerged as the most advantageous choice for our IoT and AI/ML data needs.
In planning our SaaS-delivered IoT platform, we developed a set of non-negotiable data-layer criteria to ensure that we could offer broad utility and compatibility across a wide range of use cases. These conditions included:
- a device-agnostic approach that could allow any device to share data and benefit from data insights;
- the ability to turn raw data into business insights;
- compatibility with any communication network;
- thorough data security;
- data tools that would be able to scale vertically and horizontally;
- versatile access availability through the web or APIs.
The goal of the project was to produce an IoT solution that integrates seamlessly with customers’ various IoT sensors, devices, and third-party platforms while also fulfilling the interoperability, capacity, and capability requirements fundamental to production IoT deployments.
Our internally developed IoT and AI/ML applications helped calibrate these demanding data needs. For example, our Thermy solution uses a thermal CCTV camera to capture skin temperature readings from individuals as they pass through the camera’s field of view. It leverages artificial intelligence and machine learning to identify temperature abnormalities in real time and prescreen individuals with potential illnesses. The technology was developed before the onset of the COVID-19 pandemic, but applies directly to the current need to rapidly and accurately flag at-risk individuals for further testing.
In another example, our in-house solution Warny similarly harnesses CCTV along with IoT and AI/ML technologies to detect potential collisions among worksite vehicles and workers, and to send preventative safety alerts. These solutions share common data-layer challenges: they need a database that can rapidly ingest massive amounts of data, process it with real-time responsiveness to power timely alerts, and scale to deployments across wide-ranging, global locations.
Apache Cassandra: The Ideal Choice for Scalability and Performance
In vetting MySQL, MongoDB, and other potential databases for IoT scale, we found they couldn’t match the scalability we could get with open source Apache Cassandra. Cassandra’s built-for-scale architecture enables us to handle millions of operations per second and very large numbers of concurrent users, which makes it well suited to IoT deployments. Cassandra also scales horizontally in a straightforward way: nodes can be added to an existing cluster while it remains active. Leveraging this scalability puts us in good company. Some of the world’s largest production deployments, including those at Apple and Netflix, run Cassandra environments that include thousands of nodes. At the same time, Cassandra’s architecture delivers performance that surpasses that of most alternative NoSQL database options, capably delivering the real-time processing and responses an IoT application needs in order to function.
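To illustrate why nodes can join a live cluster without reshuffling everything, here is a toy sketch of consistent hashing, the general technique Cassandra uses to place data on a token ring. It is not Cassandra's implementation: the Ring class, the node and device names, the use of MD5 for tokens, and the count of 64 virtual nodes are all assumptions made for illustration.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a ring position (MD5 is used here only for illustration)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    """Toy consistent-hash ring with virtual nodes (hypothetical helper)."""

    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self._tokens = []   # sorted ring positions
        self._owners = {}   # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each node claims several positions so load spreads more evenly.
        for i in range(self.vnodes):
            t = token(f"{node}#{i}")
            self._owners[t] = node
            bisect.insort(self._tokens, t)

    def owner(self, key):
        # A key belongs to the first position at or after its own token,
        # wrapping around to the start of the ring if necessary.
        idx = bisect.bisect_left(self._tokens, token(key)) % len(self._tokens)
        return self._owners[self._tokens[idx]]


keys = [f"device-{i}" for i in range(10_000)]
ring = Ring(["node-a", "node-b", "node-c"])
before = {k: ring.owner(k) for k in keys}

ring.add_node("node-d")  # grow the cluster from three nodes to four
after = {k: ring.owner(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved) / len(keys):.1%} of keys changed owner")
```

In this sketch, only the keys that now fall to the new node change owner; everything else stays where it was, which is the property that keeps adding capacity cheap relative to rehashing the whole data set.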
Given the crucial oversight and operational controls that most IoT solutions provide to their organizations – especially safety-oriented solutions like our own Thermy and Warny – availability and uptime were also critical database selection criteria. Here again, Cassandra was up to the task: the database achieves fault tolerance through automatic data replication across multiple nodes, delivering continuous availability and uptime with no single point of failure.
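The replication-based fault tolerance described above can be made concrete with a little arithmetic. The sketch below computes how many replicas must respond for a quorum and how many replica failures a quorum operation can tolerate. The function names are hypothetical, and the replication factors shown are assumed examples, not figures from our deployment.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a quorum: floor(RF / 2) + 1."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def tolerated_replica_failures(replication_factor: int) -> int:
    """Replicas that can be down while quorum operations still succeed."""
    return replication_factor - quorum(replication_factor)


def reads_see_latest_write(replication_factor: int, write_acks: int, read_acks: int) -> bool:
    """True when the read and write replica sets must overlap (W + R > RF)."""
    return write_acks + read_acks > replication_factor


for rf in (1, 3, 5):  # assumed example replication factors
    q = quorum(rf)
    print(
        f"RF={rf}: quorum={q}, "
        f"tolerated failures={tolerated_replica_failures(rf)}, "
        f"quorum reads see quorum writes={reads_see_latest_write(rf, q, q)}"
    )
```

With three replicas, for example, a quorum is two, so one replica can be unavailable while quorum reads and writes continue to succeed.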
Managed Service Benefits: Outsourcing for Success and Optimization
The database decision-making didn’t end with selecting Cassandra. We then needed to decide whether to run our Cassandra deployment in-house or to enlist a managed Cassandra service provider. We conducted a total cost of ownership (TCO) analysis and concluded that, while managing Cassandra in-house would be cheaper at the beginning, we would soon scale to the point where outsourcing Cassandra management and support would provide significant cost savings and ensure optimization. Cassandra may be simple to scale, but doing so reliably requires deep knowledge of Cassandra’s functionality, break points, and efficiencies. We enlisted Instaclustr for its globally recognized Cassandra expertise, and because it differentiates itself as a provider committed to delivering Cassandra in its pure, 100% open source form. That commitment ensures the code will always be ours and frees us from the vendor lock-in that comes with proprietary or open-core Cassandra offerings. For many of the same reasons, we also built our IoT and AI/ML platform and solutions entirely on AWS, ensuring our ability to reliably scale to serve enterprises worldwide.
Ultimately, Cassandra’s combination of performance, flexibility, reliability, and scalability for global reach makes it a clear choice for IoT and AI/ML use cases, enough so that we have full confidence entrusting our business to it.