February 4th, 2015

i2O Water Use Case

i2O Water design, build and supply intelligent devices to the water industry that work in a symbiotic manner with our securely hosted, multi-tenanted Software as a Service (SaaS) platform. This holistic system is used to intelligently monitor and control the pressures within a water distribution network, with the objective of reducing the pressures to the minimum levels possible at all times whilst still delivering the quality of service required by the water consumers in the network. The benefits of this solution are fewer bursts in the network and a dramatic reduction in water leakage.

To date, i2O has deployed over 5000 devices to over 70 water utilities in 25 countries across the globe. As a result of using our intelligent technologies, these utilities combined save over 235 million litres of water per day that would ordinarily be wasted. To put this in real-world terms, that volume of water would fill just over 100 Olympic swimming pools.

Our Technical Solution

An i2O system consists of the following high level components:

  1. A battery powered intelligent controller device which sits at the inlet to a subnet of the network and is used to control the pressure entering the subnet.

  2. An advanced pilot valve device which can provide much finer-grained control over the pressure at the inlet.

  3. A battery powered intelligent sensor device which sits at the “Critical Point” in the subnet (usually where the pressure would be lowest or most critical in terms of service delivery).

  4. An Event Driven Architecture SaaS platform which is responsible for the machine learning algorithms that optimise the pressures and deliver the intelligent control models to the controllers.

The devices are themselves built around the concept of intelligent extension of their capabilities, both in terms of the modular nature of the electronic and mechanical hardware and in the nature of the firmware design. This allows i2O to extend and expand the devices’ capabilities to offer even more smart solutions to our customers. These solutions can be developed and tested by our agile research and development teams and, when proven, be deployed over the air from the SaaS platform to the devices, which can then make use of the new firmware modules and services.

Examples of the devices are shown below.

Critical Point Sensor


Smart Controller / Inlet Sensor

Advanced Pilot Valve


The devices gather data in a low power mode on a number of physical channels, which include but are not limited to:

  • Pressures

  • Flows

  • Temperatures

  • Battery Voltages

  • GSM Signal Strength

These data channels are then communicated to our SaaS platform over the mobile internet via GSM/GPRS at a time configured by the customer. In a typical installation this communication happens only once per day. Due to poor signal strength in certain geographical areas of the world, this communication may not succeed on certain days, and this is a key driver behind how we have designed our Cassandra data models, as I will explain later on. The physical data channels can be configured by the customer to acquire and sample data at a frequency relevant to their situation, and can also be configured to average and gather more enhanced statistics on the channels, which are also transmitted to the platform for further analysis, learning and display.

The configuration of the devices is managed by the user through the web interface onto our SaaS platform and, importantly, the configuration is associated with the location of the device, not the actual device itself. This allows customers to hot-swap devices on site: the newly swapped device will inherit the configuration for that location via the platform and behave identically to the swapped-out device. The customer considers their network as something that can and does evolve over time, and this fact, together with the device swapping, leads to an interesting use case in our data modelling.

Cassandra Data Models to Support the Use Case

The use case that is most interesting to consider, and most relevant to the IoT and time-dependent data in Cassandra for i2O, is that of the channel data that is sent from our devices to the SaaS platform. This use case has to consider the following challenges:

  • Data can arrive at any time for any number of channels and devices; especially relevant is the case that data can arrive much later than when it was recorded (usually due to communication issues in the GSM/GPRS network)

  • The location where the device is currently deployed may not be the location where the device was when the data it sends was collected

  • The customer’s network may have changed topology since the data was recorded on the device

  • Data is recorded on physical channels but customers expect data to be presented on logical channels (i.e. a device may record pressures on physical channels 1, 2 and 3 but the customer sees these pressures as upstream, inlet and control-space). Thus there needs to be a consistent channel mapping for whichever device sends us the data

  • Customers and other platform services require data to be aggregated and most importantly time-synchronised to allow accurate control models and graphs to be produced

So in fact in this use case, i2O considers its data as time-dependent and not just time-series. This fact requires us to track the time-dependent history of:

  • The network topology of the water network (Areas, Locations and Assets)

  • Devices and their channels (primarily time series data)

  • An audit of all the Events generated in our SaaS platform

i2O uses Cassandra as its primary datastore for time-dependent data, but it also makes use of a traditional relational datastore (PostgreSQL) and a document datastore (ElasticSearch) where the data models have a more natural fit for those stores. An example is relevant to this use case: the current topology of the network is held in PostgreSQL, as this allows very rapid topological searching and querying, but the time history of the network is held in Cassandra, to facilitate reconstructing it rapidly so we can work out where in the network the arriving data should go (according to the timestamp on the data). Our older platform used MS SQL as its only datastore; this was not suitable for the large volume of time-series and time-dependent data we were going to receive as our business expanded, which is why we selected Cassandra to handle this data.

We make use of ElasticSearch as a datastore in our platform to store resources for our services that are required to be searched for across arbitrary properties. The ability of ElasticSearch to be schema-free and to auto-index the properties of these JSON resources is a very low-friction solution to this requirement. In addition, similar to Cassandra, the multi node distributed nature of the datastore was another attraction to our scalability needs.

Let’s take a look at some of these data models in Cassandra, then. For reference, in our production system we are presently using Cassandra 2.0.6. We shall present the models in CQL data modelling declarative form for clarity.


Object Tracking over Time

This table tracks our network objects over time; when we receive data from a device which spans a particular time range, we use this table to rebuild the network topology (or topologies) that existed throughout that time range. The type_id is a string representation of which part of the network the row represents (Area, Location, Asset, or indeed even an Alarm State). The node_id is our internal uuid for this object, common across all our datastores wherever we refer to anything about this network object. The channel is the logical channel (if relevant, say in the Alarm State case) and when is the UTC-milliseconds-since-Epoch time that applies to the object_state, which is a case-specific blob of data.
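The schema itself appeared as an image in the original; a rough reconstruction from the column descriptions above (the table name, types and key layout are assumptions, not i2O’s actual DDL) might look like:

```sql
-- Hypothetical sketch of the object-tracking table. Partitioning by
-- (type_id, node_id, channel) keeps one object's history together;
-- clustering on "when" in descending order returns recent states first.
CREATE TABLE object_state_history (
    type_id      text,    -- Area, Location, Asset, Alarm State, ...
    node_id      uuid,    -- internal id, common across all datastores
    channel      text,    -- logical channel, where relevant
    when         bigint,  -- UTC milliseconds since Epoch
    object_state blob,    -- case-specific state payload
    PRIMARY KEY ((type_id, node_id, channel), when)
) WITH CLUSTERING ORDER BY (when DESC);
```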

We query this table in reverse time order, as we need to find the state as it was at the time under consideration; searching backwards is more sensible because the network changes are generally more recent than the time the data was measured.
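Assuming a table keyed on (type_id, node_id, channel) with when as a descending clustering column (names taken from the description above, so treat them as illustrative), the reverse-time lookup might be:

```sql
-- Find the state of one network object as it was at the data's
-- measured timestamp: the newest state at or before that moment.
SELECT when, object_state
FROM object_state_history
WHERE type_id = 'Location'
  AND node_id = 550e8400-e29b-41d4-a716-446655440000
  AND channel = 'inlet'
  AND when <= 1422316800000  -- timestamp on the arriving data
LIMIT 1;                     -- rows already cluster newest-first
```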


Main Time Series Table

This table holds both the raw time series data for our devices and also the data aggregations, which are used both for graph display (along with downsampling techniques based upon the user’s browser window size) and for different downstream analyses for which the aggregated data is sufficient. The type_id, node_id and channel are the same as in the previous table. The aggregation_level is a string representation of the time resolution (or quantization, e.g. ‘100ms’, ‘15min’ and so on – we currently support up to 9 quantization levels). The column when_measured is, obviously, when the device measured the channel data (again in UTC milliseconds since Epoch). The remaining columns are the measured value (measurement) alongside any available statistical data that the device has been configured to record (max, min, sample_count and sum_of_squares), which allows us to see any volatility in the channel. This is very useful for some of our algorithmic developments, which can detect poorly performing aspects of the customer’s network.
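Again the schema was an image in the original; a hedged sketch assembled from the column descriptions (names, types and key layout are assumptions) could be:

```sql
-- Hypothetical sketch of the main time-series table. Raw data and
-- each aggregation level share the table but live in separate
-- partitions via aggregation_level.
CREATE TABLE channel_data (
    type_id           text,
    node_id           uuid,
    channel           text,    -- logical channel (upstream, inlet, ...)
    aggregation_level text,    -- 'raw', '100ms', '15min', ...
    when_measured     bigint,  -- UTC milliseconds since Epoch
    measurement       double,
    max               double,
    min               double,
    sample_count      int,
    sum_of_squares    double,
    PRIMARY KEY ((type_id, node_id, channel, aggregation_level),
                 when_measured)
);
```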

When our platform receives the data from the devices it uses the first table to determine which of the network nodes is the correct node to assign the raw data to in the second table, across the whole time-range in the received data. Our platform also then (re)computes the aggregations that are relevant to the time-range of the received data and stores them back in this table. They need to be recomputed when gaps in the time series are backfilled due to poor communications, and thus the higher-level time aggregations in this table need to be replaced. Of course this produces tombstones in Cassandra, but these are removed at the next tombstone compaction. We monitor this from time to time to determine whether we need to adjust any of the table properties so as not to hit an issue reading large numbers of tombstones in our time-sliced queries. The gaps in time-series data which we require the device to fill are an example of simple tabular data which we hold in our relational store.
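Under an assumed schema keyed on (type_id, node_id, channel, aggregation_level) with when_measured as the clustering column (illustrative names only), a time-sliced read, and the upsert that replaces a recomputed aggregate after a backfill, might look like:

```sql
-- Read one channel's 15-minute aggregates for a time slice.
SELECT when_measured, measurement, max, min, sample_count, sum_of_squares
FROM channel_data
WHERE type_id = 'Asset'
  AND node_id = 550e8400-e29b-41d4-a716-446655440000
  AND channel = 'inlet'
  AND aggregation_level = '15min'
  AND when_measured >= 1422230400000
  AND when_measured <  1422316800000;

-- Replacing a recomputed aggregate after backfilled raw data is an
-- upsert; superseded cells are cleaned up by a later compaction.
INSERT INTO channel_data
    (type_id, node_id, channel, aggregation_level, when_measured,
     measurement, max, min, sample_count, sum_of_squares)
VALUES ('Asset', 550e8400-e29b-41d4-a716-446655440000, 'inlet',
        '15min', 1422231300000, 3.42, 3.71, 3.15, 900, 10567.2);
```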

Audit Replay

This table holds all the Events (which we refer to as Enterprise Events) that have occurred in our SaaS platform ecosystem since its inception. These events are serialised using a binary-compatible mechanism which allows deserialisation into programming languages other than the one that serialised them, thus facilitating a new service being written in a language which might better fit the service’s needs and design. The new service then announces its presence in the ecosystem and requests a replay of the events of certain types that it is interested in, from a given moment in time (usually Epoch).
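The schema here was also presented as an image; given the time-shard discussion below, a plausible shape (assumed names and types, not the actual DDL) is one partition per event type per day:

```sql
-- Hedged sketch of the enterprise-event audit table: one partition
-- per (event_type, day bucket); events cluster in time order within it.
CREATE TABLE enterprise_events (
    event_type text,
    day        bigint,    -- UTC ms truncated to midnight (the day shard)
    when       bigint,    -- event time, UTC milliseconds since Epoch
    event_id   timeuuid,  -- breaks ties between events in the same ms
    payload    blob,      -- language-neutral binary serialisation
    PRIMARY KEY ((event_type, day), when, event_id)
);
```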

This table’s design threw up some interesting challenges, namely:

  • Should we avoid using an IN clause when querying the data for many event types at once, which with a QUORUM read could hit many nodes?

    • What does this imply about how we can query and re-assemble the data back into strict time order across event types?

  • What time shard/quantization level should we record the events at which allows for growth in our platform and trades off size of row versus multi-row queries

  • What advantages will there be as we upgrade to a later version of Cassandra and possibly make use of DateTieredCompactionStrategy?


In the end we settled on the above “schema” and addressed these challenges by:

  • Querying asynchronously in parallel for many event types and inserting the results into a thread-safe sorted list in our client-side (C#) code.

  • Using a time-shard of 1 day – i.e. each row contains all the daily events of a particular type – which we deemed granular enough to allow for a hundred-fold growth over today’s volumes without the row size (due to the blob) becoming intractable without paging.

  • For this table there would appear not to be a huge benefit from that compaction strategy, but for some of our duplicated data tables around the device data there very well could be, as we are most interested in data from the last 90 days.
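Putting the first two decisions together, a replay (with assumed table, column and event-type names) becomes one simple partition scan per event type per day bucket, fanned out asynchronously from the C# client instead of a single multi-partition IN query:

```sql
-- One such query is issued per interesting event type and per day
-- bucket since the requested start time; the client merges the
-- result sets into a single thread-safe, time-sorted list.
SELECT when, event_id, payload
FROM enterprise_events
WHERE event_type = 'DeviceCommissioned'  -- hypothetical event type
  AND day = 1422316800000;               -- one day bucket
```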


Some Facts on i2O’s Cassandra Usage

  • Version in Production: 2.0.6

  • Number of Nodes: 3 [will grow to 6 in 2015]

  • Data Volume: 300Gb [but growing daily by about 2Gb]

  • Audit Replay Events: around 50,000 per day

  • Raw time-series data: over 2 million rows of data currently across tenants