September 30th, 2013


“The other technologies we looked at were other column stores, both open source and commercial, and by far and away Cassandra had the best reputation and had the best performance for the testing.”

-Mike Williams, Software Director at i2O Water


TL;DR: i2O Water provides intelligent solutions for water utility companies around the world, to help them reduce the pressure in their water networks.

They use Apache Cassandra as their predominant column store for time series data within their solution. They record time series data from multiple physical channels on their devices out in the field, sent over the GPRS mobile phone network through to the Internet.

Prior to using Cassandra, i2O Water had a traditional analysis technology using Microsoft SQL Server. They used a rather slow architecture and changed the platform over to an event-driven architecture. i2O Water currently has around 1.5 terabytes of data in their old platform, which they'll be moving over to Cassandra soon. They have a deployment hosted by Rackspace; i2O Water is a software as a service (SaaS) solution that is provided to water utility companies. They have a virtualized environment comprising about 16 virtual machines of various flavors that run their system.

 

Today we have Mike Williams of i2O Water joining us. Mike, thanks so much for taking the time; to get things started, could you tell us a little bit about what i2O Water does and what your role is?

i2O Water provides intelligent solutions for water utility companies around the world, to help them reduce the pressure in their water networks, to save water. This helps them reduce leakages and bursts on their water network. We currently save over 100 million litres of water per day for our customers across the world.

 

That’s excellent; it sounds like a very sustainable mission for you and a great use of technology. What is your role at i2O Water?

I’m the Software and IT Director of i2O Water, so my responsibility is for leading the teams that write the software for our intelligent devices, both the embedded software and the platform software. Cassandra’s role is to keep the data from our devices that interact with the platform, which is the data our intelligent algorithms run against.

 

Excellent. How does Apache Cassandra fit into the mix there at i2O Water?

We use Apache Cassandra as our predominant column store for time series data within our solution. We record time series data for multiple physical channels from our devices out in the field, sent over the GPRS mobile phone network through to the Internet.

 

We also record how the water company’s network topology changes over time, so that it evolves as new zones and devices are created and added. In addition, we store large amounts of spot events over time, such as alarms, pieces of equipment that are going faulty, and so on.
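As an illustration of the kind of time series layout described above, here is a minimal in-memory sketch in Python. It is not i2O Water's schema, which the interview does not describe. It assumes a layout that is common in column stores: one partition per device, channel, and day, with readings kept in timestamp order inside each partition. The names `TimeSeriesStore`, `partition_key`, `logger-1`, and `pressure` are hypothetical.

```python
from bisect import insort
from collections import defaultdict
from datetime import datetime, timezone


def partition_key(device_id, channel, ts):
    """Group readings by device, channel, and calendar day (an assumed bucketing)."""
    return (device_id, channel, ts.strftime("%Y-%m-%d"))


class TimeSeriesStore:
    """In-memory stand-in for a column store holding per-channel time series."""

    def __init__(self):
        # Each partition holds (timestamp, value) pairs sorted by timestamp.
        self._partitions = defaultdict(list)

    def write(self, device_id, channel, ts, value):
        key = partition_key(device_id, channel, ts)
        insort(self._partitions[key], (ts, value))

    def read_day(self, device_id, channel, day):
        """Return one day's readings for one channel, oldest first."""
        key = partition_key(device_id, channel, day)
        return list(self._partitions.get(key, []))


store = TimeSeriesStore()
t0 = datetime(2013, 9, 30, 11, 45, tzinfo=timezone.utc)
t1 = datetime(2013, 9, 30, 12, 0, tzinfo=timezone.utc)

# Readings may arrive out of order; the partition keeps them sorted.
store.write("logger-1", "pressure", t1, 3.1)
store.write("logger-1", "pressure", t0, 3.2)

day_rows = store.read_day("logger-1", "pressure", t0)
```

Bucketing by day keeps each partition bounded in size, and a query for one channel over one day touches a single partition.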

 

And what was your motivation for using Apache Cassandra? Were there any other technologies that it was initially evaluated against, before making your decision?

Prior to using Cassandra, we had a traditional analysis technology using Microsoft SQL Server. We used a rather slow architecture, and we changed the platform over to an event-driven architecture. We were getting more successful with our business, so we were getting more and more of this time series data. We really looked for a store to optimize the storage and retrieval of the time series.

 

The other technologies we looked at were other column stores, both open source and commercial, and by far and away Cassandra had the best reputation and had the best performance for the testing that we did.

 

Very cool. Would you be able to share some insights into what your deployment looks like?

We have a deployment hosted by Rackspace; it’s a software as a service (SaaS) solution that we provide to the water utilities. We have a virtualized environment comprising about 16 virtual machines of various flavors that run our system. We have several virtual servers running Apache Cassandra as individual nodes, and we have three nodes in our production system.

 

Do you know the overall amount of data that you’re storing in Apache Cassandra?

Presently we’re migrating our customers off our old platform to the new one, so what we store today on Cassandra is not really representative of what we need to store. We currently have about 1.5 terabytes of data in our existing old platform, which we’ll be looking to move over soon.

 

That’s great. How’s the migration going for that?

It’s customer specific, and it’s fine in terms of our architecture, which uniquely supports the ability for us to replay history. We effectively replay the events that occurred in the past, as though the devices were talking to the new platform. We’re not actually doing a data migration as such, from one database technology to another.

 

Interesting. That’s a really unique way of transferring the data over, by recreating it; that’s cool.

Our architecture is built around the ability to add services to the ecosystem whilst the ecosystem is running, but also to have them behave as though they had been there right from the beginning of time, and thus catch up on and replay anything that they may have missed.
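The replay idea described above can be sketched in a few lines of Python. This is a generic illustration under stated assumptions, not i2O Water's implementation: it assumes an append-only event log held in memory, and the names `Event`, `EventLog`, and `subscribe_from_start` are hypothetical. A service that joins late is first fed every past event and then receives live events like any other subscriber.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class Event:
    sequence: int    # position in the log (hypothetical field)
    device_id: str
    payload: dict


@dataclass
class EventLog:
    """Append-only log that keeps every event so it can be replayed later."""
    events: List[Event] = field(default_factory=list)
    subscribers: List[Callable[[Event], None]] = field(default_factory=list)

    def append(self, device_id: str, payload: dict) -> Event:
        event = Event(len(self.events), device_id, payload)
        self.events.append(event)
        for handler in self.subscribers:
            handler(event)  # deliver the live event to running services
        return event

    def subscribe_from_start(self, handler: Callable[[Event], None]) -> None:
        # Replay history first, as if the service had always been running,
        # then register it for live events.
        for event in list(self.events):
            handler(event)
        self.subscribers.append(handler)


log = EventLog()
log.append("logger-1", {"pressure_bar": 3.2})
log.append("logger-1", {"pressure_bar": 3.0})

# A new service joins after two events already exist.
seen: List[Event] = []
log.subscribe_from_start(seen.append)

# Live events keep flowing to it after the catch-up.
log.append("logger-2", {"pressure_bar": 2.7})
```

After this runs, `seen` holds all three events in order, including the two recorded before the service was added. The same mechanism would let a new platform rebuild its state from past device events rather than from a database-to-database copy.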

 

Is there anything that you’d like to see out of future versions of Apache Cassandra that would help you out?

To be honest, not a lot really. In the more recent versions there’s a lot more support for time series data, which is our predominant use case. We really like the addition of CQL3 and we’re making heavy use of that.

 

We’re going to be switching over very soon to using the DataStax client for our C# code, which speaks CQL3 natively across the interface, rather than using the Thrift protocol. Really, a lot of our sweet spots have been met by the latest development.

 

Excellent. What’s your experience with the Apache Cassandra community, whether it be the virtual community or the local physical community?

Over here in Europe it’s been a bit of a slow start on Cassandra; it’s just kicking off now really. DataStax has opened an office here, which is helping to promote that community, and there are more community events sparking up with meet-ups, which is useful. There are none actually in our local area here in Southampton, but there are some in London which we’ve gone to. Of course there’s a Cassandra Summit EU conference occurring next month, which a couple of our developers and I will be attending, too. As far as the online community’s concerned, it’s been really great. We’ve had quite a lot of help, in terms of asking questions on forums and getting responses.

 

I’m looking forward to meeting you out there. I think that’s really all the questions that I have for you today. It was really great hearing about how you’re using Apache Cassandra at i2O Water. Is there anything else that you’d like to add before we sign off here?

No, I’d just encourage people to take a look at our website and look at what we do, because I think we’re helping the world save water, which is a very laudable target for an innovative and commercial business. We’re using quite a lot of innovative technology to help us do that, in which Cassandra plays a major role.