Robert Harvey: Chief Architect at Third Iris.
Robert, can you please explain what service Third Iris provides its customers?
We provide camera surveillance as a service without the need for DVRs or NVRs, and that’s really one of the key distinctions of our service. The system runs in the cloud and the user ends up not having to worry about the security or the safety of the video that they’re recording.
We include monitoring and the whole list of features that come with the fact that the data is managed through the cloud and our online system. We also manufacture our own cameras, which is pretty unique among companies in our space. We have a wide variety of customers from global enterprise to SMBs. We tend to highlight our success in education, state, local, retail (including regional and national chains) and property/facility management.
What does your infrastructure look like?
We run 100% in Amazon and have 5 years’ worth of cameras to maintain through many, many different architectures. We’ve been working towards consolidating to the most reliable systems we can get, and it’s critical that we have a reliable database. Cassandra just fits in every way there.
Do you run across multiple availability zones on Amazon or multiple data centers?
We’ve designed for that, although currently we’re not at that point. Depending on how quickly we grow, we’re definitely looking towards going to the AWS West Coast zone.
What do you store in Cassandra? Do you store the video footage itself or metadata about it?
We’re just storing the metadata about the video. Throughout the day a camera may generate an hour to an hour-and-a-half of video, since it’s motion-activated. It sends the video to us as efficiently as we can make it work with the network, so we tend to get it in chunks of 2 to 5 minutes, and we store everything about those clips in Cassandra. Cassandra has been spot-on because it’s as easy as it could be to get it running.
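The interview does not describe the actual schema, so the following is only an illustrative sketch of how per-clip metadata like this is commonly laid out as a time series: rows grouped by camera and calendar day, ordered by clip start time. Every name here (`ClipMetadata`, `ClipTimeline`, `camera_id`, `video_url`) is hypothetical, and a plain in-memory dictionary stands in for the database so the example runs on its own.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ClipMetadata:
    """Metadata for one motion-triggered clip (all field names are hypothetical)."""
    camera_id: str
    start: datetime
    duration_s: int
    video_url: str  # pointer to the video itself, which is stored elsewhere


def partition_key(camera_id: str, when: datetime) -> tuple:
    """Group clips by camera and calendar day so no single partition grows without bound."""
    return (camera_id, when.strftime("%Y-%m-%d"))


class ClipTimeline:
    """In-memory stand-in for a time-series table, keyed by (camera, day)."""

    def __init__(self) -> None:
        self._partitions = defaultdict(list)

    def record(self, clip: ClipMetadata) -> None:
        """Write path: add the clip to its partition, kept ordered by start time."""
        rows = self._partitions[partition_key(clip.camera_id, clip.start)]
        rows.append(clip)
        rows.sort(key=lambda c: c.start)

    def clips_for_day(self, camera_id: str, day: datetime) -> list:
        """Read path: fetch one camera's clips for one day, oldest first."""
        return list(self._partitions.get(partition_key(camera_id, day), []))


# Usage with made-up values: two clips arrive out of order for the same camera and day.
timeline = ClipTimeline()
timeline.record(ClipMetadata("cam-1", datetime(2014, 1, 1, 9, 30), 180, "clips/b"))
timeline.record(ClipMetadata("cam-1", datetime(2014, 1, 1, 9, 0), 240, "clips/a"))
day_clips = timeline.clips_for_day("cam-1", datetime(2014, 1, 1))
```

Keying on camera and day means a "show me yesterday on this camera" lookup touches a single group of rows, which is the usual motivation for this layout in write-heavy time-series workloads.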
How many nodes do you operate and how much data do they store?
We’re only at 4 nodes right now and we’ve got about 6 months’ worth of timeline data that’s been through DataStax and then Cassandra.
You said the need for a reliable data store caused you to seek out Cassandra. What technology were you using before and what business or technical problems caused you to look at NoSQL, and then specifically Cassandra?
We operated an intermediate architecture that used specially built servers, which handled all the indexing and storage of our assets, including all the video and thumbnails. It worked great with just a handful of cameras, but there wasn’t really enough testing conducted before it went into production. As soon as we got into relatively low numbers, still in the hundreds of cameras, the system started to fall down. Mostly it fell down on the indexing side, even though the indexing was about as efficient as that approach allowed.
So we replaced that system with Cassandra for storing all of that stuff. And we’ve got a very interesting data storage scenario because we write probably 500 to 1,000 times as much data as we read. People don’t worry about looking at videos until something goes wrong, and that happens very rarely.
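To put rough numbers on that ratio, here is a back-of-the-envelope sketch using only the figures quoted in this interview: 60 to 90 minutes of video per camera per day, clips of 2 to 5 minutes, and 500 to 1,000 writes per read. It assumes one metadata write per clip and treats the ratio as a count of operations, neither of which the interview confirms. The fleet size of 500 cameras is a made-up input, not a Third Iris figure.

```python
def clips_per_day(video_minutes: float, clip_minutes: float) -> float:
    """Number of clips one camera produces per day."""
    return video_minutes / clip_minutes


def estimate_daily_load(cameras: int) -> dict:
    """Low and high estimates of daily metadata writes and reads for a fleet."""
    # Fewest clips: least video (60 min) in the longest chunks (5 min).
    low_clips = clips_per_day(60, 5)
    # Most clips: most video (90 min) in the shortest chunks (2 min).
    high_clips = clips_per_day(90, 2)

    low_writes = cameras * low_clips    # assumes one write per clip
    high_writes = cameras * high_clips

    return {
        "clips_per_camera": (low_clips, high_clips),
        "writes": (low_writes, high_writes),
        # Fewest reads: fewest writes at 1,000 writes per read.
        # Most reads: most writes at 500 writes per read.
        "reads": (low_writes / 1000, high_writes / 500),
    }


# Hypothetical fleet of 500 cameras.
load = estimate_daily_load(500)
print(load)
```

Under these assumptions each camera produces between 12 and 45 clips a day, and a fleet of that size would see thousands of writes a day against only a handful of reads, which is the shape of workload described above.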
We wanted to provide really responsive feedback to the customer without needing to build a really responsive server to handle the importing. We really wanted to separate the reads and the writes as much as we could. And that was our biggest headache with that first system. It was just a nonstarter, really.
So you needed something that could sustain a very high write rate and easily handle time-series data, which, again, Cassandra does very well. Are there other attributes of Cassandra that really make it a winner for your particular application?
We’re a pretty agile team, so having a schema-less design really worked well for us because we weren’t boxed into a decision that might have been made 6 months prior. Obviously from the start we were looking at the multiple availability zones and how that would work with replication and that did really well. It was mostly performance and reliability, and secondarily I think training, since none of us were even database experts, let alone Cassandra or NoSQL experts, at that point.
You ultimately chose DataStax Enterprise versus open source Cassandra. What were the factors that caused you to go with it over using pure open source?
We selected it because of the training, support and cost. When you consider the amount of money it costs to keep a relatively small number of nodes going for the foreseeable future, it was really not that big a cost compared to the problems we’d potentially face trying to bring somebody on board to solve issues after we’d already run into them. So it was not exactly a heavy decision to make at the time and so far it is paying off just fine.
How do you manage everything? Do you use DataStax OpsCenter or something else?
Yes, I use OpsCenter a lot. It has really worked well for monitoring the deployment. I’m brushing up on my use of command line tools but I almost always start with OpsCenter and go from there.
What advice would you offer other users who are moving to NoSQL solutions for the first time?
I guess it would depend on how comfortable the person is with experimentation, and on their timeline. Obviously in our situation it worked really well to use DataStax for getting up and running. We’ve used open source projects for just about everything and I’m kind of used to being a jack-of-all-trades for getting things done. If somebody had that approach and the time to learn Cassandra, then they could start off just trying to get by with the open source community. But if there is a need for getting there quickly and making sure you’re getting the right solution, I would recommend that they use DataStax.