November 11th, 2013

By 

 

 

Jesse Young: Vice President of Software Development at Zonar Systems

Josh Hansen: System and Application Architect at Zonar Systems

Matt Pfeil: Co-Founder at DataStax

 

TL;DR: Zonar Systems offers a safety inspection system and telematics for heavy fleet vehicles. Ultimately their offering is GPS tracking of vehicles that weigh over 10,000 pounds or carry more than 8 passengers.

Zonar collects data around every 18 seconds; as you can imagine, it stacks up pretty quickly across 300,000 vehicles. Those vehicles run anywhere from 6 to 12 hours a day, so do the math. It’s quite a bit of data.

Many different factors motivated them to start looking for a better solution. Zonar knew that they would potentially be storing a lot of data. They needed to be able to quickly expand storage and store this data in real-time without any bottlenecks. From a systems approach, they also needed a system that had built-in multi-data center replication.

Hi Planet Cassandra users, this is Matt Pfeil from DataStax. I’m here with Josh and Jesse from Zonar for today’s Apache Cassandra Use Case. What does Zonar do and what are your roles there?

Josh:   I’m Josh Hansen. I’m the System and Application Architect.

 

Jesse: I’m Jesse Young. I’m the Vice President of Software Development. At Zonar we offer a safety inspection system and telematics for heavy fleet vehicles. Ultimately our offering is GPS tracking of vehicles that weigh over 10,000 pounds or carry more than 8 passengers.

 

We make sure users know exactly where those vehicles are going and collect a lot of engine diagnostics information, so that we know exactly what’s going on with the vehicle in real-time.

 

You’re doing a ton of fleet logistics tracking, which is a great use case for something like Cassandra. What is your specific use case for Cassandra at Zonar Systems?

Jesse:  Today we’re tracking over 350,000 vehicles across the United States and Canada; we’re quickly growing and expect to be at 500,000 devices by the end of next year. We’re a leader in our industry space.

 

We’re over 100TB in our data stores right now. We’ve maxed out our RDBMS solution and have been looking at how we can quickly store data and retrieve it as fast as possible.

 

Is the vast majority of that 100TB all GPS-type data?

Jesse:  Most of it.  We get very heavily into GPS data, but we’re also collecting a lot of information off of the engine computer itself such as oil temperatures, cooling temperatures, cruise control state, fault codes, check engine lights, and stop engine light information.

 

That’s awesome. How often do the devices send out information?

Josh: We collect data around every 18 seconds; if you imagine that, it stacks up pretty quickly across 300,000 vehicles. We’re running anywhere from 6 to 12 hours a day, so do the math. It’s quite a bit of data.
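Taking Josh up on "do the math", here is a minimal sketch of the arithmetic using only the figures quoted above. It assumes every vehicle reports exactly once per 18-second interval for the whole time it is running; the constant and function names are illustrative, and real fleets will vary.

```python
# Figures quoted in the interview; everything else is an assumption.
REPORT_INTERVAL_S = 18      # one reading roughly every 18 seconds
VEHICLES = 300_000          # fleet size quoted in this answer
HOURS_LOW, HOURS_HIGH = 6, 12  # daily running time range


def readings_per_day(hours_running: int,
                     vehicles: int = VEHICLES,
                     interval_s: int = REPORT_INTERVAL_S) -> int:
    """Estimate fleet-wide readings per day for a given daily running time."""
    per_vehicle = hours_running * 3600 // interval_s
    return per_vehicle * vehicles


low = readings_per_day(HOURS_LOW)
high = readings_per_day(HOURS_HIGH)
print(f"{low:,} to {high:,} readings per day")
# prints "360,000,000 to 720,000,000 readings per day"
```

Under those assumptions each vehicle produces 1,200 to 2,400 readings a day, which puts the fleet in the hundreds of millions of rows daily.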

 

What was your original motivation for looking at alternate technologies to a relational system?

Jesse:  Many different factors motivated us to start looking for a better solution. Again, we really knew that we were potentially going to store a lot of data. We needed to be able to quickly expand our storage and store this data in real-time without any bottlenecks.

At the same time our users require us to report on that data very quickly and we didn’t really have the desire to have both OLTP type databases and data warehousing, as they became very expensive.

 

From a systems approach, we needed a system that had built-in multi-data center replication.

 

What information can you share about what your current infrastructure looks like around Cassandra?

Jesse: We’re still fairly private with that, but we are running multiple data centers and leveraging virtual private cloud providers.

 

Awesome. Other than GPS information and time series data, are there any other use cases inside the environment that are utilizing Cassandra?

Josh: We plan on supporting elevation data. We have a digital elevation model that we received from USGS that we store in Cassandra as well. We tried using it in the relational system, but it pretty much fell on its face. Cassandra was a perfect fit.

 

When you said that it “fell on its face”, what’s the primary issue there? Is it write throughput? Help me understand that a little more.

Josh:  The volume of data, on a single node, that you get in a relational system was the issue. We would have to scale it out similar to how Cassandra scales, and shard it, and all that stuff.

If you’re running all that code, it makes sense to adopt a system that does all that for you.
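To give a sense of the kind of code Josh is describing, here is a minimal sketch of application-level sharding in front of several relational databases. It is not Zonar's code: the shard names, the `shard_for` and `group_by_shard` helpers, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib

# Hypothetical shard names; placeholders, not real hosts.
SHARDS = ["pg-shard-0", "pg-shard-1", "pg-shard-2", "pg-shard-3"]


def shard_for(vehicle_id: str, shards=SHARDS) -> str:
    """Pick a shard by hashing the vehicle ID (stable across processes)."""
    digest = hashlib.sha256(vehicle_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


def group_by_shard(vehicle_ids):
    """Group IDs by shard so a job can visit each database in turn."""
    groups = {}
    for vid in vehicle_ids:
        groups.setdefault(shard_for(vid), []).append(vid)
    return groups


if __name__ == "__main__":
    for shard, ids in sorted(group_by_shard(f"truck-{n}" for n in range(10)).items()):
        print(shard, ids)
```

Even this toy version leaves out rebalancing when a shard is added, replication, and failover, which is the part that grows into "all that stuff". Cassandra does the equivalent routing internally by hashing each row's partition key.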

 

I completely agree. Guys, for my last question, what’s your favorite thing about Cassandra?

Josh: My favorite part has to be the scaling aspect, to be honest. It’s so much easier working with a whole cluster of nodes that’s one big mesh. In our Postgres systems you have to work on them individually; if you want to run any jobs you have to connect to each one, one by one, run the job, wait for it to finish and then go to the next.

Cassandra just gets rid of a lot of that and lets us hit the cluster and use it that way.

 

Jesse:  The performance is amazing. One of the things that I love about it too is the community; there’s a giant field of experts out there that are willing to help people for free, whether it be on Twitter, IRC, Planet Cassandra or all the meetups happening or even the summit events.