Mohammed Guller, Application Architect and Lead Developer at Glassbeam
Glassbeam is a machine data analytics company. We provide cloud-based solutions for organizing and analyzing unstructured and multi-structured machine data. By machine data, I mean data coming out of connected devices, characterized by the three Vs: volume, velocity and variety. These include storage devices, wireless devices, networking devices and medical devices. These devices generate a ton of data, both unstructured and multi-structured.
We take all this data, organize it and create structure around it so the manufacturers of these products can unlock its value and glean valuable insight from it. Different groups within a manufacturer get different information that is useful to them. For example, the product support group can see exactly which issues a customer is running into and proactively address them before the customer even notices.
Product management can use the same solution to figure out how their product is being used. A product may have ten features, but customers may be using only two; the other eight are not really getting used. Our solution shows which features most customers actually use, so the manufacturer can decide where to invest its limited resources for maximum ROI. Similarly, we unlock value for other groups in a company. In a nutshell, that's what Glassbeam does.
Moving to NoSQL from RDBMS
We looked at a couple of other NoSQL databases. Cassandra is the foundation of the next-generation platform we are building. Our existing solution, which we call the current gen, is based on RDBMS technologies: SQLite, MySQL and Vertica.
We chose Cassandra for our next-generation stack for three reasons:
The first reason is write performance. Earlier I mentioned the three Vs that characterize the data our customers' devices generate. One of them is velocity: data arrives at a very high rate, so we need a platform that can keep up with it. Cassandra, with its high write performance, is a good fit.
The second reason is the nature of the data we are getting. It's not just unstructured, but also multi-structured. We don't get the same data every time; there is a lot of variance in it. We needed a platform that can store data in dynamic columns, and Cassandra is a very good fit for that.
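To make the "dynamic columns" idea concrete, here is a minimal Python sketch of the wide-row pattern Cassandra supports: each record's field names become column names inside a partition, so devices that report different fields need no schema change. It assumes a hypothetical layout keyed by device ID; in CQL this roughly corresponds to a table with `PRIMARY KEY (device_id, name)`, where `name` is a clustering column. The names `to_cells` and `group_by_partition` are illustrative only, not part of Glassbeam's code or any Cassandra driver, and no database is involved.

```python
from typing import Any, Iterable, Iterator

# Hypothetical cell layout: (partition key, column name, value).
Cell = tuple[str, str, Any]


def to_cells(device_id: str, record: dict[str, Any]) -> Iterator[Cell]:
    """Flatten one record with arbitrary fields into wide-row cells.

    Every field name becomes a column name within the device's partition,
    so records of different shapes can share one table without a schema change.
    Fields are emitted in sorted order, mirroring how a clustering column
    keeps cells sorted within a partition.
    """
    for name, value in sorted(record.items()):
        yield (device_id, name, value)


def group_by_partition(cells: Iterable[Cell]) -> dict[str, dict[str, Any]]:
    """Reassemble cells into one dict of columns per partition key.

    A later cell with the same (key, name) overwrites an earlier one,
    loosely similar to last-write-wins for a single Cassandra cell.
    """
    rows: dict[str, dict[str, Any]] = {}
    for key, name, value in cells:
        rows.setdefault(key, {})[name] = value
    return rows


if __name__ == "__main__":
    # Two hypothetical devices reporting completely different fields.
    cells = list(to_cells("switch-7", {"port": 3, "crc_errors": 12}))
    cells += to_cells("ap-1", {"ssid": "lab", "clients": 40})
    print(group_by_partition(cells))
```

The point of the design is that adding a new kind of field is just another cell in the partition, not an `ALTER TABLE`.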
The third reason is scalability. We are getting a huge volume of data, so we need something that can easily scale up and down with the amount of data coming in. Again, Cassandra was a perfect fit for that requirement.
Cassandra for analytics and primary datastore
Cassandra is our main data store. The unstructured data we receive comes in the form of files and streaming data. We transform that unstructured data into a structured data set, which is then stored in Cassandra. So Cassandra stores the raw, but structured, data. We are also using Cassandra as an analytics platform: all analytics and business intelligence will be based on the data in Cassandra.
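The transformation step described above, turning raw device output into structured records before they are written to Cassandra, can be sketched in Python as follows. The log line format, the `Event` fields and the `parse_log` function are hypothetical assumptions for illustration; real device logs vary by vendor, and this is not Glassbeam's actual parser. Lines that don't match the expected format are collected separately rather than silently dropped.

```python
import re
from dataclasses import dataclass
from datetime import datetime

# Hypothetical log format, e.g. "2013-05-01T10:15:00 ERROR disk0: message".
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\s+"
    r"(?P<level>[A-Z]+)\s+(?P<component>\S+):\s+(?P<message>.*)$"
)


@dataclass(frozen=True)
class Event:
    """One structured record, ready to be written as a row."""
    device_id: str
    ts: datetime
    level: str
    component: str
    message: str


def parse_log(device_id: str, text: str) -> tuple[list[Event], list[str]]:
    """Split raw text into structured events plus lines that did not match."""
    events: list[Event] = []
    rejects: list[str] = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        m = LINE_RE.match(line)
        if m is None:
            rejects.append(line)  # keep unparsed input for later inspection
            continue
        events.append(Event(
            device_id=device_id,
            ts=datetime.fromisoformat(m["ts"]),
            level=m["level"],
            component=m["component"],
            message=m["message"],
        ))
    return events, rejects


if __name__ == "__main__":
    raw = (
        "2013-05-01T10:15:00 ERROR disk0: SMART threshold exceeded\n"
        "garbage line\n"
        "2013-05-01T10:15:05 INFO ctrl: failover complete"
    )
    events, rejects = parse_log("array-42", raw)
    for e in events:
        print(e)
    print("unparsed:", rejects)
```

Each `Event` could then be written as one row keyed by `device_id` and `ts`; that key choice is likewise an assumption, not something the interview specifies.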
We are building our own analytics and reporting applications. Part of our road map is the capability to let customers use standard BI tools. However, these BI tools will not connect directly to our Cassandra data store; we are providing an API layer between Cassandra and those tools.
We are a cloud-based solution. We will initially deploy on the Amazon cloud (AWS), but our goal is to let customers deploy our solution on other clouds as well, so we are building an infrastructure that can easily be migrated from one cloud to another. Currently, we're using standard m1.large and m1.xlarge instances on Amazon. We are doing a lot of performance testing right now and the product is still in development, so the cluster size is not finalized yet.
Bring in the docs
The community is great. I love it. The best part is the documentation; there is so much of it already out there. I haven't felt the need to ask the community a lot of questions. Most of the time, I find that somebody else has already asked the question I have and it has been answered. I'm pretty happy with the support the community provides to people working with Cassandra.
A look back on Cassandra
We're pretty happy with our decision to go with Cassandra. We chose it a year ago and we don't have any regrets; it's been an amazing experience. We use some other NoSQL products too, and compared with some of them, I think Cassandra's design is better. It is elegant. In addition, the pace at which Cassandra has evolved is pretty impressive.
When we started a year ago, a lot of things were missing. Within a year, so much functionality has been added that it makes life really easy. The documentation is good, and one can easily download it from DataStax's website. In addition, there are a ton of YouTube videos, and the mailing list is pretty active, too. All these things make Cassandra easy to use.