March 20th, 2013

Brian O’Neill | Lead Architect



Christian Hasker: Editor of Planet Cassandra, A DataStax Community Service

Brian O’Neill: Lead Architect at Health Market Science


Health Market Science

Health Market Science supplies provider data, software, and integration services that give healthcare companies better results through high-quality market intelligence, industry-leading provider verification, and unparalleled data management. Our solutions and services range from pure data delivery to highly sophisticated end-to-end solutions that address business challenges in data management, revenue leakage, and regulatory compliance. Health Market Science stores over 1.5 petabytes of information in Apache Cassandra!


Christian: Welcome to this edition of the Cassandra community five-minute interview. I am here with Brian O’Neill from Health Market Science, and we’re here at the New York Cassandra Tech Day where Brian is presenting.


Brian: Hi, I’m going to go into our specific use case, which is how we are using Cassandra to solve Master Data Management, and go through our existing stack, which comprises Cassandra, Elasticsearch, Kafka and Storm. I will be paired with Taylor Goetz, who wrote Storm-Cassandra, the bridge between Storm and Cassandra.


Christian: You’ve been using Cassandra for a while now and before that you had a relational database background. So, why Cassandra?


Brian: In our case we are trying to solve the master data management problem. We take in over 2,000 feeds, some structured and some unstructured. To build that into a relational database, you have to build a meta layer to accommodate changes in schema for all the different feeds you are taking in. A column-oriented database was a much more natural fit for the variety of data we needed to accommodate, because if a feed adds a column to its schema, we can add a column without any downtime or schema changes behind the scenes. On top of that, we knew we would need to do a lot of analytics, so data processing was huge. We evaluated the space of available NoSQL database solutions, and it came down to HBase and Cassandra. Cassandra seemed to have a better code base and community. That’s why we went with Cassandra.
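To make the schema point concrete, here is a minimal sketch in Python. It is not Health Market Science's code and it does not call Cassandra at all; the `ColumnFamily` class, the row keys, and the field names are all hypothetical. It only models the idea Brian describes: each row holds its own set of named columns, so a feed that starts sending a new field needs no table alteration first.

```python
from collections import defaultdict


class ColumnFamily:
    """Toy in-memory stand-in for a column family (hypothetical, not a Cassandra API).

    Maps row key -> {column name: value}. Rows need not share the same columns.
    """

    def __init__(self):
        self.rows = defaultdict(dict)

    def insert(self, row_key, columns):
        # Any column name is accepted; there is no fixed schema to alter first.
        self.rows[row_key].update(columns)

    def get(self, row_key):
        # Return a copy so callers cannot mutate the stored row.
        return dict(self.rows[row_key])


providers = ColumnFamily()

# An initial feed delivers two fields for a provider.
providers.insert("provider:1001", {"name": "Dr. A. Example", "state": "PA"})

# A later feed adds a field nobody planned for; it is simply one more column.
providers.insert("provider:1001", {"license_status": "active"})

# A different row can carry a different set of columns entirely.
providers.insert("provider:1002", {"name": "Example Clinic", "fax": "555-0100"})
```

In a relational design, the second insert would typically need an `ALTER TABLE` or a generic meta layer of attribute tables; in this model it is just another write.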


Christian: You’re very prominent in the Cassandra community, giving a lot back. You’ve got a book coming out soon, and you present on webinars and events like today’s. What do you like most about being part of the Cassandra community?


Brian: I think being part of the community has been beneficial for our company as well. For example, the trigger capability in Cassandra is coming up, but we needed it earlier, so we developed it ourselves and put it out to the community, and people worked with us on it. We got a lot of advice from Jonathan Ellis on how best to design that capability. So on the community side, the decision to go with Cassandra has been a bi-directional benefit. We like participating, but we also get a lot back.


Christian: Excellent. So talk a little bit about Health Market Science. What does the application do for your customers? How do they benefit from all the great stuff you are building?


Brian: This is the benefit of Cassandra. When we were doing this kind of processing in relational databases, we’d get these feeds in, along with changes in the feeds, and our relational system couldn’t keep up. Customers would have to wait, sometimes weeks, for a change in the data to be reflected in our master file. We take those 2,000 feeds and generate a master file that has a picture of every healthcare practitioner, every provider, every organization, and the affiliations between the practitioners and the organizations. The freshness of that data is critical because our clients are making business decisions off of it.


Christian: How long do they wait now?


Brian: We took what used to be days’ worth of processing down to minutes. It’s awesome. When a doctor prescribes you a drug and you go to a pharmacy to have the prescription filled, there are compliance ramifications: for example, the pharmacy has to check that the doctor was eligible to write that prescription. The cost of making an incorrect decision is severe, so getting the time window and latency down is really valuable to our customers.


Christian: So is that why I wait 20 minutes for my prescription?


Brian: That I can’t help you with.


Christian: A lot of our community is coming from a relational background and looking at Cassandra for the first time. Any tips and tricks for them?


Brian: Sure. The simplest thing that helped the learning curve in our organization was not to view it so much as a database and not to bring your relational concepts with you. Look at it more as a set of fundamental storage primitives, which are very simple; that’s where the simplicity of Cassandra is really appealing. You can drill down from a keyspace to a table; a table has rows and columns, and a row can have an arbitrary number of columns. The only additional complexity is that I get a lot of power from the way I design and use those columns (Jonathan is calling them cell names now) to support the types of queries I have coming in. I think you run into trouble when you try to bring in the modeling techniques you used in relational and apply them here. Instead, just start with your fundamental primitives and design for your application.
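One way to picture "designing cell names for your queries" is the sketch below. It is an illustration under stated assumptions, not Cassandra's implementation or the schema used at Health Market Science: the `WideRow` class, the (date, feed id) naming scheme, and the sample values are all hypothetical. The idea it models is that cells within a row are kept sorted by name, so choosing a composite name that starts with the date turns "everything that changed in March" into one contiguous slice.

```python
import bisect


class WideRow:
    """Toy wide row (hypothetical): cells are kept sorted by their names."""

    def __init__(self):
        self._names = []   # sorted list of cell names (tuples)
        self._values = {}  # cell name -> value

    def put(self, name, value):
        # Insert new names in sorted position; overwrite existing ones in place.
        if name not in self._values:
            bisect.insort(self._names, name)
        self._values[name] = value

    def slice(self, start, end):
        """Return cells with start <= name < end, in name order."""
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_left(self._names, end)
        return [(n, self._values[n]) for n in self._names[lo:hi]]


# One row per provider; each cell name is (change date, feed id).
changes = WideRow()
changes.put(("2013-03-20", "feed-0042"), "address changed")
changes.put(("2013-02-11", "feed-0007"), "license renewed")
changes.put(("2013-03-05", "feed-0311"), "new affiliation")
changes.put(("2013-04-02", "feed-0042"), "phone changed")

# Because names sort by date first, a month is a single contiguous range.
march = changes.slice(("2013-03",), ("2013-04",))
for name, value in march:
    print(name, value)
```

Had the cell names started with the feed id instead, the same slice would answer "what did this feed change?" cheaply, and the by-month question would need a scan. That trade-off is the sense in which the model is designed from the queries rather than from normalized entities.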


Christian: Just think differently. The old Apple adage.


Brian: Absolutely. Jobs had it right.


Christian: Thanks for joining us today, Brian; we really appreciate your contributions to the community.


Brian: Thanks.