April 9th, 2015

Brian O’Neill, Chief Technology Officer at Health Market Science
Brian is Chief Technology Officer at Health Market Science (HMS), where he heads development of their data management and analytics platform, powered by Storm and Cassandra. Brian won InfoWorld’s Technology Leadership award in 2013 and authored Storm: Blueprints for Realtime Computation. He has a number of patents and holds a B.S. in C.S. from Brown University.

Our new parent company, LexisNexis, has one of the world’s largest public records databases:

“…our comprehensive collection of more than 46 billion records from more than 10,000 diverse sources—including public, private, regulated, and derived data. You get comprehensive information on approximately 269 million individuals and 277 million unique businesses.”

And they’ve been managing, analyzing and searching this database for decades.  Over that time period, they’ve built up quite an assortment of “Big Data” technologies.  Collectively, LexisNexis refers to those technologies as their High-Performance Computing Cluster (HPCC) platform.

HPCC is entirely open source:

Naturally, we are working through the marriage of HPCC with our real-time data management and analytics stack.  The potential is really exciting.  Specifically, HPCC has sophisticated machine learning and statistics libraries, and a query engine (Roxie) capable of serving up those statistics.

Lo and behold, HPCC can use Cassandra as a backend storage mechanism! (FTW!)

The HPCC platform isn’t technically supported on a Mac, but here is what I did to get it running:

HPCC Install

Clone the GitHub repository and its submodules (git submodule update --init --recursive)
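For reference, the clone step might look like the sketch below. The repository URL is inferred from the pull request link in the next step, and fetch_hpcc is a hypothetical helper name of my own, not something HPCC ships.

```shell
# Sketch only: clone HPCC-Platform and pull in its submodules.
# The target directory defaults to HPCC-Platform; pass another name as $1.
# The body runs in a subshell, so the caller's working directory is untouched.
fetch_hpcc() (
  target=${1:-HPCC-Platform}
  git clone https://github.com/hpcc-systems/HPCC-Platform.git "$target" || exit 1
  cd "$target" || exit 1
  git submodule update --init --recursive
)
```

Running fetch_hpcc from an empty working directory should leave a checkout ready for the patch step.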

Pull my patches (https://github.com/hpcc-systems/HPCC-Platform/pull/7166)

Install the dependencies using brew:

Make a build directory, and run cmake from there:


Then, compile and install with (sudo make install)
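The build and install steps can be sketched as a single helper. This is a minimal out-of-source build, assuming a plain cmake invocation with no extra options; any flags used in the original build are not known here, and build_hpcc is a hypothetical name.

```shell
# Sketch only: out-of-source cmake build followed by install.
#   $1 = path to the HPCC-Platform checkout
#   $2 = build directory (defaults to ./build)
# The body runs in a subshell, so the caller's working directory is untouched.
build_hpcc() (
  # Resolve the source path first; fail early if it does not exist.
  src_abs=$(cd "$1" 2>/dev/null && pwd) || exit 1
  build_dir=${2:-build}
  mkdir -p "$build_dir" && cd "$build_dir" || exit 1
  # Generate the build files, then compile and install in one step.
  cmake "$src_abs" && sudo make install
)
```

For example, build_hpcc ./HPCC-Platform would configure into ./build and install from there.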

After that, you’ll need to muck with the permissions a bit:

Now, ordinarily you would run hpcc-init to get the system configured, but that script fails on OS X. Instead, I used Linux to generate config files that work and posted those to a repository here:


Clone this repository and replace /var/lib/HPCCSystems with the content of var_lib_hpccsystems.zip

Then, from the directory containing the xml files in this repository, you can run:

daserver (Runs the Dali server, which is the persistence mechanism for HPCC)

esp (Runs the ESP server, which is the web services and UI layer for HPCC)

eclccserver (Runs the ECL compile server, which takes the ECL and compiles it down to C++ and then a dynamic library)

roxie (Runs the Roxie server, which is capable of responding to queries)

Kick off each one of those, then go to http://localhost:8010 in a browser. You are ready to run some ECL!
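If you would rather not juggle four terminals, the startup can be sketched as a loop. This assumes the four binaries are on your PATH, that you are in the directory with the xml files, and that each process stays in the foreground unless backgrounded. The function names, log file names and the five-second pause are my own choices, not anything HPCC prescribes.

```shell
# Sketch only: start the four daemons in the order listed above,
# each in the background with its output captured in <name>.log.
start_hpcc() {
  for daemon in daserver esp eclccserver roxie; do
    "$daemon" > "$daemon.log" 2>&1 &
    echo "started $daemon (pid $!)"
    sleep 5   # arbitrary pause so each process can come up before the next
  done
}

# Succeeds if anything answers HTTP at the given URL (default: the ESP port).
esp_is_up() {
  curl --silent --output /dev/null "${1:-http://localhost:8010}"
}
```

After start_hpcc, something like `esp_is_up && echo "ESP is answering"` gives a quick check before you open the browser.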


Running ECL

Like Pig with Hadoop, HPCC runs a DSL called ECL.  More information on ECL can be found here:


As a simple smoke test, go into your HPCC-Platform repository and change into ./testing/regress/ecl.

Then, run the following:

You should see the following:

        <dataset name="Result 1">
        <row><result_1>Hello world</result_1></row>

Cassandra Plugin

With HPCC up and running, we are ready to have some fun with Cassandra. HPCC has plugins, which reside in /opt/HPCC/plugins. I had to copy those libraries into /opt/HPCCSystems/lib to get HPCC to recognize them.

Go back to the testing/regress/ecl directory and have a look at cassandra-simple.ecl. A snippet is shown below:



In this example, we define childrec as a RECORD with a set of fields. We then create a DATASET of type childrec. Then we define a method that takes a dataset of type childrec and runs the Cassandra insert command for each of the records in the dataset.

Start up Cassandra locally: download Cassandra, unzip it, then run bin/cassandra -f (the -f flag keeps it in the foreground).

Once Cassandra is up, simply run the ECL the same way you ran the hello program.

You can then go over to cqlsh and validate that all the data made it back into Cassandra:
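A minimal sketch of that check from the shell, assuming cqlsh is on your PATH and Cassandra is listening on its default local address. The helper names are hypothetical, and the keyspace and table are passed as arguments because they depend on what the ECL script created.

```shell
# Sketch only: inspect Cassandra from the command line.

# List the keyspaces so you can spot the one the ECL script created.
list_keyspaces() {
  cqlsh -e "DESCRIBE KEYSPACES"
}

# Dump every row of <keyspace>.<table>; both arguments are required.
dump_table() {
  if [ $# -ne 2 ]; then
    echo "usage: dump_table <keyspace> <table>" >&2
    return 2
  fi
  cqlsh -e "SELECT * FROM $1.$2;"
}
```

Run list_keyspaces first, then call dump_table with the keyspace and table names you find.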
OK, that should give a little taste of ECL and HPCC. It is a powerful platform.
As always, let me know if you run into any trouble.