November 4th, 2014

Earlier this year, WibiData announced our partnership with DataStax to integrate our modular, open-source, Apache 2.0-licensed framework Kiji with Cassandra. At Cassandra Summit, I talked about how Kiji lets Cassandra developers build real-time big data applications like product and music recommenders or personalized search engines.

To describe one use case, if Jane is shopping online for an iPad, the site she shops on could use Cassandra and Kiji to record when she adds an iPad to her shopping cart and immediately recommend earphones or a case. The application could also take into account that Jane usually sorts her searches by price, from low to high, and recommend the cheapest items.

To build a system this scalable, responsive, and flexible, we need a way to store data about how a user interacts with an application. Cassandra is one option that lets us store high-volume, high-velocity time-series data. On top of that storage, we need three capabilities:

  1. A way to get data in and out of Cassandra
  2. A way to inspect data and train models on it
  3. A way to score and apply models in real time

The Kiji framework combines all of these capabilities in one package so that developers building applications on Cassandra don’t have to start from scratch. KijiSchema adds a data model on top of Cassandra: one row per user, with columns in each row holding time-series data about that user's interactions. Our Cassandra Bento Box, downloadable here, includes all the elements needed to get data in and out of Cassandra via KijiSchema, inspect the data and train models on it, and score and apply those models in real time.

  1. To get batch data in and out of Cassandra, KijiMapReduce bulk-loads data from logs and databases into the storage system. For real-time data, KijiREST provides a RESTful API to write data into Cassandra and read it back in real time.
  2. To inspect data and train models, KijiHive lets data scientists use the familiar language HiveQL to query the customer data in KijiSchema. KijiMapReduce provides a library of machine-learning and predictive models for Hadoop.
  3. To score and apply models in real time, KijiScoring applies machine-learning models over customer information to produce results such as recommended products.
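
To make the flow above concrete, here is a minimal sketch in plain Python of the shopping example: one row per user, timestamped cells in each column, and a scoring step that reads the row and returns recommendations. This is not Kiji or Cassandra code. The `UserTable` class, the `recommend` function, the column names, and the accessory catalog are all hypothetical stand-ins invented for illustration, and the "model" is just a lookup plus a sort where a real application would use a trained one.

```python
import time
from collections import defaultdict

# Hypothetical catalog: item -> list of (accessory, price). Illustration only.
ACCESSORIES = {
    "ipad": [("earphones", 29.99), ("case", 19.99), ("keyboard", 99.00)],
}


class UserTable:
    """In-memory stand-in for a table with one row per user.

    Each column in a row holds a time series of (timestamp, value) cells.
    """

    def __init__(self):
        # user id -> column name -> list of (timestamp, value)
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, user_id, column, value, timestamp=None):
        """Append one timestamped cell to a column in the user's row."""
        ts = time.time() if timestamp is None else timestamp
        self._rows[user_id][column].append((ts, value))

    def get(self, user_id, column):
        """Return the column's cells, newest first."""
        cells = self._rows[user_id][column]
        return sorted(cells, key=lambda cell: cell[0], reverse=True)


def recommend(table, user_id, limit=2):
    """Toy scoring step: suggest accessories for the latest cart item.

    If most of the user's recorded searches were sorted by ascending price,
    the cheapest accessories are listed first.
    """
    cart = table.get(user_id, "cart_adds")
    if not cart:
        return []
    latest_item = cart[0][1]
    candidates = list(ACCESSORIES.get(latest_item, []))

    sorts = [value for _, value in table.get(user_id, "search_sorts")]
    if sorts and sorts.count("price_asc") * 2 > len(sorts):
        candidates.sort(key=lambda candidate: candidate[1])
    return [name for name, _ in candidates[:limit]]


# Usage: Jane usually sorts by price, then adds an iPad to her cart.
table = UserTable()
table.put("jane", "search_sorts", "price_asc", timestamp=1)
table.put("jane", "search_sorts", "price_asc", timestamp=2)
table.put("jane", "cart_adds", "ipad", timestamp=3)
print(recommend(table, "jane"))  # prints ['case', 'earphones']
```

The point of the sketch is the shape of the data: because every interaction is stored as a timestamped cell in the user's own row, the scoring step needs only a single row read to react to Jane's latest action.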

We originally built Kiji on HBase, but as our experience at Cassandra Summit made clear, Cassandra is taking off for a number of industries and use cases. If you’d like to try out the software for yourself, you can access the open-source code for Cassandra Bento Box on GitHub, along with a tutorial for building a phone book on top of Cassandra with Kiji. Happy coding!

If you enjoyed this summary of “Building Personalization Apps on Cassandra” from Cassandra Summit 2014, be sure to check out the agenda and register for Cassandra Summit Europe 2014, hosted in London, UK. You can also find all presentations from Cassandra Summit 2014 on the Planet Cassandra YouTube Playlist.
