May 5th, 2014

“Running Real-Time Queries with Spark and Shark on Top of Cassandra Data” was presented by Evan Chan, Software Engineer at Ooyala, at Cassandra Day Silicon Valley, and as part of Hakka Labs’ Cassandra Week.


Evan Chan (Software Engineer, Ooyala) describes his experience using the Spark and Shark frameworks to run real-time queries on top of Cassandra data. He starts by surveying the Cassandra analytics landscape, including Hadoop and Hive, and touches on the use of custom input formats to extract data from Cassandra. He then dives into Spark and Shark (two memory-based cluster computing frameworks) and explains how they enable often dramatic improvements in query speed and productivity.