August 14th, 2014

Abstract: Interactive OLAP Queries using Cassandra and Spark 

How do you rapidly derive complex insights from really big data sets in Cassandra? This session draws upon Evan’s experience building a distributed, interactive, columnar query engine on top of Cassandra and Spark. We will start by surveying the existing query landscape of Cassandra and discuss ways to integrate Cassandra and Spark. We will then dive into the design of a fast, column-oriented query architecture for Spark and explain why columnar stores are so advantageous for OLAP workloads. Evan will present a schema for Parquet-like storage of analytical datasets on Cassandra. Find out why Cassandra and Spark are the perfect match for enabling fast, scalable, complex querying and storage of big analytical data.


About Evan Chan, Principal Systems Engineer at Socrata

Evan Chan is a Principal Systems Engineer at Socrata. In his own words: I love to design, build, and improve bleeding-edge distributed data and backend systems using the latest in open source technologies. I am a big believer in GitHub, open source, and meetups. I have given talks at conferences such as Cassandra Summit 2013 and will be presenting at Cassandra Summit 2014.

Be sure to check out all of the sessions from Cassandra Day Seattle on the Cassandra Day Seattle 2014 YouTube playlist.