For the first time, this year’s Cassandra Summit presentations span two days and there are too many great talks for me to narrow it down to my traditional top ten list. Instead, I’ll highlight the most exciting talks for the beginner, intermediate and advanced audiences.
Talks that newcomers to Cassandra will learn the most from include:
Presentations from Daniel Chia (Coursera) and Peter Connolly (Macy’s) on their respective experiences migrating from legacy relational databases to Cassandra:
Daniel Chia, Software Engineer, Coursera, Inc.: Coursera’s Adoption of Cassandra
Like many startups, Coursera began its data storage journey with MySQL, a familiar and industry-proven database. As Coursera’s user base grew from several thousand to many millions, we found that MySQL provided limited availability and restricted our ability to scale easily. New product initiatives and requirements provided a perfect opportunity to revisit our choice of core workhorse database.
After evaluating several NoSQL databases, including MongoDB, DynamoDB and HBase, we elected to transition to Cassandra. Cassandra’s relative maturity, masterless architecture (for availability), tunable consistency, and stable low-latency performance made it a clear winner for our needs.
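For readers new to the term, “tunable consistency” is often summarized by one rule of thumb: a read is guaranteed to see the latest acknowledged write when the number of replicas read (R) plus the number of replicas written (W) exceeds the replication factor (RF). The sketch below only illustrates that arithmetic. The function names are hypothetical and are not part of Cassandra or any driver API.

```python
def quorum(replication_factor: int) -> int:
    """Return the replica count for a quorum: a strict majority of replicas."""
    return replication_factor // 2 + 1


def overlaps(read_replicas: int, write_replicas: int, replication_factor: int) -> bool:
    """True when every read set must intersect every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


rf = 3
q = quorum(rf)

# Quorum reads plus quorum writes always overlap on at least one replica.
quorum_is_consistent = overlaps(q, q, rf)

# Reading and writing a single replica each trades that guarantee for latency.
one_one_is_consistent = overlaps(1, 1, rf)
```

With a replication factor of 3, a quorum is 2 replicas, so quorum reads and writes overlap while single-replica reads and writes do not.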
Peter Connolly, Senior Architect, Macy’s, Inc.: Changing Engines in Mid-Flight
This presentation recounts the story of Macys.com and Bloomingdales.com’s migration from legacy RDBMS to NoSQL Cassandra in partnership with DataStax.
One thing that differentiates this talk from others on Cassandra is Macy’s philosophy of “doing more with less.” You will see why we emphasize the performance tuning aspects of iterative development when you see how much processing we can support on relatively small configurations.
This session will cover:
- The process that led to our decision to use Cassandra
- The approach we used for migrating from DB2 & Coherence to Cassandra without disrupting the production environment
- The various schema options that we tried and how we settled on the current one. We’ll show you a selection of some of our extensive performance tuning benchmarks, as well as how these performance results figured into our final schema designs.
- Our lessons learned and next steps
Aaron Ploetz’s talk on distributed data modeling:
Aaron Ploetz, Lead Database Engineer, AccuLynx: Escaping Disco-Era Data Modeling
Building high-performing Cassandra data models requires a query-based approach. However, most of us were taught to build relational, normalized data models, which do not work well with Cassandra. Poorly performing data models are often built with the idea of storing data efficiently, and then showered with secondary indexes to serve the required queries. Isn’t it time that we learn how to build 21st-century data models, without using 1970s techniques?
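As a rough illustration of what a query-based approach means, the sketch below uses plain Python dictionaries in place of Cassandra tables. Each “table” is keyed by the column one specific query filters on, much as a partition key would be, and the data is deliberately duplicated across tables. The sample rows and the `build_query_table` helper are hypothetical and are not taken from the talk.

```python
from collections import defaultdict

# Hypothetical source rows: (user_id, course_id, enrolled_on)
enrollments = [
    ("u1", "c1", "2015-01-10"),
    ("u2", "c1", "2015-02-03"),
    ("u1", "c2", "2015-03-21"),
]


def build_query_table(rows, key_index):
    """Group rows by the column a query filters on, like a partition key."""
    table = defaultdict(list)
    for row in rows:
        table[row[key_index]].append(row)
    return table


# One table per query, each holding its own copy of the data.
enrollments_by_user = build_query_table(enrollments, 0)
enrollments_by_course = build_query_table(enrollments, 1)

# Each query becomes a single key lookup instead of a scan or index probe.
courses_for_u1 = [row[1] for row in enrollments_by_user["u1"]]
users_in_c1 = [row[0] for row in enrollments_by_course["c1"]]
```

The trade-off is the reverse of normalization: storage is spent on duplicate copies so that every read touches exactly one key.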
Rene Antunez on learning Cassandra administration as an Oracle DBA:
Rene Antunez, Cassandra DBA Team Lead, The Pythian Group: My First 100 days with a Cassandra Cluster
Apache Cassandra is a massively scalable open source NoSQL database, and the amount of data that we create and copy annually is doubling in size every two years and is expected to reach 44 zettabytes, or 44 trillion gigabytes. We can therefore assume that sooner or later a DBA will be handling a Cassandra database in their shop. This beginner/intermediate-level session will take you through my journey as an Oracle DBA during my first 100 days of administering a Cassandra cluster, with several demos and all the roadblocks and successes I had along this path.
Nate McCall’s talk on Cassandra security:
Nate McCall, Co-Founder, The Last Pickle: Hardening Apache Cassandra for Compliance (or Paranoia)
Every Apache Cassandra installation needs to be secured, either for compliance or security reasons. Out of the box, Cassandra is an open system, free from authentication, authorization and encryption. With little additional effort, however, it can be secured to meet most regulatory and security requirements.
In this talk, Nate McCall, Co-Founder at The Last Pickle, will explain how to implement inter-node and client-server SSL encryption, client authentication and authorization, internode authentication, and JMX security. While few people possess a deep understanding of security, everyone should know how to implement the basics for Apache Cassandra.
Ben Slater on when and how to upgrade your application from relational to Cassandra:
Ben Slater, Chief Product Officer, Instaclustr: When and how to migrate from a relational database to Cassandra
Many applications are initially developed using relational database technology before upsizing to Cassandra as they mature.
This presentation will examine the indicators that it is time to start considering a migration away from a relational database, and the factors to consider when making the decision to move. We will then discuss different approaches for implementing the migration and how to plan, estimate and manage the work, along with key risks and gotchas you may encounter.
Jason Kusar’s talk on dealing with latency in a globally distributed cluster:
Jason Kusar, Senior Software Engineer, Vistronix: Global Deployments with Bad Comms
We have been running a truly global deployment for 3+ years now. With datacenters in D.C., England, Europe, and Japan, we would be hard pressed to span more distance without visiting Antarctica. Bandwidth in and out of our datacenters varies greatly, as does latency. Comms are not always 100% reliable either. This was one of the main things that led us to choose Cassandra. This talk will cover many of the lessons learned and best practices for both preventing issues and recovering from them in the real world.
Finally, Marcos Ortiz’s talk on Cassandra in a .NET environment is timely given the new level of support for Windows in Cassandra 2.2:
Marcos Ortiz, Open Source Advocate, UCI: Inserting Apache Cassandra in a .Net environment
This talk highlights how my team migrated from SQL Server to Cassandra in a .NET environment, and how we dealt with code refactoring, Cassandra optimizations on Windows servers, and more.