Julien Anguenot (@anguenot) is an open-source advocate and veteran Python and Java developer.
He currently serves as director of software engineering at iland internet solutions, where he leads both the development of a Java EE distributed platform running on top of Cassandra, connecting VMware and OpenStack services across multiple data centers, and the customer-facing iland cloud ECS portal app.
With data centers in the U.S., U.K. and Singapore, iland (@ilandcloud) delivers proven enterprise cloud solutions that help companies do business faster, smarter and more flexibly. Unlike any other provider, iland’s technology and consultative approach mean anyone–regardless of expertise, location or business objective–can experience the benefits of a hassle-free cloud. From scaling production workloads, to supporting testing and development, to disaster recovery, iland’s secure cloud and decades of experience translate into unmatched service. Underscoring the strength of its platform, the company has been recognized as VMware’s Service Provider Partner of the Year, Global and Americas. Visit www.iland.com.
Below is a compilation of my notes and some thoughts about the Cassandra Summit 2014 conference day I attended last week in San Francisco.
Billy Bosworth’s (@BillyBosworth) Opening Keynote
The conference day started with the opening keynote by Billy Bosworth, CEO of DataStax. I was especially looking forward to it since DataStax had announced a $106M Series E venture capital round the week right before the conference. Billy did not say anything in particular about this new round of funding. Instead, he spoke about the evolution of internet usage (Internet of Things, multiple devices, mobility, etc.), particularly in the enterprise, and, as a consequence, the insane pace at which data is now being generated and is growing, and the increasing need to store, process and access that data everywhere, at any time and with low latency.
Unsurprisingly, according to Billy, Cassandra is the perfect match for these new usages, which are exactly the needs of enterprise big data applications, and Cassandra will disrupt the traditional (relational) database market led by Oracle. DataStax is after a chunk of Oracle’s enterprise market, always has been, and that is what is exciting about DataStax and Cassandra if you ask me.
DataStax customers were also invited to speak about their respective use cases and success stories: Jeff Ludwig, VP of Network Platform Data and Engineering at Sony Network Entertainment, and Yi Li, CEO of Orbeus, who presented, with an actual live demo, their cloud-based visual computing solution for recognizing faces, scenes and objects.
The keynote left attendees with the clear feeling that we are now in the era of enterprise big data applications, and that DataStax and Cassandra not only have a solid footing in it but are leading it.
No new service, initiative or partnership was announced, though. I guess I was just curious to hear what DataStax would be up to with their new round of venture capital, and I was especially wondering whether some kind of Cassandra-as-a-Service offering was about to be launched.
Jonathan Ellis’ (@Spyced) Technical Keynote
Following Billy’s opening keynote, Jonathan Ellis, CTO of DataStax and Apache Cassandra lead, took the stage for a technical keynote.
As anticipated, the Cassandra 2.1.0 release was officially announced, and Jonathan walked through the new features and improvements it brings. He highlighted major performance improvements: although CQL3 benchmark numbers are better than with Cassandra 2.0, the most interesting thing IMHO is that performance is now more consistent over time thanks to incremental repairs and anticompaction. Jonathan also described the process to migrate to the new SSTables and enable incremental repairs, which is something I am eager to try out on our development cluster here.
New features in Cassandra 2.1 include user-defined types, indexes on collections, and a better, “non-buggy” implementation of counters.
Performance improvements in Cassandra 2.1 include faster reads and writes, an improved row cache, off-heap memtables, compaction improvements, incremental repairs and better bootstrapping of new nodes (which I am eager to try out as well).
Jonathan mentioned that Windows support is in beta with 2.1 and that they are targeting full support with 3.0. I guess this is quite important at this point, considering DataStax’s enterprise focus highlighted during Billy’s keynote.
Jonathan also mentioned the anticipated stability of the 2.1 release and the fact that DataStax now has a dedicated team of engineers doing QA against base Apache Cassandra.
The keynote was great, but a tentative roadmap for Cassandra 3.0 and/or new DataStax products would have been interesting.
Breakout Sessions I Attended
The breakout sessions were organized in 6 parallel tracks, with more than 60 45-minute sessions during the day.
I deliberately avoided the overcrowded use-case sessions presented by the big names (Netflix, Apple, Sony, eBay, etc.) to focus on smaller, more technical sessions around operations and development. We will be able to read about the big-name use cases online later anyway.
Diagnosing Problems in Production by Jon Haddad and Blake Eggleston
Jon and Blake did a great job presenting tools and processes for diagnosing and solving production and performance-related problems, with a focus on the JVM garbage collector. They mentioned quite a few OS-level tools as well as tricks to monitor and debug a Cassandra cluster. What was especially great about this talk is that although the speakers mentioned DataStax OpsCenter, they did not focus on it. I will not go into detail about the tools they mentioned here, as their slides will be available soon.
TitanDB – Scaling Relationship Data and Analysis with Cassandra by Matthias Broecheler
This talk was about storing relationship data in Cassandra using TitanDB.
TitanDB’s property graph data model on top of Cassandra is well designed: it allows the definition of formal, clean relations and lets one take advantage of a more sophisticated, relation-centric query mechanism. It also adds an abstraction at the data model level that comes with significant complexity of its own. I am unsure that moving that complexity out of the application level will have such clear benefits for developers, as advertised, because it will make their job much harder, forcing them to work with the TitanDB paradigm instead of Cassandra’s simpler data model, which is currently rather easy to program against.
Reading Cassandra SSTables Directly for Offline Data Analysis by Ben Vanberg
Great session from Ben Vanberg on how FullContact runs MapReduce analytics jobs against large SSTables for downstream analytics, using their own splittable input format. They use a couple of Netflix open source tools in addition to their Hadoop SSTable implementation: https://github.com/fullcontact/hadoop-sstable
Streaming From Backups – Reducing Cluster Load When Adding Nodes by Ben Bromhead
I was really looking forward to this talk, having faced issues bootstrapping new nodes into a large cluster in the past. Unfortunately, the talk was only about benchmarks, explanations and brainstorming around their attempts and ideas for making streaming from backups work. I knew I should have gone to the Gossip session running at the same time…
Apache Spark – The SDK for All Big Data Platforms by Pat McDonough
An interesting introductory talk about Apache Spark: just what it took to “sparkle” my interest in Spark and its Cassandra and Hadoop integration. I will definitely be looking into this for iland.
Monitor Everything! by Chris Lohfink
One of my favorite breakout sessions of the day: mostly a walk-through of key Cassandra performance metrics and tools, including JMX and JVM GC. Definitely check out the slides for details when they are available.
Interactive OLAP Queries using Apache Cassandra and Spark by Evan Chan
A great complement to the Spark introductory session I attended earlier. I loved the idea of Spark as an in-memory cache on top of Cassandra.
Cassandra in Large Scale Enterprise Grade xPatterns Deployments by Claudiu Barbura
I left this session after 15 minutes, mostly because I needed a coffee break 🙂
Down with Tweaking! Removing Tunable Complexity for Cassandra Performance and Administrator by Don Marti, Glauber Costa and Dor Laor
The title of this session was a bit misleading: it was about OSv, an operating system optimized for the cloud. A great session, but not really specific to Cassandra, except for the Cassandra OSv image they used to demonstrate the lightning-fast boot time of OSv images. I found the presentation fun and very interesting, covering the challenges involved in designing such an OS. More about OSv here: http://osv.io/
Lightning Talks and Summit Closing Reception
The conference day ended with a dozen 5-minute lightning talks while attendees enjoyed food and beverages during the closing reception. Definitely a good way to wrap up the day. I have to admit I did not really follow any of the lightning talks, being busy talking with people and enjoying refreshments…
Final Thoughts About the Conference
This was my first Cassandra Summit, and I have to say I was rather impressed by DataStax’s organization: everything was perfectly planned and well organized. And remember that it was a free conference (I purchased a $99 VIP ticket for priority seating…). The welcome and closing ceremonies were awesome too.
There were around 2,200 attendees this year. The Westin St. Francis is perfectly located but is a rather old hotel, not necessarily well suited for a conference of this size, with sessions spread across multiple floors and narrow corridors.
It was definitely great to meet face to face with all the people at DataStax with whom I had been in touch by email over the past year.
I had the pleasure of talking briefly with Jonathan Ellis, who is not only a great Cassandra project leader but a great French speaker too!
As a final note, I especially loved the feel of this conference: it was a true tech conference for tech people, without excessive commercial and marketing displays (or maybe that was because I had attended VMworld the week before?).
Big up and thanks to DataStax for a great conference!