
Ruby Driver 1.0 GA release

November 19, 2014

I’m very happy to announce that the DataStax Ruby Driver 1.0 GA for Apache Cassandra and DataStax Enterprise has just been released. It has been an exciting journey, and this is only the beginning; please refer to the complete changelog for details.

Installation

You can install the driver now using RubyGems:
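    gem install cassandra-driver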

Or Bundler:
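    # Gemfile
    gem 'cassandra-driver'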

Quick start

Here is a quick look at using the driver:
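    # Minimal sketch: assumes a Cassandra node on 127.0.0.1 and reads from the
    # built-in system keyspace.
    require 'cassandra'

    cluster = Cassandra.cluster
    session = cluster.connect('system')

    session.execute('SELECT keyspace_name FROM schema_keyspaces').each do |row|
      puts row['keyspace_name']
    end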

Features

The DataStax Ruby Driver 1.0 for Apache Cassandra and DataStax Enterprise includes a number of new features; the complete changelog linked above describes them in detail.

Compatibility

This driver works exclusively with the Cassandra Query Language v3 (CQL3) and Cassandra’s native protocol, and supports the following software versions:

  • Apache Cassandra 1.2 and 2.0
  • DataStax Enterprise 3.1, 3.2, 4.0 and 4.5
  • Ruby (MRI) 1.9.3, 2.0 and 2.1
  • JRuby 1.7
  • Rubinius 2.2

Useful links

Finally, celebrate this release with a screencast detailing load balancing in the Ruby Driver.

Happy Coding!

Owen Kim, Lead Software Engineer at PagerDuty
"We needed to be fault-tolerant to catastrophic regional failures. Cassandra's tunable replication and consistency let us define and implement these policies and be fault-tolerant in precisely how we need to be."

PagerDuty is the central hub for on-call and operations dispatch. At its core, it ties together all of your monitoring services into one place, manages your on-call schedules, escalation policies, and notification methods, and ensures that if something is wrong in your service, the right person gets alerted so they can act quickly to resolve any issues. I personally work on the pipeline of alerts that starts with our monitoring integrations or HTTP and email APIs and ends with a person getting a call, SMS, email, or push notification.

Available alerts

Our business is to alert people when they’re having problems, so we need a high standard for uptime of our integrations and deliverability of alerts; there’s little value in an alert service that only works sometimes. So we needed to be fault-tolerant to catastrophic regional failures. Cassandra’s tunable replication and consistency let us define and implement these policies and be fault-tolerant in precisely how we need to be.
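Tunable consistency here simply means the consistency level is chosen per request, so a write can demand acknowledgement from a quorum of replicas in the local region rather than across every region. A hypothetical sketch with the Ruby driver described above (the keyspace, table, and values are placeholders, not PagerDuty’s schema):

    require 'cassandra'

    session = Cassandra.cluster.connect('alerts')   # placeholder keyspace

    # LOCAL_QUORUM only waits on replicas in the local data center, so a
    # catastrophic failure in a remote region does not block the write.
    session.execute(
      "INSERT INTO notifications (id, body) VALUES (now(), 'disk full on db-7')",
      consistency: :local_quorum
    )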

Building on stable ground

Cassandra is essentially the platform that we built our alert pipeline on. This pipeline is broken into multiple stages and services, but each is backed by Cassandra so that we can build on the reliability it provides.

Stability is absolutely the top benefit we receive from Cassandra. It’s hard to build a stable service if the bottom of the stack isn’t stable. Cassandra functions as a solid base for our applications.

We evaluated Cassandra 3-4 years ago and found it to be the most mature option and the most suitable for cross-DC deployments. We’re now running several Cassandra 1.2 clusters across 3 data center regions. Each cluster has 5-10 nodes with a 2-2-1 replication factor.
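For readers unfamiliar with the shorthand, a 2-2-1 replication factor reads as two replicas in each of two regions and one replica in the third. As a sketch only (the keyspace and data center names are placeholders, not PagerDuty’s), such a layout is declared with NetworkTopologyStrategy:

    require 'cassandra'

    session = Cassandra.cluster.connect

    # Two copies in each of two regions, one copy in the third.
    session.execute(<<-CQL)
      CREATE KEYSPACE alerts WITH replication = {
        'class': 'NetworkTopologyStrategy',
        'us_west': 2, 'us_east': 2, 'us_central': 1
      }
    CQL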

Tips and tricks

Background anti-entropy and load issues can creep up very suddenly and without warning if you’re not looking out for them. Stay ahead of dwindling capacity by scaling in advance, a task that’s relatively easy to do with Cassandra.

I went to the Cassandra Summit in San Francisco this year and was really impressed by many of the speakers there. A lot of them were very candid about their experiences and lessons learned, which was really useful to me.

Apache Cassandra at PagerDuty: Watching Your Cassandra Cluster Melt


Navraj (Raj) Chohan, Co-Founder at AppScale Systems, Inc.
"Every time someone chose Cassandra when using our data store API, we found it was the easiest option to get going and scale out; its fault tolerance, architecture, and the fact that it was modeled after Big Table gave us confidence to use it as the API's default implementation."

The AppScale Systems, Inc. platform allows developers to effortlessly deploy their App Engine applications on any cloud, public or private, within 5 minutes. The company was officially formed in 2012, but started in 2008 as a project at UCSB (University of California, Santa Barbara); since then, AppScale Systems, Inc. has amassed thousands of users. The primary data store used for their “Data Store API” is Apache Cassandra.

Data Store API

When AppScale originally launched, there were 12 interchangeable “plug and play” data stores to choose from when using the data store API, including Cassandra, MongoDB, HBase, Redis, MySQL, PostgreSQL and the like. The data store API lets users connect a data store to any application built on the open source App Engine implementation, whether it’s written in Java, Python, Go, PHP, etc., or deployed on EC2, GCE, DigitalOcean, or local hardware. AppScale’s mission is to create portability for Google App Engine applications.


After further investigation of how application developers were utilizing the data store API, they found that most users of the API just wanted it to work and didn’t care too much about lower-level details, such as which data store was being used on the backend; this prompted AppScale to eliminate complexity in their API by narrowing down their data store options and choosing one default store.

Since AppScale Systems, Inc. was primarily maintaining the data stores themselves, they found Cassandra to be the easiest option to get going and scale out; HBase, for example, did well up to a certain size but then slowed down. Additionally, Cassandra’s fault tolerance and its similarities to Google’s Bigtable held it up in a positive light. Ultimately, AppScale Systems, Inc. chose Apache Cassandra as the main data store implementation for their data store API.

Cassandra Community

One of the primary (and earliest) components of the AppScale platform is Apache Cassandra; they’ve been using Cassandra since 2009 and were there for the early bugs and improvements while it was just starting out as a project coming out of Facebook.

As Raj says, “When choosing and implementing any open source software as a main component of your project, one of the most important aspects is the vibrancy and helpfulness of its accompanying community.”

One of the aspects of Apache Cassandra that really drew AppScale in was the ability to ask questions and quickly receive help; going into the IRC room of many open source projects doesn’t typically pan out and you often don’t receive a reply, but in Cassandra’s IRC channel you can almost guarantee a prompt reply to your question.

Dan Cundiff, Lead Engineering Consultant at Target
"Apache Cassandra is a fun technology to work with and it makes possible things which were never easily done before."

Cassandra Summit 2014 was a fantastic event that gave us an opportunity to meet other companies and hear stories, use cases, and technical tutorials about Apache Cassandra. My team and I also came to the event to share a story of our own; I work on the team that makes api.target.com.

The Target API

We build APIs that are exposed both internally and externally for a wide array of clients: from third parties who rely on our data to the Target-branded applications on iPhone and Android.

Making that possible isn’t easy; a majority of the data captured through our API comes from systems that weren’t easy to scale or were never meant to handle the volumes of data being produced. After extensively evaluating our options, we found Apache Cassandra to be the best fit for a majority of the problems we needed to solve. That decision was the start of our journey in bringing Apache Cassandra to Target.

Our Presentation

The slides and video from my talk at Cassandra Summit 2014 tell our story:
• The problem we had
• Barriers with existing solutions
• Why trying Apache Cassandra was attractive
• Barriers to adoption
• Challenges integrating with existing systems
• Challenges of standing up
• Challenges of developing against Apache Cassandra
• Operational challenges
• What we ended up open sourcing
• Lessons learned with Apache Cassandra
• The results and what the future holds for Apache Cassandra at Target

Conclusion

Every story told by the companies who spoke at Cassandra Summit 2014 was completely different from the next, but there was one common theme across them all: Apache Cassandra is a fun technology to work with and it makes possible things which were never easily done before.

Lastly, if what we’re doing sounds fun, we’re hiring — talk to us! Our journey with Apache Cassandra has just begun.

Be sure to check out all of the videos from Cassandra Summit 2014, on the official Cassandra Summit 2014 YouTube playlist. Also, check out the agenda and register for Cassandra Summit Europe 2014, taking place on the 3rd and 4th of December in London, UK.

Joost van de Wijgerd, CTO and Co-founder at BUX
"Because Cassandra is so fast in writing, we can afford to serialize the whole state every time a message is received. Currently we are already doing thousands of writes per second at peak and we are growing twenty to thirty percent every week."

At BUX, we are building a mobile-only stock market trading app that aims to make it easy and fun for (non-experienced) users to start playing the markets. Users start off with our virtual currency, funBUX, and can build a portfolio of selected stocks, indices, commodities and currencies from the U.S. and Europe. If users want to, they can upgrade their account to real money (seriousBUX).

All products are traded in real time, with actual market data. For the seriousBUX part of the product we integrate with brokers that provide the actual execution of the trades on the market. On top of the trading activity we have built social features such as an activity feed, News, and Battles with friends (i.e. who is the best investor in a certain timeframe).

Currently we are live with an iOS app in the Netherlands and the U.K. An Android app is set to launch in the beginning of 2015, and we are also going to expand to other European countries.

I am the CTO and a Co-Founder of the company, and I currently have 3 mobile and 2 backend developers on my team.

Ease of use + fast writes

I started working with Cassandra in 2010 (version 0.6) at my previous company. Back then we wanted a JVM-based solution because we knew how to operate Java at scale (we had 500 servers at that time, most of them running Java). We also looked at HBase (as we already had a Hadoop cluster running at that time) but were put off by the number of moving parts. What made the decision for Cassandra was the fact that it was a single process that was easy to cluster over multiple machines.

At BUX we knew we needed to store a lot of state updates and overwrites in a short amount of time. As the Cassandra write is on the direct execution path, it has to be really fast. Cassandra is also very easy to operate and really stable.

Cassandra with ElasticActors

We use Cassandra as part of the ElasticActors framework, a persistent, stateful actor framework written in Java of which I am the author. In this message-passing framework each actor has its own state object that is serialized (using Jackson and lz4 compression) into a byte array, which is subsequently stored in Cassandra.

The framework uses sharding to scale up. This means that an ElasticActors cluster needs to be initialized with a number of shards (say 1024); actors are mapped to a shard, while shards are mapped to nodes using consistent hashing. Within Cassandra each shard has its own row, and each actor is mapped to a column.
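ElasticActors itself is Java, but the layout is easy to picture. A hypothetical Ruby sketch of the same idea (the hash function and names are illustrative, not the framework’s actual code; 1024 is the shard count mentioned above):

    require 'zlib'

    NUM_SHARDS = 1024

    # An actor id hashes to a fixed shard; shards, not individual actors, are
    # what get distributed over the cluster nodes.
    def shard_for(actor_id)
      Zlib.crc32(actor_id) % NUM_SHARDS
    end

    # Conceptual Cassandra layout: row key = shard, column name = actor id,
    # column value = the actor's serialized (and compressed) state.
    rows = Hash.new { |h, k| h[k] = {} }
    state = Marshal.dump({ balance: 100 })   # stand-in for Jackson + lz4
    rows[shard_for('actor-42')]['actor-42'] = state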

Because Cassandra is so fast in writing, we can afford to serialize the whole state every time a message is received. Currently we are already doing thousands of writes per second at peak and we are growing twenty to thirty percent every week. I expect this setup to easily scale to fifty to a hundred thousand writes per second.

The interesting thing here is that we do a lot of overwrites. Cassandra’s architecture, with its memtables and SSTables, is very suitable for this: the memtable absorbs a lot of the overwrites, and after a scale-out the state of recently active actors is still in the memtable, which leads to very fast reads as well. One caveat is that the HeapAllocator needs to be used (memtable_allocator: HeapAllocator in cassandra.yaml).

Cassandra at BUX

We are running Cassandra version 1.2.13 on Debian Linux with Oracle JRE 1.7.0_60. We have 3 nodes in a single datacenter with a replication factor of 3, and we read and write with Quorum/Quorum.

The hardware is 2x480GB mirrored SSDs on a 12-core machine with 48GB RAM. Tip: on this hardware, don’t forget to set the I/O scheduler to noop! (http://stackoverflow.com/questions/1009577/selecting-a-linux-i-o-scheduler)

There are 2 column families: one that stores all the actor state and another that stores the scheduled messages. I know Cassandra should not be used as a persistent queue, but I didn’t want to add yet another moving part to the setup. Since there are a lot of deletions, I have set gc_grace_seconds to 3600 so tombstones are collected every hour.
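That setting is applied per table; as a sketch only (the keyspace and table names are placeholders, and the Ruby driver is used here purely for illustration):

    require 'cassandra'

    session = Cassandra.cluster.connect('bux')   # placeholder keyspace

    # Tombstones in this column family become eligible for purging after one
    # hour instead of the default ten days.
    session.execute('ALTER TABLE scheduled_messages WITH gc_grace_seconds = 3600')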

Cassandra gives us a very reliable persistence layer for our application. We store all of the current state of our app in there, and when we restart, scale out, or fail over, we rely on Cassandra to quickly serve the state back up to our application clusters.

Community & advice

The community has always been a great help; I mainly needed support in 2010 and 2011 when we were scaling up. Together with my colleagues at eBuddy (my previous company) we even submitted some bugfixes to the Cassandra code. I also lobbied to get the HeapAllocator back in after it was taken out in a certain 1.2 version (if I recall correctly). Support from the core committers was always great.

From a development perspective, try setting up a cluster on your development machine (for instance using Vagrant and VirtualBox) to experience how easy it is. As for data model design, try to stay away from the relational model and store your data the way you want to use it.

With CQL I think there is a risk that developers start viewing Cassandra as a relational database and make the wrong decisions in their data model design. Having said that, I really like Cassandra and I advise everybody dealing with Big Data to take a look at it. It can do far more than the use case I am using it for now, so check it out!
