February 11th, 2013

 Elapsed Time: 20:30 | Listen to this Podcast


System Metadata Stored in Key-Spaces

Will this feature be applied to drivers or is it just there for developers to query and get access to, on a per application basis?

Drivers will definitely take advantage of this feature. Getting this information out of Cassandra has always been a pain; now it comes over the same protocol as anything else stored in a column family, which makes it easy to consume. Additionally, third-party integration tools could use the feature to present drop-downs or other UIs that ask the user, “Here are the column families in the system; which one do you want to look at or pull into this visualization or analytics tool?”
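As a rough illustration of what "consuming metadata like any other column family" could look like, here is a minimal Python sketch. It assumes the 1.2-era table system.schema_columnfamilies with keyspace_name and columnfamily_name columns; verify this against your version. The execute callable, the %s placeholder style, and the sample rows are hypothetical stand-ins so the sketch runs without a cluster.

```python
def list_column_families(execute, keyspace):
    """Return the column family names defined in `keyspace`.

    `execute` is a hypothetical callable that runs a CQL string with
    parameters and returns rows as dicts; substitute your driver's call.
    """
    # Assumed 1.2-era metadata table; the %s placeholder is driver-specific.
    query = (
        "SELECT columnfamily_name FROM system.schema_columnfamilies "
        "WHERE keyspace_name = %s"
    )
    return sorted(row["columnfamily_name"] for row in execute(query, (keyspace,)))


def fake_execute(query, params):
    """Stand-in for a real connection, returning made-up sample rows."""
    rows = [
        {"keyspace_name": "shop", "columnfamily_name": "users"},
        {"keyspace_name": "shop", "columnfamily_name": "orders"},
        {"keyspace_name": "logs", "columnfamily_name": "events"},
    ]
    return [r for r in rows if r["keyspace_name"] == params[0]]


print(list_column_families(fake_execute, "shop"))  # ['orders', 'users']
```

A UI tool could feed a list like this straight into the drop-down described above.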


Thrift & Collections Support

Are there Python, Java, etc. libraries with support for 1.2 collections? 

Kind of. Collections are a feature of CQL, so anything that speaks CQL can use them. Unfortunately, few libraries support CQL well at the moment, but DataStax is creating a Java driver that makes heavy use of CQL.


Matt’s Recommendation: “I’d like to see this as more of a programmatic, API-level feature instead of it just being in CQL. But that’s just my preference for programmatic APIs over text-based APIs like CQL, where I have to join text together to do my queries or my writes.”


Are there different kinds of collections?

Yes, the standard collection types are Sets, Lists, and Maps.

• Sets: No ordering or duplicates

• Lists: Ordered, and duplicates are allowed

• Maps: Map a key to a value
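
The three behaviors above can be sketched with Python's built-in types. This is only an analogy for the semantics described in the list, not Cassandra or CQL code; the variable names and values are made up.

```python
# Set: no ordering, no duplicates.
tags = set()
tags.add("nosql")
tags.add("nosql")  # adding the same value again has no effect

# List: keeps insertion order and allows duplicates.
events = []
events.append("login")
events.append("login")  # the duplicate is stored as a second element

# Map: each key maps to one value.
prefs = {}
prefs["theme"] = "dark"
prefs["theme"] = "light"  # writing an existing key replaces its value

print(len(tags), events, prefs)
```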



What problems do Vnodes address?

Historically, it was best to double the size of your Cassandra cluster when expanding. If you originally had 4 nodes, visualize them at equal intervals on a clock face: 12, 3, 6, and 9. When you double the cluster, the new nodes also land at equal intervals: 1:30, 4:30, 7:30, and 10:30. Doubling means the original nodes don't have to move; the new nodes just take some of the load from the existing nodes directly next to them. Here lies the problem: if you have 20 nodes and want to add just 1, you need to move all 20 nodes in sequence, one after the other, so that the ring stays evenly spread. This takes time. In addition, Cassandra historically had an issue where, if you let a cluster redline (all of your I/O in use) and then tried adding or removing nodes, things got worse before they got better.
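
The clock arithmetic above can be checked with a short Python sketch. It models the ring as evenly spaced fractions of a full turn, which is a simplification for illustration and not how Cassandra represents tokens; all function names are hypothetical.

```python
from fractions import Fraction


def even_positions(n):
    """Evenly spaced ring positions for n nodes, as fractions of a full turn."""
    return {Fraction(i, n) for i in range(n)}


def nodes_that_must_move(old_n, new_n):
    """Count existing nodes whose position does not exist in the new even layout."""
    return len(even_positions(old_n) - even_positions(new_n))


def as_clock(pos):
    """Render a ring position as a clock reading (position 0 is 12:00)."""
    minutes = int(pos * 12 * 60)
    hour = (minutes // 60) or 12
    return f"{hour}:{minutes % 60:02d}"


# Positions that appear only after doubling from 4 to 8 nodes.
new_on_doubling = sorted(even_positions(8) - even_positions(4))
print([as_clock(p) for p in new_on_doubling])  # ['1:30', '4:30', '7:30', '10:30']
print(nodes_that_must_move(4, 8))    # 0
print(nodes_that_must_move(20, 21))  # 19
```

In this simplified model one node can stay anchored at 12 o'clock, so 19 of the 20 existing nodes move when going to 21, which is effectively the whole cluster.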


Vnodes solve this problem. When you add or remove a node, it is no longer limited to talking to the nodes adjacent to it on the ring; it can talk to every node in the cluster and pull a small amount of data from each. This avoids overloading part of the cluster during the change: with Vnodes, the load is spread across the entire cluster, which saves time.
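
To make the "pull a little from everyone" idea concrete, here is a toy Python simulation. It is a sketch under stated assumptions, not Cassandra's implementation: tokens are drawn at random, the 2**64 ring size and the 256 tokens per node are assumed values, and all names are hypothetical.

```python
import bisect
import random

RING_SIZE = 2**64  # assumed token space for this sketch


def build_ring(num_nodes, tokens_per_node, rng):
    """Return a sorted list of (token, owner) pairs with random tokens."""
    ring = []
    for node in range(num_nodes):
        for _ in range(tokens_per_node):
            ring.append((rng.randrange(RING_SIZE), node))
    ring.sort()
    return ring


def donors_for_new_node(ring, tokens_per_node, rng):
    """Existing nodes that would hand data to a joining node.

    A token owns the range that ends at it, so a new token takes data
    from the owner of the next existing token clockwise.
    """
    tokens = [token for token, _ in ring]
    donors = set()
    for _ in range(tokens_per_node):
        new_token = rng.randrange(RING_SIZE)
        # Wrap around to the first token when the new one is past the end.
        idx = bisect.bisect_left(tokens, new_token) % len(ring)
        donors.add(ring[idx][1])
    return donors


rng = random.Random(42)
single = donors_for_new_node(build_ring(20, 1, rng), 1, rng)
vnodes = donors_for_new_node(build_ring(20, 256, rng), 256, rng)
print(len(single), len(vnodes))
```

With one token per node the joining node has a single donor; with many tokens per node the donors spread across most or all of the 20 existing nodes, so no single neighbor carries the streaming load.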


*Listen to the entire podcast for an in-depth explanation of how Vnodes solve the problems stated above.

How far back can one upgrade from when moving to 1.2, and am I required to use Vnodes?

You can always upgrade stepwise from previous versions to the current version. In 1.2 you aren’t required to use Vnodes, as they are turned off by default; this makes it easy for those upgrading to take advantage of other 1.2 features without touching Vnodes.


Any best practices with Vnodes, when going from a pre-1.2 implementation to 1.2?

Matt’s Recommendation: “At least right now, until Vnodes are tested a bit more, it’s better to start them on a new cluster than to upgrade an existing cluster to Vnodes. There is documentation (http://www.datastax.com/docs/1.2/install/upgrading) for doing that, but I recommend running it in a test environment before considering full production. It’s supposed to work, but it’s rather new and it touches many parts of Cassandra, which increases the risk. Be cautious in your production systems.”