January 29th, 2014

By 

flitemechanics

 

The in Operator in Cassandra CQL blog posting was created by Ike Walker. To view more postings by Ike, check out the Mechanics of Flite blog here.

 

In my post about the CQL WHERE clause I didn’t fully cover when the IN operator is supported and how it differs from the = operator. Based on the comments on that post and some questions I’ve gotten from people who read the post, I think it’s worth going into some detail about the IN operator.

 

Here’s where the IN operator is supported:

  1. The last column in the partition key, assuming the = operator is used on the first N-1 columns of the partition key

  2. The last clustering column, assuming the = operator is used on the first N-1 clustering columns and all partition keys are restricted

  3. The last clustering column, assuming the = operator is used on the first N-1 clustering columns and ALLOW FILTERING is specified

It’s worth mentioning that I can use the IN operator interchangably with the = operator on a single value, for example:

is equivalent to:

But in that case the semantics are that of = even though I’m using IN.

The real use case of IN is with multiple distinct values. The simple case involves a single column partition key and/or clustering column. For this I’ll use the temperature table I’ve used in past examples:

I can use IN on the partition key:

  

Or on the clustering column (with ALLOW FILTERING):

In general it’s best to avoid queries that require ALLOW FILTERING because they often require lots of data to be scanned even if only a small amount of data is returned, but I show that example because it is a supported use of the IN operator.

A more feasible example is to use both together (without ALLOW FILTERING):

If I try to use the IN operator on the temperature column I get an error:

Even if I add a secondary index:

Moving on to composite keys, I’ll use the table that Vasyl Boroviak used in the comments of my previous post:

This table is useful because it contains a composite partition key and and has multiple clustering columns.

Since the partition key contains two columns, I need to use = on the first column in order to use IN on the second column:

Likewise there are two clustering columns, so I need to use = on the first column in order to use IN on the second column, and I also need to use ALLOW FILTERING if this is the only criteria:

Again I can combine the those two queries together without ALLOW FILTERING:

LinkedIn