December 13th, 2013

DataStax is proud to be partnering with Google on the recently announced general availability of Google Compute Engine.

We briefly described our test of DataStax Enterprise in a guest post on the Google Cloud Platform Blog. Here are more details on the test we ran.

We tested three scenarios, all with positive outcomes:

  1. Operational stability of 100 nodes spread across two physical zones
  2. Reconnecting persistent disks to an instance after rebooting/failure
  3. Disk performance under load

Summary of Findings

100 node dual zone scenario

After installing DSE 3.2, we performed common administrative activities on a 100-node cluster split across two data centers (separate GCE zones). We installed DSE OpsCenter 4.0 to observe and monitor the cluster. Following the shakeout, we let the cluster continue loading data at a moderate rate [1] for a 72-hour longevity test. This helped us establish operational confidence and verify trouble-free operation.

This configuration was designed to exercise common system administrator activities: building up a cluster, loading it with data, adding a second zone, and so on. We did not intend to quantify performance with the 100-node configuration; rather, we wanted to qualitatively evaluate functionality under everyday conditions.

This configuration, driven by a continuous, repeating workload, served as the basis for our 72-hour longevity test, designed to verify that the platform provides trouble-free operation. We chose a single client worker to generate a transaction arrival rate high enough to keep the system busy without taxing it. [1]

Our experience was trouble-free and DSE “just worked”. Our 72-hour longevity test completed without issue. The two zones effortlessly streamed data to accomplish replication, and OpsCenter monitored the whole operation with green indicators across the board.

Persistent Disk Scenario

One of the advantages of the GCE platform is its use of persistent disks. When an instance is terminated, its data persists and can be re-connected to a new instance. This gives Cassandra users great flexibility. For instance, you can upgrade a node to a higher CPU/memory machine type without re-replicating the data, or recover from the loss of a node without having to stream all of its data from other nodes in the cluster.

We tested this by terminating a node, creating a new one, and re-connecting the disk to the new node, all within the hinted handoff window. When the new node re-joined the ring it was operational without our having to run a repair. Since the new node had a different IP address, we did need to adjust some configuration and remove the old node from the cluster, but the data did not need to be streamed.
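
As a rough sketch of that workflow using the current gcloud CLI (the instance and disk names here are hypothetical):

# Delete the failed instance but keep its persistent data disk.
gcloud compute instances delete cass-node-17 --zone=us-central2-a --keep-disks=data
# Create a replacement instance and re-attach the surviving data disk to it.
gcloud compute instances create cass-node-17b \
  --zone=us-central2-a --machine-type=n1-standard-2
gcloud compute instances attach-disk cass-node-17b \
  --disk=cass-data-17 --zone=us-central2-a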

Disk Performance Under Load Scenario

For this test we created a three-node cluster and generated load that approached the limit of the disk throughput. During the test we captured CF histogram data as well as low-level disk latency data. We were particularly interested in how consistent latency would be on persistent disks. Our tests showed a good latency distribution as long as our load did not exceed the throughput threshold (which varies in GCE with the size of your disk); 90% of latencies were under 8 ms. The key to consistent latency in GCE is sizing your cluster so that each node stays within the throughput limits.
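
One straightforward way to capture that kind of low-level latency data alongside the CF histograms (an illustrative sketch; the device name sdb and the 10-second interval are assumptions):

# Sample extended device statistics every 10 seconds for the data disk;
# the await column reports per-request latency in ms.
iostat -xmt 10 | grep --line-buffered sdb > pd_latency.log &
# Column-family level latency histograms from Cassandra itself.
./bin/nodetool cfhistograms "Keyspace1" "Standard1"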

Test Details

100 node dual zone tests

Persistent Disk and GCE Instance settings

  • 100 instances split between two Zones: us-central2-a and us-central2-b
  • Machine type: n1-standard-2 (2 vCPU, 7.5 GB memory)
  • O/S image: debian-7-wheezy-v20131014
  • Persistent disk: 2000 GB
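
As a sketch of how one node of the configuration above can be provisioned with the current gcloud CLI (the disk and instance names are illustrative):

# Create a 2000 GB persistent data disk and a matching instance in one of the two zones.
gcloud compute disks create cass-data-001 --size=2000GB --zone=us-central2-a
gcloud compute instances create cass-node-001 \
  --zone=us-central2-a \
  --machine-type=n1-standard-2 \
  --image=debian-7-wheezy-v20131014 --image-project=debian-cloud
# Attach the data disk to the new instance.
gcloud compute instances attach-disk cass-node-001 \
  --disk=cass-data-001 --zone=us-central2-a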

DSE / Cassandra settings

  • 2 data centers (50 nodes each)
  • 256 Vnodes
  • GossipingPropertyFileSnitch
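
In configuration-file terms (an excerpt-style sketch, not our complete files), those choices correspond to:

# cassandra.yaml (excerpt)
num_tokens: 256
endpoint_snitch: GossipingPropertyFileSnitch

# cassandra-rackdc.properties on a node in the first zone
# (DC/rack names chosen to match the stress options below)
dc=DC1
rack=RAC1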

In this scenario we used cassandra-stress to generate 50 million unique records and insert them into the 100-node cluster. The cluster was configured with two data centers (each in a separate GCE zone), so the column families, each configured with a replication factor of 3 per data center, were replicated across both data centers.

./bin/nodetool status | grep UN | awk '{print $2}' > /tmp/nodelist

./resources/cassandra/tools/bin/cassandra-stress -K 100 -t 50 \
  -R org.apache.cassandra.locator.NetworkTopologyStrategy \
  --num-keys=50000000 --columns=20 -D /tmp/nodelist -O DC1:3,DC2:3
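
The -R and -O options above create the stress keyspace with NetworkTopologyStrategy and three replicas in each data center. Expressed as CQL (a sketch; Keyspace1 is the stress tool's default keyspace name), that is roughly equivalent to:

CREATE KEYSPACE "Keyspace1"
  WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};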

CF Histogram output after stress-write

Graph generated from the output of:


./bin/nodetool cfhistograms "Keyspace1" "Standard1"



3 node, 1 test client stress write tests

Persistent Disk and Instance settings

  • 3 nodes
  • Machine type: n1-standard-8 (8 vCPU, 30 GB memory)
  • O/S image: debian-7-wheezy-v20131014
  • Persistent disk: 3000 GB

DSE / Cassandra settings

  • 256 Vnodes
  • DseSimpleSnitch
  • concurrent_writes: 64 (2x default)
  • commitlog_total_space_in_mb: 4096 (4x default)
  • memtable_flush_writers: 4 (4x default)
  • trickle_fsync: true
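
For reference, those overrides map directly onto cassandra.yaml; an excerpt showing only the changed values:

# cassandra.yaml (excerpt) -- values changed from the defaults for this write-heavy test
concurrent_writes: 64
commitlog_total_space_in_mb: 4096
memtable_flush_writers: 4
trickle_fsync: true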

In this scenario we used a single client and a replication factor of three to generate large quantities of write I/O. We generated 10 million records from one node using the cassandra-stress utility that comes with DataStax Enterprise. The test client ran on one of the same nodes that was also serving DSE. This probably limits the absolute performance numbers, but it was convenient and was sufficient to fully load the persistent disks.

In this test there is only one data center and we used RF=3 on the Cassandra keyspace, which puts a copy of each row of data on each of the three nodes.

[Run on only one node]
./bin/nodetool status | grep UN | awk '{print $2}' > /tmp/nodelist

./resources/cassandra/tools/bin/cassandra-stress -D /tmp/nodelist \
  --replication-factor 3 --consistency-level quorum \
  --num-keys 10000000 -K 100 -t 100 --columns=50 --operation=INSERT

This graph shows write operations per second: the number of writes completed by the ring each second.


This graph shows median latency: the time it takes, in milliseconds, to satisfy a write request.


CF Histogram output after stress-write

First, a graph generated from the output of:

./bin/nodetool cfhistograms "Keyspace1" "Standard1"


Second, a graph generated from the Google PD backend:




1. For those interested in the details, we ran the client at 60 inserts/sec/node (6,000 records per second across the 100-node cluster).