Apache Cassandra Lunch #51: Cassandra Cluster Design & Architecture

6/16/2022


Apache Cassandra Lunch #51: Cassandra Cluster Design & Architecture - Business Platform Team

This resource is based on an article originally published here.

In Apache Cassandra Lunch #51: Cassandra Cluster Design & Architecture, we give an overview of Cassandra cluster architecture, not to be confused with the Cassandra database architecture. Specifically, we look at using Cassandra datacenters to isolate workloads. The live recording of Cassandra Lunch, which includes a more in-depth discussion and a demo, is embedded below in case you were not able to attend live. If you would like to attend Apache Cassandra Lunch live, it is hosted every Wednesday at 12 PM EST. Register here now!

Apache Cassandra

Apache Cassandra is a free and open-source, distributed, wide-column store, NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure.

Use Cases

Apache Cassandra is best used where fast reads and writes of terabytes of data are required. It is also a good fit when data must be replicated and available globally, and when the data must be constantly available with no downtime. Cassandra’s multi-node, multi-datacenter distribution and replication support all of these scenarios. Cassandra is meant for BIG data.

Cassandra should not be used if the data is measured in gigabytes, can be comfortably housed in one datacenter, or if the system can tolerate downtime. Especially don’t use Cassandra just to follow the latest trend in database technology. In these cases, another type of database, such as a relational database, is likely a better option than Cassandra.

Cassandra Data Model

The Cassandra data model of tables and column families may look similar to the tables and databases of SQL Server, MySQL, or PostgreSQL. They are not the same. The data model consists of keyspaces (similar to databases), column families (similar to tables in the relational model), keys, and columns. The Cassandra Query Language (CQL) supports queries by primary key: the partition key plus optional clustering keys. CQL does not support arbitrary queries on columns, and table joins are not allowed. Also, a Cassandra cluster should not manage more than 100 to 150 tables across any number of keyspaces.
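To make the key-based query restriction concrete, here is a minimal Python sketch of a deliberately simplified rule: a query can be served when every partition key column is restricted and any restricted clustering columns form a prefix of the clustering order. The `TableKey` class, the `is_servable` function, and the `sensor_id`/`day`/`ts` columns are hypothetical names for illustration. The sketch ignores secondary indexes, `ALLOW FILTERING`, and range or `IN` restrictions.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class TableKey:
    """Hypothetical description of a table's primary key."""
    partition_key: Tuple[str, ...]
    clustering_columns: Tuple[str, ...] = ()


def is_servable(key: TableKey, where_columns: Iterable[str]) -> bool:
    """Simplified check: all partition key columns restricted, and the
    restricted clustering columns form a prefix of the clustering order."""
    restricted = set(where_columns)
    if not set(key.partition_key) <= restricted:
        return False  # the partition cannot be located without its full key
    remaining = restricted - set(key.partition_key)
    for column in key.clustering_columns:
        if column not in remaining:
            break  # prefix ends at the first unrestricted clustering column
        remaining.remove(column)
    # Anything left is a non-key column or a clustering column past a gap.
    return not remaining


# Illustrative table: PRIMARY KEY ((sensor_id), day, ts)
readings = TableKey(partition_key=("sensor_id",), clustering_columns=("day", "ts"))
print(is_servable(readings, {"sensor_id", "day"}))  # True
print(is_servable(readings, {"ts"}))                # False
```

This is why tables are usually designed around the queries they must answer rather than around normalized entities.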

Cassandra Cluster Architecture

Physical vs. Logical Datacenters

Cassandra clusters provide flexibility in how workloads are distributed. This is due in part to the ability to have both physical and logical datacenters, meaning clusters can be physically or virtually distributed.

Physical Datacenters

Physical datacenters can be physical locations or separate cloud-based datacenters. In a physical datacenter, racks are used to define availability zones. Racks contain nodes, and the nodes contain the data. Physical datacenters still allow for high availability and redundancy, with replication factors set by the keyspace.

Logical Datacenters

As in physical datacenters, racks contain the nodes, the nodes contain the data, and the replication factor is still defined by the keyspace. The difference is that the machines of logical datacenters are located in the same physical place and are separated only by configuration.
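As a rough illustration of how a node is assigned to a datacenter and rack, the Python sketch below renders the contents of a `cassandra-rackdc.properties` file, which is read when the cluster uses `GossipingPropertyFileSnitch`. The use of that snitch is an assumption here, since the article does not name one. The helper `rackdc_properties` and the node, datacenter, and rack names are hypothetical.

```python
def rackdc_properties(dc: str, rack: str) -> str:
    """Render the dc/rack settings for one node's cassandra-rackdc.properties."""
    return f"dc={dc}\nrack={rack}\n"


# Hypothetical layout: two logical datacenters in the same physical location.
layout = {
    "node1": ("transactional", "rack1"),
    "node2": ("transactional", "rack2"),
    "node3": ("analytics", "rack1"),
}

for node, (dc, rack) in layout.items():
    print(f"# {node}")
    print(rackdc_properties(dc, rack))
```

Whether the datacenters are physical or logical, the assignment looks the same from Cassandra's point of view; only where the machines actually sit differs.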

Availability / Performance of Data

In a single-datacenter cluster, data is replicated as defined by the keyspace's replication factor. Reads and writes are governed by a consistency level, such as ONE, QUORUM, or ALL, which sets how many replicas must respond. Repair processes synchronize replicas across all nodes in the datacenter. In a multi-datacenter cluster, data replication is still defined by the keyspace, with a replication factor set per datacenter, and there are additional consistency levels: those of the single datacenter plus LOCAL_ONE, LOCAL_QUORUM, and EACH_QUORUM. The LOCAL levels do not stop data from being replicated to the other datacenters; they only require acknowledgment from replicas in the datacenter that is handling the request. Repair processes in a multi-datacenter cluster sync across all datacenters.
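The arithmetic behind these levels can be sketched in a few lines of Python. A quorum is a majority of the replicas counted (half, rounded down, plus one): QUORUM counts replicas across all datacenters, while LOCAL_QUORUM counts only those in the datacenter handling the request. The function names and the `dc1`/`dc2` replication map are hypothetical, chosen for illustration.

```python
from typing import Dict


def quorum(replicas: int) -> int:
    """Majority of a replica count: floor(n / 2) + 1."""
    return replicas // 2 + 1


def required_acks(level: str, replication: Dict[str, int], local_dc: str) -> int:
    """Replica acknowledgments needed for a consistency level.

    `replication` maps datacenter name to replication factor.
    """
    total = sum(replication.values())
    local = replication[local_dc]
    acks = {
        "ONE": 1,
        "QUORUM": quorum(total),        # majority across all datacenters
        "ALL": total,
        "LOCAL_ONE": 1,                 # one replica in the local datacenter
        "LOCAL_QUORUM": quorum(local),  # majority within the local datacenter
    }
    return acks[level]


replication = {"dc1": 3, "dc2": 3}  # hypothetical two-datacenter keyspace
print(required_acks("QUORUM", replication, "dc1"))        # 4
print(required_acks("LOCAL_QUORUM", replication, "dc1"))  # 2
```

With three replicas in each of two datacenters, a LOCAL_QUORUM request waits on two local replicas instead of four replicas spread across both sites, which is what keeps cross-datacenter latency out of the request path.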

Availability / Redundancy of Data

In a single-datacenter cluster, the full dataset for keyspaces and tables is distributed among all the nodes. Racks help distribute replicas of the data evenly across the nodes. Additionally, racks can map to availability zones in the cloud or to physical racks. In a multi-datacenter cluster, the full dataset for keyspaces and tables is distributed among the nodes in each datacenter. Racks still help distribute data evenly throughout a datacenter and can still map to availability zones in the cloud or to physical racks. Datacenters can be located in physically different locations, and they can be used to isolate workloads.
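The role racks play in spreading replicas can be illustrated with a simplified Python sketch. It walks a token ring clockwise from the partition's token and prefers nodes on racks that do not yet hold a replica. This is an approximation written for illustration, not Cassandra's actual placement code, and the `Node` type, the `pick_replicas` function, and the node names and tokens are all hypothetical.

```python
from typing import List, NamedTuple


class Node(NamedTuple):
    name: str
    rack: str
    token: int


def pick_replicas(nodes: List[Node], partition_token: int, rf: int) -> List[Node]:
    """Choose up to `rf` replicas in one datacenter, preferring distinct racks."""
    ring = sorted(nodes, key=lambda n: n.token)
    # Start at the first node whose token is >= the partition token, wrapping around.
    start = next((i for i, n in enumerate(ring) if n.token >= partition_token), 0)
    ordered = ring[start:] + ring[:start]

    replicas: List[Node] = []
    used_racks = set()
    # First pass: take at most one node per rack.
    for node in ordered:
        if len(replicas) == rf:
            break
        if node.rack not in used_racks:
            replicas.append(node)
            used_racks.add(node.rack)
    # Second pass: if there are fewer racks than replicas, fill from the ring.
    for node in ordered:
        if len(replicas) == rf:
            break
        if node not in replicas:
            replicas.append(node)
    return replicas


nodes = [
    Node("n1", "rack1", 0),
    Node("n2", "rack1", 25),
    Node("n3", "rack2", 50),
    Node("n4", "rack3", 75),
]
print([n.name for n in pick_replicas(nodes, partition_token=80, rf=3)])
# ['n1', 'n3', 'n4']
```

In the example, `n2` is skipped because `rack1` already holds a replica on `n1`, so losing a single rack leaves two of the three replicas available.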

Distributing Workloads

Logical

This image demonstrates a logical multi-datacenter workload distribution using a Cassandra cluster design and architecture that isolates data transactions in one datacenter. That same data is replicated to a separate, virtual datacenter that performs analysis of the data, while yet another virtual datacenter reports on the replicated data. This means that the workload of one datacenter does not impact the performance of the other two. This workload distribution could also be accomplished using separate physical datacenters.
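One way to express this layout is a keyspace replicated to all three datacenters with `NetworkTopologyStrategy`. The Python sketch below only builds the CQL statement as a string and does not connect to a cluster. The keyspace name `shop`, the datacenter names, and the replication factors are hypothetical; the datacenter names would have to match those configured on the nodes.

```python
from typing import Dict


def create_keyspace_cql(keyspace: str, replication: Dict[str, int]) -> str:
    """Build a CREATE KEYSPACE statement with one replication factor per datacenter."""
    options = ", ".join(f"'{dc}': {rf}" for dc, rf in replication.items())
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {options}}};"
    )


# Hypothetical workload-isolated layout: one datacenter per workload.
cql = create_keyspace_cql(
    "shop", {"transactional": 3, "analytics": 2, "reporting": 1}
)
print(cql)
```

Each application would then connect to its own datacenter and use a LOCAL consistency level, so that analytics and reporting reads are served by their own replicas rather than by the transactional nodes.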

Cassandra cluster design and architecture diagram of a logical multi-datacenter workload distribution in Kubernetes.

In this image of a logical multi-datacenter workload distribution in Kubernetes, the same workload distribution is happening as above, but Kubernetes containers are being used instead of virtual machines.

Physical

This is an example of a physical/hybrid multi-datacenter cloud distribution. Cloud-separated datacenters sync with an on-premises datacenter. For example, the cloud datacenters could handle lightweight workloads while the on-premises datacenter handles a heavier analytics workload.

Diagram of a physical multi-datacenter cluster distribution.

Theoretically, this physical multi-datacenter distribution could be taken to an interplanetary scale, with each datacenter in its own physical location.

Resources

https://cassandra.apache.org/

Cassandra.Link

Cassandra.Link is a knowledge base that we created for all things Apache Cassandra. Our goal with Cassandra.Link was to not only fill the gap of Planet Cassandra, but to bring the Cassandra community together. Feel free to reach out if you wish to collaborate with us on this project in any capacity.

We are a technology company that specializes in building business platforms. If you have any questions about the tools discussed in this post or about any of our services, feel free to send us an email!




