6/9/2022

Apache Cassandra Lunch #42: SSTable Files with SSTableloader - Business Platform Team

This resource is based on an article originally published here.

In case you missed it, this blog post recaps Cassandra Lunch #42, which covered SSTable files and how they relate to SSTableloader. We also walk through an example that uses SSTableloader to load data taken from one cluster into a new, empty cluster. The live recording of Cassandra Lunch, which includes a more in-depth discussion, is also embedded below in case you were not able to attend live. If you would like to attend Apache Cassandra Lunch live, it is hosted every Wednesday at 12 PM EST. Register here now!

SSTable Files

An SSTable (Sorted String Table) is the unit of on-disk storage used by Cassandra and by a number of other NoSQL databases. Each SSTable is a set of files in a table's data directory: some hold the data itself, while others hold supporting information, such as an index, that makes the data easier to read later on. SSTables are immutable once written, so Cassandra adds new ones over time rather than modifying existing ones. More details on SSTables can be found in our previous posts here and here.
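To make the "set of files" idea concrete, here is a minimal Python sketch that groups file names from a table's data directory by the SSTable they belong to. It assumes the component name (for example `Data.db` or `Index.db`) is whatever follows the last hyphen in the file name, as in `nb-1-big-Data.db`. The sample file names and the helper `group_sstable_components` are illustrative only, and the exact naming scheme varies between Cassandra versions.

```python
from collections import defaultdict


def group_sstable_components(filenames):
    """Group file names by SSTable.

    Assumption: each component file is named "<prefix>-<Component>",
    e.g. "nb-1-big-Data.db", so the text before the last hyphen
    identifies the SSTable and the text after it names the component.
    """
    groups = defaultdict(set)
    for name in filenames:
        prefix, sep, component = name.rpartition("-")
        if not sep:
            continue  # no hyphen: not an SSTable component file name
        groups[prefix].add(component)
    return dict(groups)


# Hypothetical directory listing containing two SSTables.
files = [
    "nb-1-big-Data.db",
    "nb-1-big-Index.db",
    "nb-1-big-TOC.txt",
    "nb-2-big-Data.db",
    "nb-2-big-Index.db",
]
groups = group_sstable_components(files)
for prefix in sorted(groups):
    print(prefix, sorted(groups[prefix]))
```

Because SSTables are immutable, a newly flushed or compacted SSTable would show up in such a listing as a new prefix rather than as changes to the existing files.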

SSTableloader

SSTableloader, also known as the Cassandra Bulk Loader, is a tool for loading data from SSTables into a Cassandra cluster. Note that this is different from copying SSTable files onto a Cassandra cluster. Rather than copying the files, sstableloader streams the data they contain to the cluster. This process respects the replication strategy and replication factor of the cluster and keyspaces being loaded.

To work properly, sstableloader must be given a directory containing at least the Index.db and Data.db components of each SSTable. It can also work from a snapshot. The keyspace and table that the data will be streamed into must already exist, but the table may already contain other data.
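The requirements above can be checked before invoking the tool. The following is a minimal Python sketch, not the tool's own validation logic: it verifies that a directory contains at least one `Data.db` and one `Index.db` component, then builds an `sstableloader -d <hosts> <directory>` command line. The helper names, the example path, and the host addresses are hypothetical. The sketch also assumes the directory path ends in `<keyspace>/<table>`, since sstableloader takes the target keyspace and table from those two directory names.

```python
import os

# Components the post says sstableloader needs at a minimum.
REQUIRED_COMPONENTS = ("Data.db", "Index.db")


def missing_components(filenames):
    """Return the required components absent from a list of file names.

    Assumption: the component is the text after the last hyphen,
    e.g. "nb-1-big-Data.db" -> "Data.db".
    """
    present = {name.rpartition("-")[2] for name in filenames}
    return [c for c in REQUIRED_COMPONENTS if c not in present]


def build_sstableloader_command(table_dir, contact_points):
    """Build the argument list for an sstableloader run.

    table_dir is assumed to end in <keyspace>/<table>.
    contact_points is a list of node addresses passed via -d.
    """
    missing = missing_components(os.listdir(table_dir))
    if missing:
        raise ValueError(f"{table_dir} is missing components: {missing}")
    return ["sstableloader", "-d", ",".join(contact_points), table_dir]


# Pure check on a hypothetical listing that lacks an index file.
print(missing_components(["nb-1-big-Data.db"]))
```

A caller could pass the returned list to `subprocess.run(cmd, check=True)`, for example with `build_sstableloader_command("/tmp/load/my_keyspace/my_table", ["10.0.0.1"])`, once the keyspace and table have been created on the target cluster.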

Cassandra.Link

Cassandra.Link is a knowledge base that we created for all things Apache Cassandra. Our goal with Cassandra.Link was not only to fill the gap left by Planet Cassandra, but also to bring the Cassandra community together. Feel free to reach out if you wish to collaborate with us on this project in any capacity.

We are a technology company that specializes in building business platforms. If you have any questions about the tools discussed in this post or about any of our services, feel free to send us an email!

© 2009-2023 The Apache Software Foundation under the terms of the Apache License 2.0. Apache, the Apache feather logo, Apache Cassandra, Cassandra, and the Cassandra logo, are either registered trademarks or trademarks of The Apache Software Foundation. Sponsored by Anant Corporation and Datastax, and Developed by Anant Corporation.
