ic-tools for Apache Cassandra SSTables
By Instaclustr Support
Overview
Instaclustr has developed a number of useful tools to assist with diagnosing issues in a cluster. For users of Instaclustr’s Managed Service, our Technical Operations team will run these as needed when working with you to help diagnose issues. The tools are available on a supported basis for our enterprise support customers and on an unsupported basis for the general community (although we’ll probably answer questions on the C* user email list).
These tools supplement the information available from the nodetool utility that is part of core Apache Cassandra. Whereas nodetool tends to report based on summary statistics maintained as Cassandra services operate, ic-tools directly read Cassandra’s data files when executed to report more detailed and accurate statistics.
As such, executing the tools can result in a large amount of data being read, which can potentially impact the performance of the node where they are being executed. The two most data-heavy tools (ic-cfstats and ic-purge) provide rate limiting to reduce the impact. However, users are advised to exercise care when using these tools in a live cluster.
These tools are version-specific: you must use the ic-tools version that corresponds to your Cassandra version. Pre-built jars for Cassandra 2.0, 2.1, 2.2, 3.0, and 3.11 are provided at the bottom of this page.
The source code is published on GitHub.
Command | Description |
---|---|
ic-summary | Summary information about all column families, including how much of the data is repaired |
ic-sstables | Print out metadata for SSTables that belong to a column family |
ic-pstats | Partition size statistics for a column family |
ic-cfstats | Detailed statistics about cells in a column family |
ic-purge | Statistics about reclaimable data for a column family |
(We’ve generally used the old-school C* term ‘column family’. It is synonymous with ‘table’ in modern C* versions.)
ic-summary
Provides summary information about all column families. Useful for finding the largest column families and how much data has been repaired by incremental repairs.
Usage
ic-summary
Output
Column | Description |
---|---|
Keyspace | Keyspace the column family belongs to |
Column Family | Name of column family |
SSTables | Number of SSTables on this node for the column family |
Disk Size | Compressed size on disk for this node |
Data Size | Uncompressed size of the data for this node |
Last Repaired | Time of the last incremental repair |
Repair % | Percentage of data marked as repaired by incremental repair |
ic-sstables
Print out SSTable metadata for a column family. Useful in helping to tune compaction settings.
Usage
ic-sstables <keyspace> <column-family>
Output
Column | Description |
---|---|
SSTable | Data.db filename of SSTable |
Disk Size | Size of SSTable on disk |
Total Size | Uncompressed size of data contained in the SSTable |
Min Timestamp | Minimum cell timestamp contained in the SSTable |
Max Timestamp | Maximum cell timestamp contained in the SSTable |
Duration | The time span between minimum and maximum cell timestamps |
Level | SSTable level under leveled compaction (LeveledCompactionStrategy) |
Keys | Number of partition keys |
Avg Partition Size | Average partition size |
Max Partition Size | Maximum partition size |
Avg Column Count | Average number of columns in a partition |
Max Column Count | Maximum number of columns in a partition |
Droppable | Estimated droppable tombstones |
Repaired At | Time when marked as repaired by incremental repair |
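The Duration column is simply the difference between the Max Timestamp and Min Timestamp columns. The following is a minimal sketch of that arithmetic, assuming cell timestamps are microseconds since the Unix epoch (the usual client convention, although clients can supply arbitrary values). The function names and sample values are hypothetical and are not part of ic-tools.

```python
from datetime import datetime, timedelta, timezone


def to_datetime(timestamp_us: int) -> datetime:
    """Convert a microsecond-resolution cell timestamp to an aware UTC datetime."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(microseconds=timestamp_us)


def duration(min_timestamp_us: int, max_timestamp_us: int) -> timedelta:
    """Time span between the oldest and newest cell in an SSTable."""
    return timedelta(microseconds=max_timestamp_us - min_timestamp_us)


# Hypothetical values: two timestamps exactly 36 hours apart.
min_ts = 1_500_000_000_000_000
max_ts = min_ts + 36 * 3600 * 1_000_000
print(to_datetime(min_ts).isoformat())
print(duration(min_ts, max_ts))
```

An SSTable with a long duration mixes old and new data, which is the kind of detail that can matter when tuning compaction settings.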
ic-pstats
Tool for finding the largest partitions. It reads the Index.db files, so it is relatively quick.
Usage
ic-pstats [-n <num>] [-t <snapshot>] [-f <filter>] <keyspace> <column-family>
Option | Description |
---|---|
-h | Display help |
-b | Batch mode. Uses progress indicator that is friendly for running in batch jobs. |
-n <num> | Number of partitions to display in leaders lists |
-t <name> | Snapshot to analyse (snapshot name from nodetool listsnapshots). Snapshot is created if none is specified. |
-f <files> | Comma-separated list of Data.db SSTables to filter on |
Output
Summary: Summary statistics about partitions
Column | Description |
---|---|
Count (Size) | Number of partition keys on this node |
Total (Size) | Total uncompressed size of all partitions on this node |
Total (SSTable) | Number of SSTables on this node |
Minimum (Size) | Minimum uncompressed partition size |
Minimum (SSTable) | Minimum number of SSTables a partition belongs to |
Average (Size) | Average (mean) uncompressed partition size |
Average (SSTable) | Average (mean) number of SSTables a partition belongs to |
Std dev. (Size) | Standard deviation of partition sizes |
Std dev. (SSTable) | Standard deviation of number of SSTables for a partition |
50% (Size) | Estimated 50th percentile of partition sizes |
50% (SSTable) | Estimated 50th percentile of SSTables for a partition |
75% (Size) | Estimated 75th percentile of partition sizes |
75% (SSTable) | Estimated 75th percentile of SSTables for a partition |
90% (Size) | Estimated 90th percentile of partition sizes |
90% (SSTable) | Estimated 90th percentile of SSTables for a partition |
95% (Size) | Estimated 95th percentile of partition sizes |
95% (SSTable) | Estimated 95th percentile of SSTables for a partition |
99% (Size) | Estimated 99th percentile of partition sizes |
99% (SSTable) | Estimated 99th percentile of SSTables for a partition |
99.9% (Size) | Estimated 99.9th percentile of partition sizes |
99.9% (SSTable) | Estimated 99.9th percentile of SSTables for a partition |
Maximum (Size) | Maximum uncompressed partition size |
Maximum (SSTable) | Maximum number of SSTables a partition belongs to |
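To make the Size column of the summary easier to read, here is a small sketch that computes the same kinds of statistics over a made-up list of partition sizes. The sizes are hypothetical, and the sketch computes exact values with a population standard deviation, whereas the tool reports estimated percentiles and may calculate its figures differently.

```python
import statistics

# Hypothetical uncompressed partition sizes in bytes, for illustration only.
sizes = [120, 340, 560, 780, 1_000, 4_000, 250_000]

summary = {
    "Count": len(sizes),
    "Total": sum(sizes),
    "Minimum": min(sizes),
    "Average": statistics.mean(sizes),
    "Std dev.": statistics.pstdev(sizes),
    "50%": statistics.median(sizes),
    "Maximum": max(sizes),
}

for name, value in summary.items():
    print(f"{name:<9} {value:,.0f}")
```

In this sample the average is far above the 50th percentile, which is the typical signature of a few very large partitions skewing an otherwise small data set.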
Largest partitions: The top N largest partitions
Column | Description |
---|---|
Key | The partition key |
Size | Total uncompressed size of the partition |
SSTable Count | Number of SSTables that contain the partition |
SSTable Leaders: The top N partitions that belong to the most SSTables
Column | Description |
---|---|
Key | The partition key |
SSTable Count | Number of SSTables that contain the partition |
Size | Total uncompressed size of the partition |
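The SSTable Leaders list can be thought of as a count of how many SSTables each partition key appears in, sorted from highest to lowest. The sketch below illustrates that idea only; the file names, keys, and the `sstable_leaders` function are hypothetical and do not reflect how ic-pstats is implemented.

```python
from collections import Counter

# Hypothetical mapping of SSTable filename -> partition keys it contains.
sstables = {
    "mc-1-big-Data.db": ["user:1", "user:2", "user:3"],
    "mc-2-big-Data.db": ["user:1", "user:3"],
    "mc-3-big-Data.db": ["user:1"],
}


def sstable_leaders(sstables: dict, top_n: int) -> list:
    """Return the top_n (key, sstable_count) pairs, highest count first."""
    counts = Counter()
    for keys in sstables.values():
        # set() so a key is counted at most once per SSTable.
        counts.update(set(keys))
    return counts.most_common(top_n)


print(sstable_leaders(sstables, 2))
```

A partition that appears in many SSTables generally requires more files to be consulted on read, so these keys are worth a closer look.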
SSTables: Metadata about SSTables as it relates to partitions.
Column | Description |
---|---|
SSTable | Data.db filename of SSTable |
Size | Uncompressed size |
Min Timestamp | Minimum cell timestamp in the SSTable |
Max Timestamp | Maximum cell timestamp in the SSTable |
Level | SSTable level under leveled compaction (LeveledCompactionStrategy) |
Partitions | Number of partition keys in the SSTable |
Avg Partition Size | Average uncompressed partition size in SSTable |
Max Partition Size | Maximum uncompressed partition size in SSTable |
ic-cfstats
Tool for getting detailed cell statistics that can help identify issues with the data model.
Usage
ic-cfstats [-r <limit>] [-n <num>] [-t <snapshot>] [-f <filter>] <keyspace> <column-family>
Option | Description |
---|---|
-h | Display help |
-b | Batch mode. Uses progress indicator that is friendly for running in batch jobs. |
-r <limit> | Limit read throughput to <limit> MB/s (unlimited by default; 16 is probably a good starting point if you want to limit) |
-n <num> | Number of partitions to display in leaders lists |
-t <name> | Snapshot to analyse (snapshot name from nodetool listsnapshots). Snapshot is created if none is specified. |
-f <files> | Comma-separated list of Data.db SSTables to filter on |
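When scripting ic-cfstats on a live node, it can help to assemble the command line in one place so the rate limit is never forgotten. The following sketch builds (but does not run) an argument list from the options documented above. The helper name, the keyspace `my_keyspace`, and the table `my_table` are hypothetical, and it assumes `ic-cfstats` is available on the PATH.

```python
import shlex


def ic_cfstats_args(keyspace, column_family, rate_limit_mb=None,
                    top_n=None, snapshot=None, batch=False):
    """Build an argv list for ic-cfstats from the documented options."""
    args = ["ic-cfstats"]
    if batch:
        args.append("-b")          # batch-friendly progress indicator
    if rate_limit_mb is not None:
        args += ["-r", str(rate_limit_mb)]  # read throughput limit in MB/s
    if top_n is not None:
        args += ["-n", str(top_n)]          # size of the leaders lists
    if snapshot is not None:
        args += ["-t", snapshot]            # existing snapshot to analyse
    return args + [keyspace, column_family]


argv = ic_cfstats_args("my_keyspace", "my_table",
                       rate_limit_mb=16, top_n=20, batch=True)
print(shlex.join(argv))
```

The resulting list could be passed to `subprocess.run` on a node where the tools are installed. The value 16 is the starting point suggested in the option description.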
Output
Summary: Summary statistics about partitions
Column | Description |
---|---|
Count (Size) | Number of partition keys on this node |
Rows (Size) | (3.x only) Number of clustering rows |
(deleted) | (3.x only) Number of clustering row deletions |
Total (Size) | Total uncompressed size of all partitions on this node |
Total (SSTable) | Number of SSTables on this node |
Minimum (Size) | Minimum uncompressed partition size |
Minimum (SSTable) | Minimum number of SSTables a partition belongs to |
Average (Size) | Average (mean) uncompressed partition size |
Average (SSTable) | Average (mean) number of SSTables a partition belongs to |
Std dev. (Size) | Standard deviation of partition sizes |
Std dev. (SSTable) | Standard deviation of number of SSTables for a partition |
50% (Size) | Estimated 50th percentile of partition sizes |
50% (SSTable) | Estimated 50th percentile of SSTables for a partition |
75% (Size) | Estimated 75th percentile of partition sizes |
75% (SSTable) | Estimated 75th percentile of SSTables for a partition |
90% (Size) | Estimated 90th percentile of partition sizes |
90% (SSTable) | Estimated 90th percentile of SSTables for a partition |
95% (Size) | Estimated 95th percentile of partition sizes |
95% (SSTable) | Estimated 95th percentile of SSTables for a partition |
99% (Size) | Estimated 99th percentile of partition sizes |
99% (SSTable) | Estimated 99th percentile of SSTables for a partition |
99.9% (Size) | Estimated 99.9th percentile of partition sizes |
99.9% (SSTable) | Estimated 99.9th percentile of SSTables for a partition |
Maximum (Size) | Maximum uncompressed partition size |
Maximum (SSTable) | Maximum number of SSTables a partition belongs to |
(3.x only) Row Histogram: Histogram of number of rows per partition
Column | Description |
---|---|
Percentile | Minimum, average, standard deviation (std dev.), percentile, maximum |
Count | Estimated number of rows per partition for the given percentile |
Largest partitions: Partitions with largest uncompressed size
Column | Description |
---|---|
Key | The partition key |
Size | Total uncompressed size of the partition |
Rows | (3.x only) Total number of clustering rows in the partition |
(deleted) | (3.x only) Number of row deletions in the partition |
Tombstones | Number of cell or range tombstones |
(droppable) | Number of tombstones that can be dropped as per gc_grace_seconds |
Cells | Number of cells in the partition |
SSTable Count | Number of SSTables that contain the partition |
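The (droppable) column depends on gc_grace_seconds. As a rough illustration, a tombstone can be treated as droppable once it is older than the grace period. The sketch below applies only that age check, with hypothetical deletion times; Cassandra applies further conditions before actually purging a tombstone during compaction, and the exact rule used by ic-cfstats is not stated here.

```python
def is_droppable(local_deletion_time: int, gc_grace_seconds: int, now: int) -> bool:
    """True if the tombstone is older than the gc_grace_seconds window.

    All arguments are in seconds; local_deletion_time and now are Unix times.
    """
    gc_before = now - gc_grace_seconds
    return local_deletion_time < gc_before


# Hypothetical tombstone deletion times (Unix seconds) and a 10-day grace period.
now = 1_700_000_000
gc_grace_seconds = 864_000
deletion_times = [now - 900_000, now - 864_000, now - 60]

droppable = sum(is_droppable(t, gc_grace_seconds, now) for t in deletion_times)
print(f"{droppable} of {len(deletion_times)} tombstones droppable")
```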
Widest partitions: Partitions with the most cells
Column | Description |
---|---|
Key | The partition key |
Rows | (3.x only) Total number of clustering rows in the partition |
(deleted) | (3.x only) Number of row deletions in the partition |
Cells | Number of cells in the partition |
Tombstones | Number of cell or range tombstones |
(droppable) | Number of tombstones that can be dropped as per gc_grace_seconds |
Size | Total uncompressed size of the partition |
SSTable Count | Number of SSTables that contain the partition |
(3.x only) Most Deleted Rows: Partitions with the most row deletions
Column | Description |
---|---|
Key | The partition key |
Rows | Total number of clustering rows in the partition |
(deleted) | Number of row deletions in the partition |
Size | Total uncompressed size of the partition |
SSTable Count | Number of SSTables that contain the partition |
Tombstone Leaders: Partitions with the most tombstones
Column | Description |
---|---|
Key | The partition key |
Tombstones | Number of cell or range tombstones |
(droppable) | Number of tombstones that can be dropped as per gc_grace_seconds |
Rows | (3.x only) Total number of clustering rows in the partition |
Cells | Number of cells in the partition |
Size | Total uncompressed size of the partition |
SSTable Count | Number of SSTables that contain the partition |
SSTable Leaders: Partitions that are in the most SSTables
Column | Description |
---|---|
Key | The partition key |
SSTable Count | Number of SSTables that contain the partition |
Size | Total uncompressed size of the partition |
Rows | (3.x only) Total number of clustering rows in the partition |
Cells | Number of cells in the partition |
Tombstones | Number of cell or range tombstones |
(droppable) | Number of tombstones that can be dropped as per gc_grace_seconds |
SSTables: Metadata about SSTables as it relates to partitions.
Column | Description |
---|---|
SSTable | Data.db filename of SSTable |
Size | Uncompressed size |
Min Timestamp | Minimum cell timestamp in the SSTable |
Max Timestamp | Maximum cell timestamp in the SSTable |
Partitions | Number of partitions |
(deleted) | Number of row level partition deletions |
(avg size) | Average uncompressed partition size in SSTable |
(max size) | Maximum uncompressed partition size in SSTable |
Rows | (3.x only) Total number of clustering rows in SSTable |
(deleted) | (3.x only) Number of row deletions in SSTable |
Cells | Number of cells in the SSTable |
Tombstones | Number of cell or range tombstones in the SSTable |
(droppable) | Number of tombstones that are droppable according to gc_grace_seconds |
(range) | Number of range tombstones |
Cell Liveness | Percentage of live cells. Does not consider tombstones or cell updates shadowing cells. That is, it is the percentage of non-tombstoned cells out of the total number of cells. |
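The Cell Liveness figure described above is a simple ratio. A minimal sketch of the calculation follows, assuming the cell count includes tombstoned cells; that assumption, the function name, and the sample numbers are illustrative only.

```python
def cell_liveness(cells: int, tombstones: int) -> float:
    """Percentage of cells that are not tombstones.

    Assumes `cells` is the total cell count, tombstoned cells included.
    """
    if cells == 0:
        return 100.0  # arbitrary convention for an empty SSTable
    return 100.0 * (cells - tombstones) / cells


print(f"{cell_liveness(cells=2_000, tombstones=500):.1f}%")
```

Because shadowing is not taken into account, the real proportion of readable data may be lower than this percentage suggests.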
ic-purge
Reports statistics about reclaimable data for a column family.
Usage
ic-purge [-r <limit>] [-n <num>] [-t <snapshot>] [-f <filter>] <keyspace> <column-family>
Option | Description |
---|---|
-h | Display help |
-b | Batch mode. Uses progress indicator that is friendly for running in batch jobs. |
-r <limit> | Limit read throughput to <limit> MB/s (unlimited by default; 16 is probably a good starting point if you want to limit) |
-n <num> | Number of partitions to display in leaders lists |
-t <name> | Snapshot to analyse. Snapshot is created if none is specified. |
Output
Largest reclaimable partitions: Partitions with the largest amount of reclaimable data
Column | Description |
---|---|
Key | The partition key |
Size | Total uncompressed size of the partition |
Reclaim | Reclaimable uncompressed size |
Generations | SSTable generations the partition belongs to |
Downloads
- ic-sstable-tools-3_11_3.jar (67 KB)
- ic-sstable-tools-3_0_17.jar (67 KB)
- ic-sstable-tools-2_2_13.jar (61 KB)
- ic-sstable-tools-2_1_20.jar (59 KB)
- ic-sstable-tools-2_0_17.jar (59 KB)
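Since the tools are version-specific, a deployment script might pick the jar from the list above based on the Cassandra version running on the node. The sketch below matches on the major.minor release line, which assumes that a jar built for one patch release is appropriate for others in the same line. The `jar_for` helper is hypothetical.

```python
import re

# Jar names copied from the Downloads list, keyed by Cassandra major.minor.
JARS = {
    (3, 11): "ic-sstable-tools-3_11_3.jar",
    (3, 0): "ic-sstable-tools-3_0_17.jar",
    (2, 2): "ic-sstable-tools-2_2_13.jar",
    (2, 1): "ic-sstable-tools-2_1_20.jar",
    (2, 0): "ic-sstable-tools-2_0_17.jar",
}


def jar_for(cassandra_version: str) -> str:
    """Return the jar built for the same major.minor line as the given version."""
    match = re.match(r"(\d+)\.(\d+)", cassandra_version)
    if match is None:
        raise ValueError(f"unrecognised version: {cassandra_version!r}")
    key = (int(match.group(1)), int(match.group(2)))
    if key not in JARS:
        raise ValueError(f"no pre-built jar for Cassandra {key[0]}.{key[1]}")
    return JARS[key]


print(jar_for("3.11.4"))
```

Versions outside the listed release lines raise an error rather than falling back to the nearest jar, since a mismatched jar may not read the SSTable format correctly.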