Apache Cassandra 4.1 Features: Guardrails Framework
Andrés de la Peña, Apache Cassandra committer and software engineer at DataStax, will walk you through a new feature in Apache Cassandra 4.1: the Guardrails Framework. The new framework allows disabling certain features, disallowing some specific values, and defining soft and hard limits to certain database magnitudes.
Last year saw the first major release of Apache Cassandra in six years. The GA release of Cassandra version 4.0 established a solid foundation, as it underwent intensive testing to ensure that upgrades and deployments of the distributed database would be easy and smooth.
On this solid basis, the project has been able to commit to a one-year cadence for releases. We expect to announce 4.1 very soon and it includes new features as well as bug fixes and improvements.
A minor release of Apache Cassandra prioritizes new features and changes that neither alter APIs nor break default behavior. Only when a release involves disruptive changes, such as API and protocol changes, does it become a major release.
So while this may be a minor release in name, it includes many features that the community has been working on since the lengthy code freeze for 4.0.
One feature is a new framework called Guardrails to enforce good practices.
As any Cassandra operator will acknowledge, it can be easy for certain user actions to degrade the performance and even the availability of an Apache Cassandra cluster. For example, on the schema side, users can create too many tables, or too many secondary indexes, leading to excessive use of resources. On the query side, users can run queries touching too many partitions that might involve all nodes in the cluster. Even worse, they can simply run a query using costly replica-side filtering, which would potentially read all the table contents. All these are well-known Cassandra anti-patterns, and often administrators have to be vigilant about preventing users from incurring them. Even if one is perfectly aware of the right usage patterns, it’s easy to lose track of things like the size of non-frozen collections.
The new framework allows disabling certain features, disallowing some specific values, and defining soft and hard limits to certain database magnitudes.
Guardrails are defined as regular properties in the Cassandra configuration file, cassandra.yaml, and are no different from any other property in that file. They look like:
tables_warn_threshold: -1
tables_fail_threshold: -1
secondary_indexes_per_table_warn_threshold: -1
secondary_indexes_per_table_fail_threshold: -1
allow_filtering_enabled: true
partition_keys_in_select_warn_threshold: -1
partition_keys_in_select_fail_threshold: -1
collection_size_warn_threshold:
collection_size_fail_threshold:
Note that this is not an exhaustive list of all the available guardrails. There are many more, and new ones are under development, but this does give you an idea of the potential options. Note also that all guardrails are disabled by default. A list of enabled guardrails would look like this:
tables_warn_threshold: 5
tables_fail_threshold: 10
secondary_indexes_per_table_warn_threshold: 5
secondary_indexes_per_table_fail_threshold: 10
allow_filtering_enabled: false
partition_keys_in_select_warn_threshold: 10
partition_keys_in_select_fail_threshold: 20
collection_size_warn_threshold: 10MiB
collection_size_fail_threshold: 20MiB
The guardrails defined in cassandra.yaml are applied as the node starts. It’s also possible to dynamically update the guardrails configuration through JMX at any time. All guardrails are grouped under the MBean named org.apache.cassandra.db.Guardrails. There are plans to also support dynamically updating guardrails through virtual tables, although this option is not yet available.
Guardrails in Action
Most guardrails are checked on the coordinator node during the execution of a CQL query. When the soft limit of a guardrail is triggered, it raises a CQL client warning and logs a server-side warning message. For example, if we have established a soft limit of five tables (tables_warn_threshold: 5) and we try to create a sixth table, we’ll see a warning but the table will still be created:
cqlsh> CREATE TABLE k.t6 (k int PRIMARY KEY, v int);
Warnings :
Guardrail tables violated: Creating table t6, current number of tables 6 exceeds warning threshold of 5.
However, if the hard limit is exceeded, the user operation will be aborted with a GuardrailViolatedException, preventing the potentially harmful operation from happening. Continuing with the previous example, if we have a hard limit of ten tables (tables_fail_threshold: 10) and we try to create an eleventh table, we will see an error and the eleventh table won’t be created:
cqlsh> CREATE TABLE k.t11 (k int PRIMARY KEY, v int);
InvalidRequest: Error from server: code=2200 [Invalid query] message="Guardrail tables violated: Cannot have more than 10 tables, aborting the creation of table t11"
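The soft and hard limit behavior described above can be modeled in a few lines. The following is a minimal Python sketch, not Cassandra's implementation: the class names ThresholdGuardrail and GuardrailViolation are hypothetical, and it assumes that -1 disables a threshold (as in the default configuration) and that a limit is violated only when the value is strictly greater than it, which matches the sixth-table and eleventh-table examples.

```python
from dataclasses import dataclass
from typing import Optional

DISABLED = -1  # assumption: -1 means "no limit", as in the default configuration


class GuardrailViolation(Exception):
    """Hypothetical stand-in for the server-side error that aborts an operation."""


@dataclass
class ThresholdGuardrail:
    """Hypothetical model of a guardrail with a soft (warn) and hard (fail) limit."""
    name: str
    warn_threshold: int = DISABLED
    fail_threshold: int = DISABLED

    def check(self, value: int) -> Optional[str]:
        """Raise above the hard limit, return a warning above the soft limit, else None."""
        if self.fail_threshold != DISABLED and value > self.fail_threshold:
            raise GuardrailViolation(
                f"Guardrail {self.name} violated: {value} exceeds "
                f"failure threshold of {self.fail_threshold}"
            )
        if self.warn_threshold != DISABLED and value > self.warn_threshold:
            return (
                f"Guardrail {self.name} violated: {value} exceeds "
                f"warning threshold of {self.warn_threshold}"
            )
        return None


tables = ThresholdGuardrail("tables", warn_threshold=5, fail_threshold=10)
print(tables.check(5))  # within limits, so no warning
print(tables.check(6))  # soft limit exceeded: a warning, the operation continues
try:
    tables.check(11)    # hard limit exceeded: the operation is aborted
except GuardrailViolation as error:
    print(error)
```

Checking the hard limit first means an operation that exceeds both limits is aborted rather than merely warned about.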
Boolean guardrails for disabling features, such as allow_filtering_enabled, don’t have a soft limit, and they will always abort operations that attempt to use the disabled feature. For example, if the guardrail for queries using filtering is set to disallow them (allow_filtering_enabled: false), we will see a failure every time we try to run one of those queries, and the query won’t be run:
cqlsh> SELECT * FROM k.t1 WHERE v=0 ALLOW FILTERING;
InvalidRequest: Error from server: code=2200 [Invalid query] message="Guardrail allow_filtering violated: Querying with ALLOW FILTERING is not allowed"
The triggering of a guardrail will always emit a diagnostic event of type GuardrailEvent. Thus, anyone subscribing to diagnostic events of that type will be able to monitor guardrail violations and trigger any desired actions on them, like storing metrics, contacting the offending user, etc.
Users and Extensibility
Guardrails are only applied to the operations of regular users, so they are checked neither for superuser queries nor for internal queries. By default, all regular users are subject to the same guardrails configuration values defined in cassandra.yaml. However, the configuration for guardrails is an extensible API defined by the interfaces GuardrailsConfig and GuardrailsConfigProvider. These interfaces provide the configuration of every guardrail as a function of the user running the guarded operation, so third-party alternative implementations could provide different guardrail configurations depending on the user, or on some other factors. Custom implementations can be provided through the JVM system property cassandra.custom_guardrails_config_provider_class. It is important to know that this API isn’t officially supported yet, and there can be changes breaking backward compatibility in any minor release.
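To illustrate the idea of a configuration that varies by user, here is a small Python model. It is only a sketch of the concept and does not mirror the actual Java interfaces: GuardrailSettings, PerUserConfigProvider and settings_for are hypothetical names, and returning None for superusers is just one way to represent "no guardrails apply".

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class GuardrailSettings:
    """Hypothetical bundle of a few guardrail values (not Cassandra's interface)."""
    tables_warn_threshold: int = -1
    tables_fail_threshold: int = -1
    allow_filtering_enabled: bool = True


class PerUserConfigProvider:
    """Hypothetical provider that picks guardrail settings based on the user."""

    def __init__(self, default: GuardrailSettings,
                 overrides: Optional[Dict[str, GuardrailSettings]] = None):
        self._default = default
        self._overrides = dict(overrides or {})

    def settings_for(self, user: str,
                     is_superuser: bool = False) -> Optional[GuardrailSettings]:
        # Superusers are exempt from guardrails, so there is nothing to enforce.
        if is_superuser:
            return None
        # Regular users get their override if one exists, else the shared default.
        return self._overrides.get(user, self._default)


provider = PerUserConfigProvider(
    default=GuardrailSettings(tables_warn_threshold=5,
                              tables_fail_threshold=10,
                              allow_filtering_enabled=False),
    overrides={"analytics": GuardrailSettings(allow_filtering_enabled=True)},
)
print(provider.settings_for("alice"))
print(provider.settings_for("analytics"))
print(provider.settings_for("admin", is_superuser=True))
```

In this model the user "analytics" may still run filtering queries while everyone else shares the stricter default, which is the kind of per-user policy a custom provider could express.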
In general, guardrails are associated with a specific CQL query. However, due to technical limitations, some guardrails are checked in the background, without being associated with any specific query. That’s the case for example of the guardrails for the size or number of items of a non-frozen collection. Although we can do some checks when a query writes a new collection fragment, we cannot know if there are other fragments of the collection previously stored on the SSTables. We could of course check into the SSTables, but it would involve a costly read-before-write operation. Instead, the guardrail checks the size of all collections every time an SSTable is written to disk, which happens on memtable flush or during compaction. If a large collection is detected the guardrail is triggered and will emit the proper log messages and diagnostic events, but we won’t abort any operation because we have missed the association with the original query. Future guardrails for similar things, like partition size, are likely to work in the same way.
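The difference from query-time guardrails is that a background check can only report, never abort. A rough Python sketch of that behavior follows; the function check_collections_on_flush and its inputs are hypothetical, collection sizes are assumed to be already known in bytes, and None stands for a disabled threshold.

```python
from typing import Dict, List, Optional


def check_collections_on_flush(collection_sizes: Dict[str, int],
                               warn_bytes: Optional[int],
                               fail_bytes: Optional[int]) -> List[str]:
    """Return log messages for oversized collections seen while writing an SSTable.

    Never raises: the writes that grew the collection have already completed,
    so there is no query left to abort. None means the threshold is disabled.
    """
    messages = []
    for name, size in sorted(collection_sizes.items()):
        if fail_bytes is not None and size > fail_bytes:
            messages.append(
                f"ERROR: collection {name} has {size} bytes, "
                f"above failure threshold {fail_bytes}"
            )
        elif warn_bytes is not None and size > warn_bytes:
            messages.append(
                f"WARN: collection {name} has {size} bytes, "
                f"above warning threshold {warn_bytes}"
            )
    return messages


MiB = 1024 * 1024
sizes = {"k.t.tags": 2 * MiB, "k.t.events": 15 * MiB, "k.t.history": 25 * MiB}
for line in check_collections_on_flush(sizes, warn_bytes=10 * MiB, fail_bytes=20 * MiB):
    print(line)
```

Even the collection above the failure threshold only produces a message here, mirroring the fact that the guardrail cannot be tied back to the query that caused it.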
Another example of a guardrail that partially runs in the background is the one for disk usage. Its configuration looks like this:
data_disk_usage_percentage_warn_threshold: -1
data_disk_usage_percentage_fail_threshold: -1
data_disk_usage_max_disk_size:
This guardrail is evaluated by a background task that periodically checks the disk space usage. If the disk usage exceeds the percentage specified in the configuration, the guardrail will emit the proper log messages and diagnostic events, although these won’t be associated with any specific query. However, the disk usage status calculated by the periodic task is tracked and propagated through Gossip, so every node is aware of the disk usage of its peers. Write queries use that information to check the guardrail again, and they are warned about or aborted depending on whether the disks on the targeted replicas are close to being full:
cqlsh> INSERT INTO k.t (k, v) VALUES (1, 10);
InvalidRequest: Error from server: code=2200 [Invalid query] message="Guardrail replica_disk_usage violated: Write request failed because disk usage exceeds failure threshold"
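The two halves of this guardrail, a local periodic measurement and a query-time check against the states learned from peers, can be sketched as follows. This is an illustrative Python model under stated assumptions, not Cassandra's code: DiskState, WriteRejected, disk_state and check_write are hypothetical names, the gossiped information is reduced to a plain dictionary, and a replica with no known state is treated as healthy.

```python
from enum import Enum
from typing import Dict, Iterable, Optional


class DiskState(Enum):
    """Hypothetical disk usage states shared between nodes."""
    OK = "ok"
    NEAR_FULL = "near_full"  # above the warn percentage
    FULL = "full"            # above the fail percentage


class WriteRejected(Exception):
    """Hypothetical stand-in for the error returned to the client."""


def disk_state(used_percent: float, warn_pct: int, fail_pct: int) -> DiskState:
    """What the periodic background task computes locally (-1 disables a limit)."""
    if fail_pct != -1 and used_percent > fail_pct:
        return DiskState.FULL
    if warn_pct != -1 and used_percent > warn_pct:
        return DiskState.NEAR_FULL
    return DiskState.OK


def check_write(replicas: Iterable[str],
                gossiped: Dict[str, DiskState]) -> Optional[str]:
    """Run by the coordinator for a write, using the states learned from peers."""
    states = [gossiped.get(replica, DiskState.OK) for replica in replicas]
    if DiskState.FULL in states:
        raise WriteRejected(
            "Write request failed because disk usage exceeds failure threshold"
        )
    if DiskState.NEAR_FULL in states:
        return "Replica disk usage exceeds warning threshold"
    return None


# Each node would publish its own state; here we fill the table by hand.
gossiped = {
    "node1": disk_state(45.0, warn_pct=70, fail_pct=90),
    "node2": disk_state(75.0, warn_pct=70, fail_pct=90),
    "node3": disk_state(95.0, warn_pct=70, fail_pct=90),
}
print(check_write(["node1", "node2"], gossiped))  # warning only
try:
    check_write(["node2", "node3"], gossiped)      # node3 is full: rejected
except WriteRejected as error:
    print(error)
```

Only the replicas targeted by the write are consulted, so a full disk on an unrelated node does not affect the query.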
Expect more Guardrails
Adding new guardrails to Cassandra should be relatively easy since the framework provides base classes for several types of guardrail and utilities for parsing configuration and testing. More importantly, adding new safety checks in the form of guardrails should guarantee that they have a homogeneous, consistent behavior, and that they can benefit from new features that are added for every guardrail.
At this moment, there are guardrails for:
- Number of user keyspaces.
- Number of user tables.
- Number of columns per table.
- Number of secondary indexes per table.
- Number of materialized views per table.
- Number of fields per user-defined type.
- Number of items in a collection.
- Number of partition keys selected by an IN restriction.
- Number of partition keys selected by the cartesian product of multiple IN restrictions.
- Allowed table properties.
- Allowed read consistency levels.
- Allowed write consistency levels.
- Collections size.
- Query page size.
- Minimum replication factor.
- Data disk usage, defined either as a percentage or as an absolute size.
- Whether user-defined timestamps are allowed.
- Whether GROUP BY queries are allowed.
- Whether the creation of secondary indexes is allowed.
- Whether the creation of uncompressed tables is allowed.
- Whether querying with ALLOW FILTERING is allowed.
- Whether dropping or truncating a table is allowed.
It is worth mentioning that many of these guardrails were added in the last few months, some of them by newcomers to the project. That, in my opinion, indicates how easy it is to add new guardrails, and we expect to have more guardrails in the future.