Illustration Image

How to Optimize Tombstone Issues Caused by Updating Primary Keys in ScyllaDB Materialized Views?

I’m using ScyllaDB to store subscription relationships in the following business scenario:

Requirements: The data model needs to support these queries:

  1. app_id = ? AND user_id = ? AND ts >= <lastTs> LIMIT 10 for incremental data queries to sync with the client.

  2. app_id = ? AND user_id = ? AND target_user_id = ? for precise subscription lookups.

Table Structure:

CREATE KEYSPACE IF NOT EXISTS userstore WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};

CREATE TABLE IF NOT EXISTS subscription (
    app_id bigint,
    user_id text,
    target_user_id text,
    expiry int,
    ts bigint,
    PRIMARY KEY((app_id, user_id), target_user_id)
) WITH CLUSTERING ORDER BY (target_user_id ASC);

CREATE MATERIALIZED VIEW IF NOT EXISTS subscription_by_time AS
 SELECT app_id, user_id, target_user_id, ts, expiry
 FROM subscription
 WHERE app_id IS NOT NULL
   AND user_id IS NOT NULL
   AND target_user_id IS NOT NULL
   AND ts IS NOT NULL
 PRIMARY KEY((app_id, user_id), ts, target_user_id)
 WITH CLUSTERING ORDER BY (ts ASC);

Table Explanation:

  • subscription: Supports Query 2 (app_id = ? AND user_id = ? AND target_user_id = ?).
  • subscription_by_time: Supports Query 1 (app_id = ? AND user_id = ? AND ts >= LIMIT 10).

Issue:

  1. When a subscription relationship is updated (e.g., updating the ts field with the latest timestamp), this triggers an update on the materialized view, where ts is part of the primary key.

  2. This operation generates tombstones (deleted entries) in the materialized view. Tombstone accumulation leads to slower queries and, in severe cases, may cause cluster instability.

Is there a data structure that satisfies the functional requirements while avoiding tombstone issues?

Become part of our
growing community!
Welcome to Planet Cassandra, a community for Apache Cassandra®! We're a passionate and dedicated group of users, developers, and enthusiasts who are working together to make Cassandra the best it can be. Whether you're just getting started with Cassandra or you're an experienced user, there's a place for you in our community.
A dinosaur
Planet Cassandra is a service for the Apache Cassandra® user community to share with each other. From tutorials and guides, to discussions and updates, we're here to help you get the most out of Cassandra. Connect with us and become part of our growing community today.
© 2009-2023 The Apache Software Foundation under the terms of the Apache License 2.0. Apache, the Apache feather logo, Apache Cassandra, Cassandra, and the Cassandra logo, are either registered trademarks or trademarks of The Apache Software Foundation. Sponsored by Anant Corporation and Datastax, and Developed by Anant Corporation.

Get Involved with Planet Cassandra!

We believe that the power of the Planet Cassandra community lies in the contributions of its members. Do you have content, articles, videos, or use cases you want to share with the world?