I’m using ScyllaDB to store subscription relationships in the following business scenario:
Requirements: The data model needs to support these queries:
app_id = ? AND user_id = ? AND ts >= <lastTs> LIMIT 10
for incremental data queries to sync with the client.app_id = ? AND user_id = ? AND target_user_id = ?
for precise subscription lookups.
Table Structure:
CREATE KEYSPACE IF NOT EXISTS userstore WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};
CREATE TABLE IF NOT EXISTS subscription (
app_id bigint,
user_id text,
target_user_id text,
expiry int,
ts bigint,
PRIMARY KEY((app_id, user_id), target_user_id)
) WITH CLUSTERING ORDER BY (target_user_id ASC);
CREATE MATERIALIZED VIEW IF NOT EXISTS subscription_by_time AS
SELECT app_id, user_id, target_user_id, ts, expiry
FROM subscription
WHERE app_id IS NOT NULL
AND user_id IS NOT NULL
AND target_user_id IS NOT NULL
AND ts IS NOT NULL
PRIMARY KEY((app_id, user_id), ts, target_user_id)
WITH CLUSTERING ORDER BY (ts ASC);
Table Explanation:
- subscription: Supports Query 2 (app_id = ? AND user_id = ? AND target_user_id = ?).
- subscription_by_time: Supports Query 1 (app_id = ? AND user_id = ? AND ts >= LIMIT 10).
Issue:
When a subscription relationship is updated (e.g., updating the ts field with the latest timestamp), this triggers an update on the materialized view, where ts is part of the primary key.
This operation generates tombstones (deleted entries) in the materialized view. Tombstone accumulation leads to slower queries and, in severe cases, may cause cluster instability.
Is there a data structure that satisfies the functional requirements while avoiding tombstone issues?