
Can Cassandra Spark Bulk Reader miss data while exporting? Is it reliable to use for non-analytics purposes [closed]

Cassandra introduced SBR (Spark Bulk Reader) with CEP-28. I have a couple of questions to understand whether it is a good candidate for my use case.
My use case: I have a service (let's say S1) that uses Cassandra for persistence. I want to export all of its data and join it with the data of another service, S2. The goal is to find which records in S2 are no longer referenced in S1 and delete those records from S2. This means my export of S1 (Cassandra) must contain all the data present in its Cassandra persistence. I can tolerate missing changes made in S1 in the last few days (let's say 10 days), because I can always take the backup of S2 10 days earlier than that of S1. However, I cannot tolerate any other kind of gap in the S1 export, e.g. a missing record that was created 6 months ago, or corrupted data. Any such miss in S1 would cause S2 records that are still referenced to be deleted, i.e. data loss.
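To make the use case concrete, here is a minimal sketch of the orphan detection I have in mind, written in plain Python rather than Spark. The function name `find_orphans`, the record shapes, and the `created_at` field are all hypothetical and only for illustration; the real job would run over the exported data.

```python
from datetime import datetime, timedelta


def find_orphans(s1_referenced_ids, s2_records, s1_export_time, grace_days=10):
    """Return the ids of S2 records that no exported S1 row references.

    s1_referenced_ids: iterable of S2 ids found in the S1 export (hypothetical shape).
    s2_records: iterable of (s2_id, created_at) tuples (hypothetical shape).
    S2 records newer than the grace window are never treated as orphans.
    """
    cutoff = s1_export_time - timedelta(days=grace_days)
    referenced = set(s1_referenced_ids)
    return sorted(
        s2_id
        for s2_id, created_at in s2_records
        if created_at <= cutoff and s2_id not in referenced
    )


export_time = datetime(2024, 1, 31)
s2 = [
    ("a", datetime(2023, 6, 1)),   # old, still referenced by S1
    ("c", datetime(2023, 6, 1)),   # old, no longer referenced
    ("d", datetime(2024, 1, 28)),  # inside the 10-day grace window
]

print(find_orphans(["a", "b"], s2, export_time))  # complete S1 export
print(find_orphans(["b"], s2, export_time))       # S1 export that silently lost "a"
```

With a complete export only `c` is selected for deletion. If the export silently drops the old S1 row that references `a`, then `a` is selected as well, which is exactly the data loss I need to rule out.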
My questions:

  1. Is SBR a good option for exporting data from service S1? CEP-28 names "analytics workloads" as its motivation, which makes me wonder whether it is a good fit for my use case.
  2. I could not find out whether SBR is still in beta. Can someone confirm its current status and point me to an official announcement?
  3. CEP-28 says the approach relies on a sidecar and a library. Does that mean a change made to the main Cassandra process may leave (or may already have left) the sidecar/library incompatible with some Cassandra version, producing an incorrect export for me and therefore a possibility of data loss?