Charybdis: Building High-Performance Distributed Rust Backends with ScyllaDB
Build a high-performance distributed Rust backend without losing the expressiveness and ease of use of Ruby on Rails and SQL.

Editor's note: This post was originally published on Goran's blog.

Ruby on Rails (RoR) is one of the most renowned web frameworks. When combined with SQL databases, RoR transforms into a powerhouse for developing back-end (or even full-stack) applications. It resolves numerous issues out of the box, sometimes without developers even realizing it. For example, with the right callbacks, complex business logic for a single API action is automatically wrapped within a transaction, ensuring ACID (Atomicity, Consistency, Isolation, Durability) compliance. This removes many potential concerns from the developer's plate. Typically, developers only need to define a functional data model and adhere to the framework's conventions. Sounds easy, right?
However, as with all good things, there are trade-offs. In this case, it’s performance. While the RoR and RDBMS combination is exceptional for many applications, it struggles to provide a suitable solution for large-scale systems. Additionally, using frameworks like RoR alongside standard relational databases introduces another pitfall: it becomes easy to develop poor data models. Why? Simply because SQL databases are highly flexible, allowing developers to make almost any data model work. We just utilize more indexing, joins, and preloading to avoid the dreaded N+1 query problem. We’ve all fallen into this trap at some point.
What if we could build a high-performance, distributed, Rust-based backend while retaining some of the expressiveness and ease-of-use found in RoR and SQL?
This is where ScyllaDB and Charybdis ORM come into play.
Before diving into these technologies, it’s essential to understand the fundamental differences between traditional Relational Database Management Systems (RDBMS) and ScyllaDB NoSQL.
LSM vs. B+ Tree
ScyllaDB, like Cassandra, employs a Log-Structured Merge Tree (LSM) storage engine, which optimizes write operations by appending data to in-memory structures called memtables and periodically flushing them to disk as SSTables. This approach allows for high write throughput and efficient handling of large volumes of data. By using a partition key and a hash function, ScyllaDB can quickly locate the relevant SSTables and memtables, avoiding global index scans and focusing operations on specific data segments.
However, while LSM trees excel at write-heavy workloads, they can introduce read amplification since data might be spread across multiple SSTables. To mitigate this, ScyllaDB/Cassandra uses Bloom filters and optimized indexing strategies. Read performance may occasionally be less predictable compared to B+ trees, especially for certain read patterns.
Traditional SQL Databases: B+ Tree Indexing
In contrast, traditional SQL databases like PostgreSQL and MySQL (InnoDB) use B+ Tree indexing, which provides O(log N) read operations by traversing the tree from root to leaf nodes to locate specific rows. This structure is highly effective for read-heavy applications and supports complex queries, including range scans and multi-table joins.
While B+ trees offer excellent read performance, write operations are slower compared to LSM trees due to the need to maintain tree balance, which may involve node splits and more random I/O operations. Additionally, SQL databases benefit from sophisticated caching mechanisms that keep frequently accessed index pages in memory, further enhancing read efficiency.
Horizontal Scalability
ScyllaDB/Cassandra: Designed for Seamless Horizontal Scaling
ScyllaDB/Cassandra are inherently built for horizontal scalability through their shared-nothing architecture. Each node operates independently, and data is automatically distributed across the cluster using consistent hashing. This design ensures that adding more nodes proportionally increases both storage capacity and compute resources, allowing the system to handle growing workloads efficiently. The automatic data distribution and replication mechanisms provide high availability and fault tolerance, ensuring that the system remains resilient even if individual nodes fail.
Furthermore, ScyllaDB/Cassandra offer tunable consistency levels, allowing developers to balance between consistency and availability based on application requirements. This flexibility is particularly advantageous for distributed applications that need to maintain performance and reliability at scale.
Traditional SQL Databases: Challenges with Horizontal Scaling
Traditional SQL databases, on the other hand, were primarily designed for vertical scalability, relying on enhancing a single server’s resources to manage increased load. While replication (primary-replica or multi-primary) and sharding techniques enable horizontal scaling, these approaches often introduce significant operational complexity. Managing data distribution, ensuring consistency across replicas, and handling failovers require careful planning and additional tooling.
Moreover, maintaining ACID properties across a distributed SQL setup can be resource-intensive, potentially limiting scalability compared to NoSQL solutions like ScyllaDB/Cassandra.
Data Modeling
To harness ScyllaDB’s full potential, there is one fundamental rule: data modeling should revolve around queries. This means designing your data structures based on how you plan to access and query them. At first glance, this might seem obvious, prompting the question: Aren’t we already doing this with traditional RDBMSs? Not entirely. The flexibility of SQL databases allows developers to make nearly any data model work by leveraging joins, indexes, and preloading techniques. This often masks underlying inefficiencies, making it easy to overlook suboptimal data designs.
In contrast, ScyllaDB requires a more deliberate approach. You must carefully select partition and clustering keys to ensure that queries are scoped to single partitions and data is ordered optimally. This eliminates the need for extensive indexing and complex joins, allowing ScyllaDB’s Log-Structured Merge (LSM) engine to deliver high performance. While this approach demands more upfront effort, it leads to more efficient and scalable data models. To be fair, it also means that, as a rule, you usually have to provide more information to locate the desired data. Although this can initially appear challenging, the more you work with it, the more you naturally develop the intuition needed to create optimal models.
Charybdis
Now that we have grasped the fundamentals of data modeling in ScyllaDB, we can turn our attention to Charybdis. Charybdis is a Rust ORM built on top of the ScyllaDB Rust Driver, focusing on ease of use and performance. Out of the box, it generates nearly all available queries for a model and provides helpers for custom queries. It also supports automatic migrations, allowing you to run commands to migrate the database structure based on differences between model definitions in your code and the database. Additionally, Charybdis supports partial models, enabling developers to work seamlessly with subsets of model fields while implementing all traits and functionalities that are present in the main model.
Sample User Model
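A sketch of such a model, using the `charybdis_model` attribute macro; the non-key fields (`username`, `email`, timestamps) are illustrative assumptions:

```rust
use charybdis::macros::charybdis_model;
use charybdis::types::{Text, Timestamp, Uuid};
use serde::{Deserialize, Serialize};

#[charybdis_model(
    table_name = users,
    partition_keys = [id],
    clustering_keys = [],
)]
#[derive(Serialize, Deserialize, Default)]
pub struct User {
    pub id: Uuid,
    pub username: Text,
    pub email: Text,
    pub created_at: Timestamp,
    pub updated_at: Timestamp,
}
```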
Note: We will always query a user by id, so we simply added id as the partition key, leaving the clustering key empty.
Installing and Running Migrations
First, install the migration tool:
```shell
cargo install charybdis-migrate
```
Within your src/ directory, run the migration:
```shell
migrate --hosts <host> --keyspace <your_keyspace> --drop-and-replace
```

(The `--drop-and-replace` flag is optional.)
This command will create the users table with fields defined in your model. Note that for migrations to work, you need to use types or aliases defined within charybdis::types::*.
Basic Queries for the User Model
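A sketch of the generated query API, assuming an established ScyllaDB `CachingSession` named `session`; Charybdis derives finder helpers from the primary-key definition, so `find_by_id` is available here because `id` is the full primary key:

```rust
use charybdis::operations::{Delete, Find, Insert, Update};
use charybdis::types::Uuid;

async fn user_crud(session: &scylla::CachingSession) -> Result<(), charybdis::errors::CharybdisError> {
    // Insert a new user
    let mut user = User {
        id: Uuid::new_v4(),
        username: "john".into(),
        ..Default::default()
    };
    user.insert().execute(session).await?;

    // Find by primary key (finder generated by the charybdis_model macro)
    let found = User::find_by_id(user.id).execute(session).await?;

    // Update and delete
    user.username = "john_doe".into();
    user.update().execute(session).await?;
    user.delete().execute(session).await?;

    Ok(())
}
```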
Sample Models for a Reddit-Like Application
In a Reddit-like application, we have communities that have posts, and posts have comments. Note that the following sample is available within the Charybdis examples repository.
Community Model
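Communities are always fetched by their `id`, so the partition key mirrors the user model; the descriptive fields are illustrative:

```rust
use charybdis::macros::charybdis_model;
use charybdis::types::{Text, Timestamp, Uuid};
use serde::{Deserialize, Serialize};

#[charybdis_model(
    table_name = communities,
    partition_keys = [id],
    clustering_keys = [],
)]
#[derive(Serialize, Deserialize, Default)]
pub struct Community {
    pub id: Uuid,
    pub title: Text,
    pub description: Text,
    pub created_at: Timestamp,
}
```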
Post Model
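Posts are partitioned by `community_id` and clustered by `created_at` and `id`, so all posts of a community live in a single partition, ordered by creation time; a sketch (non-key fields are assumptions):

```rust
use charybdis::macros::charybdis_model;
use charybdis::types::{Text, Timestamp, Uuid};
use serde::{Deserialize, Serialize};

#[charybdis_model(
    table_name = posts,
    partition_keys = [community_id],
    clustering_keys = [created_at, id],
)]
#[derive(Serialize, Deserialize, Default)]
pub struct Post {
    pub community_id: Uuid,
    pub created_at: Timestamp,
    pub id: Uuid,
    pub title: Text,
    pub description: Text,
    pub updated_at: Timestamp,
}
```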
Actix-Web Services for Posts
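A sketch of a create-post service with a `Callbacks` implementation; `AppError` is a hypothetical application error type convertible from `CharybdisError`, and the route path is illustrative:

```rust
use actix_web::{post, web, HttpResponse};
use charybdis::callbacks::Callbacks;
use charybdis::operations::InsertWithCallbacks;
use charybdis::types::Uuid;
use chrono::Utc;
use scylla::CachingSession;

impl Callbacks for Post {
    type Extension = ();
    type Error = AppError; // hypothetical app-level error type

    // Assign server-side values before the row is written
    async fn before_insert(
        &mut self,
        _session: &CachingSession,
        _ext: &Self::Extension,
    ) -> Result<(), Self::Error> {
        self.id = Uuid::new_v4();
        self.created_at = Utc::now();
        Ok(())
    }
}

#[post("/posts")]
async fn create_post(
    db: web::Data<CachingSession>,
    payload: web::Json<Post>,
) -> Result<HttpResponse, AppError> {
    let mut post = payload.into_inner();
    // insert_cb runs before_insert, then executes the INSERT
    post.insert_cb(&()).execute(&db).await?;
    Ok(HttpResponse::Created().json(post))
}
```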
Note: The insert_cb method triggers the before_insert callback within our trait, assigning a new id and created_at to the post.
Retrieving All Posts for a Community
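Because posts are partitioned by `community_id`, fetching a community's posts is a single-partition read; a sketch using the partition-key finder generated by the macro (`AppError` again hypothetical):

```rust
use actix_web::{get, web, HttpResponse};
use charybdis::types::Uuid;
use futures::TryStreamExt;
use scylla::CachingSession;

#[get("/communities/{community_id}/posts")]
async fn community_posts(
    db: web::Data<CachingSession>,
    community_id: web::Path<Uuid>,
) -> Result<HttpResponse, AppError> {
    // Single-partition query, results ordered by the clustering key (created_at)
    let posts: Vec<Post> = Post::find_by_community_id(community_id.into_inner())
        .execute(&db)
        .await?
        .try_collect()
        .await?;

    Ok(HttpResponse::Ok().json(posts))
}
```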
Updating a Post’s Description
To avoid potential inconsistency issues, such as concurrent requests to update a post’s description and other fields, we use the automatically generated partial_<model>! macro.
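A sketch of the macro invocation, assuming the `Post` model defined above:

```rust
// First argument: name of the generated partial struct.
// Remaining arguments: the full primary key (community_id, created_at, id)
// plus the one field we actually want to update.
partial_post!(UpdateDescriptionPost, community_id, created_at, id, description);
```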
The partial_post! is automatically generated by the charybdis_model macro. The first argument is the new struct name of the partial model, and the others are a subset of the main model fields that we want to work with. In this case, UpdateDescriptionPost behaves just like the standard Post model but operates on a subset of model fields. For each partial model, we must provide the complete primary key, and the main model must implement the Default trait. Additionally, all traits on the main model that are defined below the charybdis_model will automatically be included for all partial models.
Now we can have an Actix service dedicated to updating a post’s description:
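A sketch of that service; the route path and `AppError` type are illustrative assumptions:

```rust
use actix_web::{put, web, HttpResponse};
use charybdis::operations::Update;
use scylla::CachingSession;

#[put("/posts/description")]
async fn update_post_description(
    db: web::Data<CachingSession>,
    payload: web::Json<UpdateDescriptionPost>,
) -> Result<HttpResponse, AppError> {
    // Writes only the description column; concurrent updates to other
    // Post fields (e.g. title) cannot be clobbered by this request.
    payload.into_inner().update().execute(&db).await?;
    Ok(HttpResponse::NoContent().finish())
}
```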
Note: To update a post, you must provide all components of the primary key (community_id, created_at, id).
Final Notes
A full working sample is available within the Charybdis examples repository.
Note: We defined our models somewhat differently than in typical SQL scenarios by using three columns to define the primary key. This is because, in designing models, we also determine where and how data will be stored for querying and data transformation.
ACID Compliance Considerations
While ScyllaDB offers exceptional performance and seamless horizontal scalability for many applications, it is not suitable for scenarios where ACID (Atomicity, Consistency, Isolation, Durability) properties are required.
Sample Use Cases Requiring ACID Integrity
Bank Transactions: Ensuring that fund transfers are processed atomically to prevent discrepancies and maintain financial accuracy.
Seat Reservations: Guaranteeing that seat allocations in airline bookings or event ticketing systems are handled without double-booking.
Inventory Management: Maintaining accurate stock levels in e-commerce platforms to avoid overselling items.
For some critical applications, the lack of inherent ACID guarantees in ScyllaDB means that developers must implement additional safeguards to ensure data integrity. In cases where absolute transactional reliability is non-negotiable, integrating ScyllaDB with a traditional RDBMS that provides full ACID compliance might be necessary.
In upcoming articles, we will explore additional scenarios: how to leverage eventual consistency effectively for the majority of your web application, and strategies for maintaining strong consistency in ScyllaDB when your data models require it.