Books by Monster SCALE Summit 25 Speakers: Distributed Data Systems & Beyond
Monster SCALE Summit speakers have amassed a rather impressive list of publications, including quite a few books. This blog highlights 10+ of them.
If you’ve seen the Monster SCALE Summit agenda, you know that the stars have aligned nicely. In just two half-days, from anywhere you like, you can learn from 60+ outstanding speakers – all exploring extreme-scale engineering challenges from a variety of angles. Distributed databases, event streaming, AI/ML, Kubernetes, Rust…it’s all on the agenda.
If you read the bios of our speakers, you’ll note that many have written books. This blog highlights eleven of those Monster SCALE Summit speakers’ books – plus two new books by past conference speakers.
Once you register for the conference (it’s free + virtual), you’ll gain full 30-day access to the complete O’Reilly library (thanks to O’Reilly, a conference media sponsor). Manning Publications is also a media sponsor, and they are offering the Monster SCALE community a nice 50% discount on all Manning books. One more bonus: conference attendees who participate in the speaker chat will be eligible to win book bundles, courtesy of Manning.
See the agenda and register – it’s free
Designing Data-Intensive Applications, 2nd Edition
By Martin Kleppmann and Chris Riccomini
O’Reilly
ETA: December 2025
Data is at the center of many challenges in system design today. Difficult issues such as scalability, consistency, reliability, efficiency, and maintainability need to be resolved. In addition, there’s an overwhelming variety of tools and analytical systems, including relational databases, NoSQL datastores, plus data warehouses and data lakes. What are the right choices for your application? How do you make sense of all these buzzwords?
In this second edition, authors Martin Kleppmann and Chris Riccomini build on the foundation laid in the acclaimed first edition, integrating new technologies and emerging trends. You’ll be guided through the maze of decisions and trade-offs involved in building a modern data system, from choosing the right tools like Spark and Flink to understanding the intricacies of data laws like the GDPR.
Peer under the hood of the systems you already use, and learn to use them more effectively
Make informed decisions by identifying the strengths and weaknesses of different tools
Navigate the trade-offs around consistency, scalability, fault tolerance, and complexity
Understand the distributed systems research upon which modern databases are built
Peek behind the scenes of major online services, and learn from their architectures
Martin and Chris are presenting “Designing Data-Intensive Applications in 2025”
Think Distributed Systems
Dominik Tornow
ETA: Fall 2025
Manning (use code SCALE2025 for 50% off)
All modern software is distributed. Let’s say that again: all modern software is distributed. Whether you’re building mobile utilities, microservices, or massive cloud native enterprise applications, creating efficient distributed systems requires you to think differently about failure, performance, network services, resource usage, latency, and much more. This clearly written book guides you into the mindset you’ll need to design, develop, and deploy scalable and reliable distributed systems.
In Think Distributed Systems you’ll find a beautifully illustrated collection of mental models for:
Correctness, scalability, and reliability
Failure tolerance, detection, and mitigation
Message processing
Partitioning and replication
Consensus
Dominik is presenting “The Mechanics of Scale”
Latency: Reduce Delay in Software Systems
Pekka Enberg
ETA: Summer 2025
Manning (use code SCALE2025 for 50% off)
Slow responses can kill good software. Whether it’s recovering microseconds lost while routing messages on a server or speeding up page loads that keep users waiting, finding and fixing latency can be a frustrating part of your work as a developer. This one-of-a-kind book shows you how to spot, understand, and respond to latency wherever it appears in your applications and infrastructure.
This book balances theory with practical implementations, turning academic research into useful techniques you can apply to your projects. In Latency you’ll learn:
What latency is—and what it is not
How to model and measure latency
Organizing your application data for low latency
Making your code run faster
Hiding latency when you can’t reduce it
Pekka presented “Patterns of Low Latency” at P99 CONF 2024. And his Turso co-founder Glauber Costa will be presenting “Who Needs One Database Anyway?” at Monster SCALE Summit
Writing for Developers: Blogs That Get Read
By Piotr Sarna and Cynthia Dunlop
January 2025
Amazon | Manning (use code SCALE2025 for 50% off)
This book is a practical guide to writing more compelling engineering blog posts. We discuss strategies for nailing all phases of the technical blogging process. And we have quite a bit of fun exploring the core blog post patterns that are most common across engineering blogs today, like “The Bug Hunt,” “How We Built It,” “Lessons Learned,” “We Rewrote It in X,” “Thoughts on Trends,” etc. Each “pattern” chapter includes an analysis of real-world examples as well as specific dos/don’ts for that particular pattern. There’s a section on moving from blogging into opportunities such as article writing, conference speaking, and book writing. Finally, we wrap up with a critical (and often amusing) look at generative AI blogging uses and abuses.
Oh…and there’s also a foreword by Bryan Cantrill and an afterword by Scott Hanselman!
Readers will learn how to:
Pinpoint topics that make intriguing posts
Apply popular blog post design patterns
Rapidly plan, draft, and optimize blog posts
Make your content clearer and more convincing to technical readers
Tap AI for revision while avoiding misuses and abuses
Increase the impact of all your technical communications
Piotr is presenting “A Dist Sys Programmer’s Journey Into AI”
ScyllaDB in Action
Bo Ingram
October 2024
Amazon | Manning (use code SCALE2025 for 50% off) | ScyllaDB (free chapters)
ScyllaDB in Action is your guide to everything you need to know about ScyllaDB, from your very first queries to running it in a production environment. It starts you with the basics of creating, reading, and deleting data and expands your knowledge from there. You’ll soon have mastered everything you need to build, maintain, and run an effective and efficient database.
This book teaches you ScyllaDB the best way—through hands-on examples. Dive into the node-based architecture of ScyllaDB to understand how its distributed systems work, how you can troubleshoot problems, and how you can constantly improve performance. You’ll learn how to:
Read, write, and delete data in ScyllaDB
Design database schemas for ScyllaDB
Write performant queries against ScyllaDB
Connect and query a ScyllaDB cluster from an application
Configure, monitor, and operate ScyllaDB in production
Bo’s colleagues Ethan Donowitz and Vicki Niu are both presenting at Monster SCALE Summit
Data Virtualization in the Cloud Era
Dr. Daniel Abadi and Andrew Mott
July 2024
O’Reilly
Data virtualization had been held back by complexity for decades until recent advances in cloud technology, data lakes, networking hardware, and machine learning transformed the dream into reality. It’s becoming increasingly practical to access data through an interface that hides low-level details about where it’s stored, how it’s organized, and which systems are needed to manipulate or process it. You can combine and query data from anywhere and leave the complex details behind.
In this practical book, authors Dr. Daniel Abadi and Andrew Mott discuss in detail what data virtualization is and the trends in technology that are making data virtualization increasingly useful. With this book, data engineers, data architects, and data scientists will explore the architecture of modern data virtualization systems and learn how these systems differ from one another at technical and practical levels.
By the end of the book, you’ll understand:
The architecture of data virtualization systems
Technical and practical ways that data virtualization systems differ from one another
Where data virtualization fits into modern data mesh and data fabric paradigms
Modern best practices and case study use cases
Daniel is presenting “Two Leading Approaches to Data Virtualization: Which Scales Better?”
Bonus: Read Daniel Abadi’s article on the PACELC theorem.
Database Performance at Scale
By Felipe Cardeneti Mendes, Piotr Sarna, Pavel Emelyanov, and Cynthia Dunlop
October 2023
Amazon | ScyllaDB (free)
Discover critical considerations and best practices for improving database performance based on what has worked, and failed, across thousands of teams and use cases in the field. This book provides practical guidance for understanding the database-related opportunities, trade-offs, and traps you might encounter while trying to optimize data-intensive applications for high throughput and low latency.
Whether you’re building a new system from the ground up or trying to optimize an existing use case for increased demand, this book covers the essentials. The ultimate goal of the book is to help you discover new ways to optimize database performance for your team’s specific use cases, requirements, and expectations.
Understand often overlooked factors that impact database performance at scale
Recognize data-related performance and scalability challenges associated with your project
Select a database architecture that’s suited to your workloads, use cases, and requirements
Avoid common mistakes that could impede your long-term agility and growth
Jumpstart teamwide adoption of best practices for optimizing database performance at scale
Felipe is presenting “ScyllaDB is No Longer ‘Just a Faster Cassandra’”
Piotr is presenting “A Dist Sys Programmer’s Journey Into AI”
Algorithms and Data Structures for Massive Datasets
Dzejla Medjedovic, Emin Tahirovic, and Ines Dedovic
May 2022
Amazon | Manning (use code SCALE2025 for 50% off)
Algorithms and Data Structures for Massive Datasets reveals a toolbox of new methods that are perfect for handling modern big data applications. You’ll explore the novel data structures and algorithms that underpin Google, Facebook, and other enterprise applications that work with truly massive amounts of data. These effective techniques can be applied to any discipline, from finance to text analysis. Graphics, illustrations, and hands-on industry examples make complex ideas practical to implement in your projects, and there are no mathematical proofs to puzzle over. Work through this one-of-a-kind guide, and you’ll find the sweet spot of saving space without sacrificing your data’s accuracy.
Readers will learn:
Probabilistic sketching data structures for practical problems
Choosing the right database engine for your application
Evaluating and designing efficient on-disk data structures and algorithms
Understanding the algorithmic trade-offs involved in massive-scale systems
Deriving basic statistics from streaming data
Correctly sampling streaming data
Computing percentiles with limited space resources
Dzejla is presenting “Read- and Write-Optimization in Modern Database Infrastructures”
Kafka: The Definitive Guide, 2nd Edition
By Gwen Shapira, Todd Palino, Rajini Sivaram, and Krit Petty
November 2021
Amazon | O’Reilly
Engineers from Confluent and LinkedIn responsible for developing Kafka explain how to deploy production Kafka clusters, write reliable event-driven microservices, and build scalable stream processing applications with this platform. Through detailed examples, you’ll learn Kafka’s design principles, reliability guarantees, key APIs, and architecture details, including the replication protocol, the controller, and the storage layer.
You’ll learn:
Best practices for deploying and configuring Kafka
Kafka producers and consumers for writing and reading messages
Patterns and use-case requirements to ensure reliable data delivery
Best practices for building data pipelines and applications with Kafka
How to perform monitoring, tuning, and maintenance tasks with Kafka in production
The most critical metrics among Kafka’s operational measurements
Kafka’s delivery capabilities for stream processing systems
Gwen is presenting “The Nile Approach: Re-engineering Postgres for Millions of Tenants”
The Missing README: A Guide for the New Software Engineer
By Chris Riccomini and Dmitriy Ryaboy
August 2021
Amazon | O’Reilly
For new software engineers, knowing how to program is only half the battle. You’ll quickly find that many of the skills and processes key to your success are not taught in any school or bootcamp. The Missing README fills in that gap—a distillation of workplace lessons, best practices, and engineering fundamentals that the authors have taught rookie developers at top companies for more than a decade.
Early chapters explain what to expect when you begin your career at a company. The book’s middle section expands your technical education, teaching you how to work with existing codebases, address and prevent technical debt, write production-grade software, manage dependencies, test effectively, do code reviews, safely deploy software, design evolvable architectures, and handle incidents when you’re on-call. Additional chapters cover planning and interpersonal skills such as Agile planning, working effectively with your manager, and growing to senior levels and beyond.
You’ll learn:
How to use the legacy code change algorithm, and leave code cleaner than you found it
How to write operable code with logging, metrics, configuration, and defensive programming
How to write deterministic tests, submit code reviews, and give feedback on other people’s code
The technical design process, including experiments, problem definition, documentation, and collaboration
What to do when you are on-call, and how to navigate production incidents
Architectural techniques that make code change easier
Agile development practices like sprint planning, stand-ups, and retrospectives
Chris and Martin Kleppmann are presenting “Designing Data-Intensive Applications in 2025”
The DynamoDB Book
By Alex DeBrie
April 2020
Amazon | Direct
DynamoDB is a highly available, infinitely scalable NoSQL database offering from AWS. But modeling with a NoSQL database like DynamoDB is different from modeling with a relational database. You need to intentionally design for your access patterns rather than creating a normalized model that allows for flexible querying later.
The DynamoDB Book is the authoritative resource in the space, and it’s the recommended resource within Amazon for learning DynamoDB. Rick Houlihan, the former head of the NoSQL Blackbelt team at AWS, said The DynamoDB Book is “definitely a must read if you want to understand how to correctly model data for NoSQL apps.”
The DynamoDB Book takes a comprehensive approach to teaching DynamoDB, including:
Discussion of key concepts, underlying infrastructure components, and API design;
Explanations of core strategies for data modeling, including one-to-many and many-to-many relationships, filtering, sorting, aggregations, and more;
5 full walkthrough examples featuring complex data models and a large number of access patterns.
Alex is presenting “DynamoDB Cost Optimization Considerations and Strategies”
RESTful Java Patterns and Best Practices: Learn Best Practices to Efficiently Build Scalable, Reliable, and Maintainable High Performance RESTful Services
By Bhakti Mehta
September 2014
Amazon
This book provides an overview of the REST architectural style and then dives deep into best practices and commonly used patterns for building RESTful services that are lightweight, scalable, reliable, and highly available. It’s designed to help application developers get familiar with REST. The book explores the details, best practices, and commonly used REST patterns, and gives insights into how Facebook, Twitter, PayPal, GitHub, Stripe, and other companies are implementing solutions with RESTful services.