Adobe Audience Manager is a data management platform service that is part of Adobe Experience Cloud, a B2B range of products and services that manage the entire customer experience. The main goal of Audience Manager is to answer the question “who is this user” by collecting data from a billion data points, then classifying and organizing them. The platform averages 47 billion calls a day, which translates to more than 200 billion calls to Cassandra.
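To put those volumes in perspective, the short calculation below converts the daily figures into an average per-second rate and a rough fan-out ratio. It is only a back-of-the-envelope sketch: the variable names are illustrative, the load is assumed to be spread evenly over the day, and 200 billion is treated as a lower bound because the text says "more than".

```python
# Back-of-the-envelope rates derived from the figures quoted above.
SECONDS_PER_DAY = 24 * 60 * 60

daily_platform_calls = 47_000_000_000    # average calls per day
daily_cassandra_calls = 200_000_000_000  # lower bound ("more than 200 billion")

# Average request rate, assuming traffic is spread evenly over the day.
avg_calls_per_second = daily_platform_calls / SECONDS_PER_DAY

# Minimum number of Cassandra calls generated per platform call, on average.
cassandra_calls_per_platform_call = daily_cassandra_calls / daily_platform_calls

print(f"{avg_calls_per_second:,.0f} platform calls per second on average")
print(f"at least {cassandra_calls_per_platform_call:.2f} Cassandra calls per platform call")
```

In other words, the platform handles roughly 544,000 calls per second on average, and each call results in a little over four Cassandra operations.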
Adobe runs Cassandra in eight AWS regions to stay close to end users and minimize data latency. Data is collected from different geographical locations, then pushed to the core database running in a single central location. After the data is processed, it is pushed back to the relevant edge locations, so each one keeps a persistent partial cache of the core data. Cassandra stores this partial cache in the edge locations.
Adobe chose Cassandra for its high availability, cross-data center support, horizontal scalability, and high-performance reads and writes. After more than 10 years, three of these four have proved critical for Adobe, which makes it a good decision. Cross-data center support is not useful in this case, but it is a very nice feature to have.
In Audience Manager, each edge data center has two components: the Data Collection Service (DCS), which serves user requests, and the Profile Cache Service (PCS), which DCS interacts with and which is backed by Cassandra. The data in Cassandra is split into two sets: profiles and IDs. Each of these sets is further split into real-time and backend clusters.
Backend clusters are used by DCS for reading only. Their data is backfilled by streaming from the core database, a process that can take up to 48 hours. Real-time clusters are used for both reading and writing. When a read request is issued to the Data Collection Service, it reads from both clusters, then merges the results before sending the response.
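The read path described above can be sketched roughly as follows. This is a minimal illustration, not Adobe's implementation: the function names are hypothetical, plain dictionaries stand in for the backend and real-time Cassandra clusters, and the rule that real-time values win on conflict is an assumption, since the text does not say how the merge is resolved.

```python
from typing import Dict, Mapping, Optional

Profile = Dict[str, str]


def merge_profiles(backend: Optional[Profile], realtime: Optional[Profile]) -> Profile:
    """Combine the two partial views of a profile into one response.

    Assumption: real-time data is fresher than the backfilled backend data,
    so its values override backend values for the same key.
    """
    merged: Profile = {}
    merged.update(backend or {})   # backfilled data first (may lag by up to 48 hours)
    merged.update(realtime or {})  # real-time data overrides on conflict
    return merged


def read_profile(
    user_id: str,
    backend_store: Mapping[str, Profile],
    realtime_store: Mapping[str, Profile],
) -> Profile:
    """Read from both stores, then merge the results before responding."""
    return merge_profiles(backend_store.get(user_id), realtime_store.get(user_id))


# Hypothetical in-memory stand-ins for the two clusters.
backend_store = {"u1": {"segment": "sports", "region": "eu"}}
realtime_store = {"u1": {"segment": "travel"}}

print(read_profile("u1", backend_store, realtime_store))
```

Keeping the merge in a separate pure function makes the conflict rule easy to change or test without touching the code that talks to either cluster.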
The partial caches are deployed across eight AWS regions in 32 Cassandra clusters, running on more than 800 EC2 instances that hold 260 terabytes of data. Adobe uses various technologies including Terraform, Puppet, Spinnaker, Amazon Linux, Grafana, Prometheus, Python, Thanos, and Alertmanager to maintain the Cassandra infrastructure.
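A quick calculation shows how these deployment figures relate to one another. It is a rough sketch under stated assumptions: clusters, instances, and data are treated as evenly distributed, 800 is used as a lower bound on the instance count, and terabytes are converted with the decimal factor of 1,000 GB.

```python
# Average sizing derived from the deployment figures quoted above.
regions = 8
clusters = 32
instances = 800   # lower bound ("more than 800")
data_tb = 260

clusters_per_region = clusters / regions
instances_per_cluster = instances / clusters
gb_per_instance = data_tb * 1000 / instances  # decimal terabytes to gigabytes

print(f"{clusters_per_region:.0f} clusters per region")
print(f"about {instances_per_cluster:.0f} instances per cluster")
print(f"at most about {gb_per_instance:.0f} GB of data per instance")
```

The four clusters per region are consistent with the layout described earlier: two data sets (profiles and IDs), each split into a real-time and a backend cluster.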
Cassandra exposes many application metrics through its Java agent. Prometheus scrapes these along with the system metrics delivered by the node exporter. A Prometheus service runs in each of the eight AWS regions and retains data for 14 days. The metrics are centralized in the Thanos service and stored persistently for more than a year. Alertmanager routes alerts to communication channels such as email, Slack, and PagerDuty. Adobe uses Grafana panels to visualize metric data.