Company: Spotify
Industry: Entertainment
Functional Use Case: Recommendation, Data Store

Spotify drives personalization with Apache Cassandra

Spotify uses Apache Cassandra to serve the data behind its personalization algorithm, which recommends songs to users. Spotify’s data infrastructure is built on open-source Apache platforms such as Kafka, Storm, and Cassandra. These systems work together to collect and analyze data in real time and store the results in user profiles as personalized suggestions.

Spotify uses Cassandra for storage. Its deployment consists of over 100 Cassandra clusters, each containing a nested storage system. These clusters store user profile attributes and metadata about playlists and artists.

The personalization algorithm at Spotify is responsible for creating playlists like Discover Weekly. It uses three data analysis models: Collaborative Filtering, Natural Language Processing (NLP), and Audio Analysis.

Collaborative Filtering analyzes data stored in matrices to determine a user’s taste in music and a song’s characteristics. Spotify recommends songs by comparing a user’s vector to other users’ vectors to find similar songs. NLP models analyze text on the internet about songs to determine cultural similarity. Sentiment analysis APIs are used to create vectors from the collected text.
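The vector comparison described above can be illustrated with a minimal sketch. It assumes cosine similarity as the comparison measure, which the article does not specify, and the user names, vector values, and function names are hypothetical toy examples rather than anything from Spotify's system.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_user(target, others):
    """Return the name in `others` whose taste vector is closest to `target`."""
    return max(others, key=lambda name: cosine_similarity(target, others[name]))


# Hypothetical taste vectors; each position stands for an assumed latent taste factor.
user_vectors = {
    "alice": [0.9, 0.1, 0.0],
    "bob": [0.1, 0.8, 0.3],
    "carol": [0.8, 0.2, 0.1],
}

target = user_vectors["alice"]
others = {name: vec for name, vec in user_vectors.items() if name != "alice"}
nearest = most_similar_user(target, others)
print(nearest)
```

In a setup like this, songs that the nearest user plays but the target user has not yet heard would be natural recommendation candidates.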

Audio models analyze the raw audio tracks themselves by using convolutional neural networks to compute representative statistics of the song, such as the time signature, key, and tempo.

Spotify’s Home Screen is another example of personalization. It is organized like a bookcase, with each shelf displaying personalized content based on a user’s recent activity, favorite playlists, new releases, and custom mixtapes.

Spotify uses machine learning for both analysis (displaying content based on previous activity) and exploration (displaying suggested content to initiate user engagement and modifying it based on user interaction). Once this process is complete, the data is written to Cassandra and served via Spotify’s microservice layer.
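One common way to balance showing known favorites against trying new suggestions is an epsilon-greedy policy. The sketch below is only an illustration of that general idea: the article does not say which method Spotify uses, and the function names, item names, scores, and parameters are all hypothetical.

```python
import random


def pick_shelf_item(scores, epsilon, rng):
    """Choose an item: explore at random with probability epsilon, else take the best-scoring item."""
    if rng.random() < epsilon:
        return rng.choice(sorted(scores))  # explore: any candidate
    return max(scores, key=scores.get)  # use what previous activity suggests


def update_score(scores, item, reward, learning_rate=0.1):
    """Nudge the item's estimated engagement toward the observed reward (for example, 1.0 for a play)."""
    scores[item] += learning_rate * (reward - scores[item])


# Hypothetical engagement estimates for items on one shelf.
scores = {"recent_playlist": 0.7, "new_release": 0.4, "mixtape": 0.5}
rng = random.Random(0)  # seeded so the example is repeatable

choice = pick_shelf_item(scores, epsilon=0.2, rng=rng)
update_score(scores, choice, reward=1.0)  # assume the user engaged with the item
```

The updated scores are the kind of per-user result that, in the flow described above, would then be written to a data store and served to the client.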

Overall, Apache Cassandra provides a strong platform from which to serve recommendation data, and Spotify is confident in its ability to scale to meet the needs of its ever-growing customer base.

Stack Includes: Apache Cassandra, GCP, Java

© 2009-2023 The Apache Software Foundation under the terms of the Apache License 2.0. Apache, the Apache feather logo, Apache Cassandra, Cassandra, and the Cassandra logo, are either registered trademarks or trademarks of The Apache Software Foundation. Sponsored by Anant Corporation and Datastax, and Developed by Anant Corporation.