August 6th, 2014


“Cassandra is a solid solution that enables us to apply machine learning at a big-data scale. The result is that it allows us to increase fraud detection rates over 40% compared to the current best solutions in the market.”

– Paulo Marques, CTO at Feedzai







Feedzai combines big data science and behavioral analytics to make commerce safe for financial networks, banks and retailers. I am the Chief Technology Officer of Feedzai, responsible for overall product development and technology strategy.

Feedzai Fraud Prevention™ is a risk-decisioning system for payment transactions. In a nutshell, we analyze credit and debit card transactions in real time with extremely short latencies. Typically we are deployed on the core banking systems, analyzing thousands of transactions per second with latencies on the order of just a few milliseconds.

Our clients are financial service organizations and retailers, and fraud rates are increasing, which impacts all businesses. At the same time, big data and behavioral analytics give our clients an opportunity to reduce fraud losses. However, this requires analyzing massive data volumes within a very short amount of time (typically in the millisecond range).


Finding a scalable secret sauce

Part of our “secret sauce” is the ability to track the individual spending behavior of every person who is a customer of an institution, accessing those profiles in a millisecond. By tracking several hundred metrics that are specific to each person and using them effectively while we score each transaction, our machine learning algorithms are able to make much more accurate analyses.
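To make the idea of a per-person profile concrete, here is a minimal sketch in Python. It is not Feedzai's implementation: the `SpendingProfile` class, the `score_and_update` function, and the choice of a single metric (a running mean and standard deviation of transaction amounts) are all assumptions made for illustration, and an in-memory dictionary stands in for the profile store.

```python
import math
from dataclasses import dataclass


@dataclass
class SpendingProfile:
    """Hypothetical per-customer profile holding one running metric."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations (Welford's method)

    def update(self, amount: float) -> None:
        # Incrementally fold a new transaction amount into the statistics.
        self.count += 1
        delta = amount - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (amount - self.mean)

    def deviation(self, amount: float) -> float:
        # Distance of `amount` from this customer's mean, in standard deviations.
        if self.count < 2:
            return 0.0  # not enough history to judge
        std = math.sqrt(self.m2 / (self.count - 1))
        return 0.0 if std == 0 else abs(amount - self.mean) / std


# In-memory stand-in for the profile store (assumption: keyed by customer id).
profiles = {}


def score_and_update(customer_id: str, amount: float) -> float:
    """Score a transaction against the customer's history, then record it."""
    profile = profiles.setdefault(customer_id, SpendingProfile())
    score = profile.deviation(amount)
    profile.update(amount)
    return score
```

A transaction that is unremarkable for one customer can stand out sharply for another, which is the point of keeping profiles per person rather than per population. A real system would track many such metrics per profile and feed them to a model rather than using a single deviation as the score.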

We conducted comprehensive benchmarks over several NoSQL databases. These covered not only performance (throughput and latency) but also scalability and reliability. For us, it was very important to have a database that not only supports high throughput with very short latencies, but also doesn’t lose data or start behaving erratically when subjected to high loads.

From a technical perspective, Cassandra came out as the clear winner in terms of the combination of high throughput, low latency and overall reliability. Cassandra also meets our need for an open-source solution, an approach our clients like. DataStax’s multi-region support also played a big role. Global support is critical for financial service clients who run global payment networks, particularly as ecommerce and cross-border transactions increasingly challenge the risk status quo.


Stopping fraud with Cassandra

We use machine-learning behavioral models combined with a big-data approach to improve fraud detection rates by over 40%. Our clients have hundreds of millions of customers with detailed “segment of one” profiles, each containing hundreds of thousands of metrics that all need to be queried and updated in real time. Cassandra enables Feedzai’s core engine to use all these detailed metrics, keep up with our clients’ growth and scale horizontally without a slowdown in performance.

Cassandra lets our software keep, access and update hyper-detailed behavioral profiles of hundreds of millions of customers, devices, products, or just about anything else our clients need, in order to make payment risk authorization decisions during the customer buying window.

For the most part, our deployments are made on premises, in our clients’ data centers. Most of them are in the range of 10 data nodes with 3-way replication. Given our aggressive latency requirements, all nodes use SSDs for storing data.
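As a rough illustration of what 3-way replication implies, the sketch below works through the standard quorum arithmetic. The function names are hypothetical, and the use of quorum reads and writes is an assumption: the consistency level used in these deployments is not stated here.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a quorum read or write."""
    return replication_factor // 2 + 1


def tolerated_replica_failures(replication_factor: int) -> int:
    """Replicas of a given row that can be down while quorum still succeeds."""
    return replication_factor - quorum(replication_factor)


def data_share_per_node(nodes: int, replication_factor: int) -> float:
    """Approximate fraction of the data set held by each node,
    assuming data is spread evenly across the cluster."""
    return replication_factor / nodes


print(quorum(3))                      # 2
print(tolerated_replica_failures(3))  # 1
print(data_share_per_node(10, 3))     # 0.3
```

Under these assumptions, a 10-node cluster with 3 replicas keeps serving quorum requests for any row while one of that row's replicas is unavailable, and each node stores roughly 30% of the data.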


Cassandra pays off

One of the advantages of developing on open source systems like Cassandra is the community of users. This gives our product a distinct competitive advantage because our customers have access to the same support community and knowledge pool that our developers do. PlanetCassandra is just amazing in that regard. All the materials, the people, the meetups – it’s such a vibrant community. We recently hosted a meetup on Cassandra, and the number of people who showed up and their level of interest were incredible.

For anyone getting started with Cassandra, I highly recommend using the DataStax distribution and the pre-bundled images. They give you an important jump-start on the learning curve compared with the “bare-bones” version, saving several hours of less productive work. The other important point, especially for people coming from a more classical RDBMS world, is that the data model is really different. It requires thought: simply trying to adapt what you know about relations, third normal form and similar concepts doesn’t work. Make sure you spend time getting to know the new data model. It pays off.



Cassandra is a solid solution that enables us to apply machine learning at a big-data scale. The result is that it allows us to increase fraud detection rates over 40% compared to the current best solutions in the market. This not only helps our clients save money by preventively blocking fraudulent transactions but, most importantly, helps make e-commerce safe at a global scale. We believe in doing good – stopping fraud is a big part of that.