February 21st, 2014

By Paddy Power
 

“We chose Cassandra because of its ability and proven track record in managing time-series data.”

- John Turner, Software Development Manager at Paddy Power

 


Paddy Power

Paddy Power is a betting company that operates primarily in Ireland, the UK and Australia. We have been around since 1998 and are best known for our novel and often tongue-in-cheek approach to advertising. We are as innovative in the products we offer, and in how we deliver them to our customers, as we are in our advertising and marketing. As a multi-channel consumer business, we build and develop online, mobile, phone and retail platforms, as well as the back-office systems that support functions such as bet-in-running and risk management.

 

Evaluating NoSQL solutions

Traditionally, and to date, we have used a number of different relational database technologies, with all the features and limitations that they present. As we challenged ourselves to provide greater levels of performance, scalability and resilience, we started looking at some of the NoSQL solutions that were available. We spent quite a bit of time evaluating a number of those solutions, focusing on ease of cross-datacenter replication and on scalability in terms of both performance and cost.

That led us to look at peer-to-peer based NoSQL solutions that had lower operational overheads than solutions based on other distribution topologies. We were looking for solutions that supported auto-sharding so that we didn’t have to manually distribute data across shards as the characteristics of our data changed. That discounted some of the solutions in the marketplace.
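As an illustration of what auto-sharding means in a peer-to-peer design, here is a minimal sketch of a token ring with virtual nodes. It is a toy model under stated assumptions, not Cassandra's implementation: MD5 stands in for the real partitioner's hash, and the `Ring` class, node names and `moved_fraction` helper are hypothetical. The point it shows is that adding a node reassigns only a share of the keys, and only to the new node, without anyone redistributing data by hand.

```python
import bisect
import hashlib


def token(value):
    """Map a string to a ring position (MD5 is a stand-in hash, an assumption)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy token ring: each node owns several tokens (virtual nodes)."""

    def __init__(self, vnodes=64):
        self.vnodes = vnodes
        self._tokens = []   # sorted ring positions
        self._owner = {}    # ring position -> node name

    def add_node(self, node):
        for i in range(self.vnodes):
            t = token("%s#%d" % (node, i))
            bisect.insort(self._tokens, t)
            self._owner[t] = node

    def node_for(self, key):
        # First token at or after the key's position, wrapping around the ring.
        i = bisect.bisect_left(self._tokens, token(key))
        return self._owner[self._tokens[i % len(self._tokens)]]


def moved_fraction(keys, before, after):
    """Share of keys whose owning node differs between two rings."""
    moved = sum(1 for k in keys if before.node_for(k) != after.node_for(k))
    return moved / len(keys)


if __name__ == "__main__":
    nodes = ["node-1", "node-2", "node-3", "node-4"]
    small, large = Ring(), Ring()
    for n in nodes:
        small.add_node(n)
        large.add_node(n)
    large.add_node("node-5")  # scale out by one node
    keys = ["selection-%d" % i for i in range(1000)]
    print(moved_fraction(keys, small, large))
```

With a manually sharded system, the mapping from key to shard would have to be revisited by operators as the data changes; in this scheme it falls out of the hash and the current set of nodes.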

Early in our evaluation period, there was some interest in a number of the technologies that provide key-based sharding and richer secondary index capabilities, such as those found in Mongo. Those options were discounted because of the cost of scaling, the operational overhead of scaling, and the lack of a compelling requirement for secondary indexes and ad-hoc querying.

 

Real-time products and pricing

The applications that provide real-time product and pricing information to our customers use Cassandra to persist that information in real time. We chose Cassandra because of its ability and proven track record in managing time-series data. Our product and pricing data is in effect time-series data, a bit like a stock ticker in a financial market. We have an eight-node cluster split evenly across two data centers, which lets us provide an active-active service across both.
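To make the time-series idea concrete, here is a minimal in-memory sketch of a layout commonly used for this kind of data: price updates are grouped into one partition per selection per day and kept ordered by timestamp within the partition. This is not Paddy Power's schema and it does not use a Cassandra driver; `PriceStore`, `selection_id` and the daily bucket size are assumptions chosen for illustration. Timestamps are expected to be timezone-aware.

```python
import bisect
from collections import defaultdict
from datetime import datetime, timezone


class PriceStore:
    """In-memory stand-in for a time-bucketed price table (hypothetical)."""

    def __init__(self):
        # (selection_id, day bucket) -> list of (timestamp, price), kept sorted
        self._partitions = defaultdict(list)

    @staticmethod
    def _bucket(ts):
        # One partition per selection per UTC day keeps partitions bounded.
        return ts.astimezone(timezone.utc).strftime("%Y-%m-%d")

    def write(self, selection_id, ts, price):
        rows = self._partitions[(selection_id, self._bucket(ts))]
        bisect.insort(rows, (ts, price))

    def latest(self, selection_id, day):
        """Return the most recent (timestamp, price) in a partition, or None."""
        rows = self._partitions.get((selection_id, day), [])
        return rows[-1] if rows else None

    def between(self, selection_id, day, start, end):
        """Return rows with start <= timestamp < end from a single partition."""
        rows = self._partitions.get((selection_id, day), [])
        lo = bisect.bisect_left(rows, (start,))
        hi = bisect.bisect_left(rows, (end,))
        return rows[lo:hi]


if __name__ == "__main__":
    store = PriceStore()
    now = datetime.now(timezone.utc)
    store.write("sel-1", now, 2.5)
    print(store.latest("sel-1", now.strftime("%Y-%m-%d")))
```

Both reads touch a single partition and rely on its ordering, which is the property that makes a ticker-style workload a natural fit: the newest price is at the end of the partition and a time slice is one contiguous range.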

 

Choosing DataStax Enterprise 

Partly it’s the warm fuzzy feeling of having a vendor standing behind you. We had limited experience with NoSQL, and Cassandra specifically, so having a vendor with proven experience providing enterprise-level software and support to its customers was very attractive to us. The availability of things like support, training and OpsCenter, and the value of the additional features in DataStax Enterprise, were compelling to us as well.

I’ll give you a little bit of insight into the sorts of questions that DataStax was able to answer that we weren’t able to get answered by the open-source community. Because we are an enterprise customer, things like security and operability weigh heavily on our decision to partner with a vendor. The security features that come with DataStax Enterprise allow us to satisfy the internal security requirements that we would typically have for any of the software that we adopt.

From an operations perspective, OpsCenter gives us the capabilities that you would expect of any enterprise database solution, such as the ability to perform backup and retrieval, to apply security restrictions, and to monitor the database in real time. Those are the main drivers.

 

Migrating from relational databases

We are a heavy user of Informix, and some of our systems used Informix exclusively as their persistent store. Some features within those applications still require strongly relational data features; those that don’t have migrated onto Cassandra. Another key driver for migrating those features was to move away from explicit schemas, which lets us upgrade our applications without downtime by using versioned reader and writer patterns.
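The interview does not spell out what the versioned reader and writer patterns look like, so the following is only one plausible reading, sketched under assumptions: each stored record carries a version number, the writer always emits the newest version, and the reader upgrades older records on the fly. The field names, the `v` tag and the v1 format (fractional odds as a single string) are all hypothetical.

```python
import json

CURRENT_VERSION = 2


def write_price(selection_id, numerator, denominator):
    """Writer: always emits the newest version, tagged with its number."""
    return json.dumps({
        "v": CURRENT_VERSION,
        "selection_id": selection_id,
        "numerator": numerator,
        "denominator": denominator,
    })


def _upgrade_v1(doc):
    # Hypothetical v1 stored fractional odds as one string such as "5/2".
    num, den = doc["price"].split("/")
    return {
        "v": 2,
        "selection_id": doc["selection_id"],
        "numerator": int(num),
        "denominator": int(den),
    }


# One upgrade step per old version; each step returns the next version up.
UPGRADERS = {1: _upgrade_v1}


def read_price(blob):
    """Reader: accepts any known version and upgrades it step by step."""
    doc = json.loads(blob)
    while doc["v"] < CURRENT_VERSION:
        doc = UPGRADERS[doc["v"]](doc)
    if doc["v"] != CURRENT_VERSION:
        raise ValueError("unsupported version: %r" % doc["v"])
    return doc


if __name__ == "__main__":
    old_record = json.dumps({"v": 1, "selection_id": "sel-1", "price": "5/2"})
    print(read_price(old_record))
```

In a scheme like this, readers that understand the new version are rolled out first and writers are switched over afterwards, so old and new records can coexist in the store and no bulk migration or outage is needed.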

 

Cost benefits

The infrastructure that we moved from still exists, and it is still used for features that aren’t really suited to a distributed database solution. There are certainly no cost savings in that respect, but because the schema is now implicit, we can reduce the number of people involved in evolving the database schema.

Traditionally, application development, operations, and our DBA team would have had to collaborate on database and schema changes. Now it only requires the development team to deliver via an automated, continuous delivery process, which effectively means they can roll out changes to the implicit schema a lot more quickly than they could in the past.

 

Infrastructure at Paddy Power

We operate multiple data centers in Europe and in Australia. For regulatory reasons, our infrastructure is predominantly private and, like the majority of today’s corporate data centers, is heavily virtualized. As you can imagine, our compute requirements contain large peaks and troughs dictated by the timing of sporting events, and we need to provision for those peaks.

 

Words of wisdom

You want to select a use case within your organization that’s low-risk and where the NoSQL solution has proven value in other organizations. This way you minimize your risk and, to some extent, guarantee your outcome: you’re taking a new path from a technology perspective, but it’s a path that’s been trodden by other organizations.

Specifically for Cassandra and DataStax, processing time-series data is something that lots of companies have done in the past, not something that we were very aware of, and that was one of the reasons why we chose this as the first use case for Cassandra.