May 28th, 2013


“Without any change, you see a huge increase in transactions in our apps by just moving to dedicated physical hardware.”

-Drew Broadley, Founder at PAPERKUT Paperless Receipts


Hello Planet Cassandra users. My name is Brady Gentile and I’m here with Drew Broadley, Founder of PAPERKUT Paperless Receipts based out of New Zealand.

 

Drew, thanks for meeting with me today. I’m really excited to hear about your use of Apache Cassandra. To start off, could you tell us a little bit about your company PAPERKUT Paperless Receipts?

 

Drew: PAPERKUT Paperless Receipts is essentially putting digital receipts into banking. In New Zealand, plastic cards see really heavy use: debit cards, credit cards, EFTPOS. Because of that, we’re able to attach a lot of things to these payment cards.

 

The service works like this: you go up to your retailer and make a purchase with your plastic card, whether in physical or NFC form. Then, in the background, the receipt gets pulled from the point of sale, matched with data from the payment network, stored in our platform, and it shows up on your bank’s internet banking website. The data from these receipts can then be used for accounting, expense management, and many other things.

 

We’re partnering with Paymark, who is the majority payment network provider in New Zealand. Their stakeholders comprise the major banks in New Zealand and Australia, and because of that, we’re able to partner with them to get to the banks and provide these receipts via internet banking websites. It’s a seamless thing.

 

That’s awesome. Could you tell us a little bit more about how you’re using Cassandra?

 

Drew: Large amounts of data were always going to be involved in this industry, and that’s why I chose Cassandra. We use it as a data store, a cache layer, and also for distributed counters. Utilising distributed counters and pushing them out as real-time analytics has been a big feature for us.
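For readers unfamiliar with them, Cassandra’s distributed counters are columns of a special `counter` type that can be incremented atomically across the cluster. As a rough sketch of how a receipts-per-merchant counter might look in CQL (the table and column names here are hypothetical, not PAPERKUT’s actual schema):

```sql
-- Hypothetical counter table: one running total per merchant per day.
CREATE TABLE receipt_counts (
    merchant_id uuid,
    day         text,
    receipts    counter,
    PRIMARY KEY (merchant_id, day)
);

-- Each incoming receipt bumps the counter; Cassandra replicates the
-- increment itself rather than a read-modify-write of the total.
UPDATE receipt_counts
   SET receipts = receipts + 1
 WHERE merchant_id = 123e4567-e89b-12d3-a456-426655440000
   AND day = '2013-05-28';
```

Reading the running total back is then an ordinary SELECT, which is the kind of query the real-time analytics Drew describes would sit on top of.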

 

Being integrated in-flight with the transaction payment network, which does many transactions a second, and being able to tell them “yes, we’ll be able to keep up” has been really nice.

 

That sounds really convenient and definitely something that I would use. Is the data that you store in Cassandra in a physical datacenter or is it in the cloud?

 

Drew: It’s interesting because at the start, to keep costs low, we did it in the cloud; it was really just to keep the overhead lower. As you read more, and you read other use cases like Netflix on Amazon AWS and whatnot, you find out that you’ve got to move up to a certain instance size.

 

Then once you start doing that, to get the required disk IO throughput, you find you end up spending as much as you would on bare metal. Initially, in our pilots, we’re using the cloud, but we will be moving to bare metal. Through the advice of DataStax and the Cassandra MVPs, we have found you can get really good throughput when you put it on bare metal. You get guaranteed allocation of resources, and there’s no other shared resource being used by someone else that you don’t know about.

 

That’s the future plan. It’s been good experiencing Cassandra in the cloud, and starting to play with it on bare metal has been exciting. Without any change, you see a huge increase in transactions in our apps by just moving to dedicated physical hardware.

 

Very cool. It’s good that you had a good experience with it. You had mentioned that you were talking to Cassandra community members, and that they had given you that advice. Do you have any thoughts on the Cassandra community or any of the key players?

 

Drew: Funnily enough, that’s how I got involved with Cassandra. One of the community players and a DataStax MVP, Aaron Morton, had set up a local data-related user group and, through that, I got involved and started talking to him. His first pitch, at the first user group meeting, was Cassandra. We were originally using some other data stores, but his in-depth knowledge convinced me on certain things. Talking to him more, he gave some great support.

 

On the Apache Cassandra mailing list, he’s really active; he’s sending email all the time. When I search for emails from him, I end up going through all of the user mailing list’s emails before I find my personal emails from him. It’s been really good, and that’s the thing that really drew me to Cassandra: the community.

 

Cool, that’s awesome. The community sounds really strong and it’s been able to help you out a lot so that’s great to hear. You had said that there were some original database offerings you were looking at prior to Cassandra. Could you explain a little bit about that?

 

Drew: Sure. Starting off, the knee-jerk reaction was to use MySQL. It was sort of the same open source community thing, pretty much a similar vibe, but obviously a relational database. In my previous experience with it, I’d worked with millions of rows and often had to fight index corruption. What we’re looking at is millions of transactions a day, straight away. During an internal pilot, which I thought was a good idea, we got more and more into the sort of numbers we’d be working with, and we realized it was just not going to work even if we scaled it, clustered it, or whatever else you could do with it. There are some limitations, like row and table locking with certain recommended table types, that made us worried.

 

Then we started experimenting with NoSQL, and one of the first things we came across, just for a quick experiment, was CouchDB. It’s an Apache project as well; we had used the Apache web server in the past, with many, many years of experience, so it was a familiar brand. It was great, and we got up and running really quickly, but found limitations straight away. I loved it at first: it has an HTTP RESTful interface, and being a web developer, I could relate to it really quickly.

 

Limitations came hard and fast with CouchDB, so we moved on to MongoDB. MongoDB was good, and it definitely has the feeling of being scalable. You got the idea that you could build a horizontally scalable system without having to worry about a thing; that reassurance that you don’t have to invest a lot of time to get something over the line initially, and can still invest later on to grow it, was great.

 

Unfortunately, we hit things like the write locking system and the master/slave system. They sort of advertise it as a masterless system but, digging a bit deeper, you find out that it’s not. That worried me, coming from my MySQL days. The master/slave environment carries the baggage of going out of sync and having to juggle auto-increment row IDs, and all those struggles you worry about when you’ve got data being written across multiple servers.

 

We dropped MongoDB and went to Riak. Riak was good. It was the community that brought me to Riak, and probably the drive of the product; they started letting the world know what they were doing, and we started seeing it all over the news. They did great PR work.

 

We got going on Riak in time for our proof of concept with third parties. It was running a bit slow, and we started talking to Aaron Morton. He started selling us more on Cassandra, and I was going, “Oh, I’m sure this will be fine. I really like Riak… it’s simple.” Then, once Cassandra had a stable 1.1 release, the distributed counters became a lot more mature, and with 1.2 I started seeing things like vnodes entering the feature set. So I approached Riak to see where their roadmap’s feature priorities sat in terms of what we needed.

 

We decided to talk to Riak: 

Me: “Look, distributed counters… you’ve got a feature request that I think is two years old. Everyone wants it; what are you doing?”

 

Riak: “Focusing on something else.”

 

Me: “This is just something I really need to have.”

 

We went to Aaron Morton, talked further about Cassandra, and literally two days later I was running on Cassandra 1.1. Costs dropped across all of our overhead, and the performance of all writes and processes went from eight receipts a second up to 100, and that was changing nothing but the backend to Cassandra.

 

Wow. That’s a great story. It sounds like it took you a little bit of a journey to get to Cassandra but you finally settled on the right offering that fit your use case!

 

Drew: That’s right.

 

Very cool. Is there anything else that you’d like to add about Apache Cassandra or the community?

 

Drew: Just the focus. There are big naysayers asking, “What about data integrity?” and “What about ACID compliance?” I really respect DataStax and Cassandra for almost going against that mindset, and focusing on availability, eventual consistency, and data integrity. It feels like you have the reassurance of a good old relational database system in the NoSQL space.

 

Selling to banks, I have to be able to say confidently that the data will be there. It may not be there right now, but it will be there eventually. That’s been a big thing for me, especially when the banking world understands that approach.
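The “it will be there eventually” guarantee is tunable in Cassandra: with a replication factor of N, a write acknowledged by W replicas and a read that consults R replicas are guaranteed to overlap, and therefore see the latest acknowledged data, whenever R + W > N. A minimal illustration of that rule (a generic Dynamo-style calculation, not PAPERKUT’s code):

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True when any read set of r replicas must intersect any
    write set of w replicas out of n total replicas."""
    return r + w > n

# Replication factor 3 with QUORUM reads and writes (2 each):
# every read is guaranteed to see the latest acknowledged write.
print(quorums_overlap(3, 2, 2))  # True

# Consistency level ONE on both sides trades that guarantee for
# lower latency -- the data turns up "eventually", as Drew says.
print(quorums_overlap(3, 1, 1))  # False
```

Because R and W are chosen per request, an application can run most traffic at cheap consistency levels and reserve quorum reads and writes for the paths where it matters.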

 

Very cool. Drew, thank you so much for this interview today and the time that you’ve taken to talk to us about your experiences with Cassandra, and I wish the best of luck to you.

 

Drew: Thank you very much.

 

Thanks.

 

Drew: Cheers.
