The majority of Postcode Anywhere’s business is about address capture: helping people enter addresses very quickly and easily on websites and in call centers. That’s the core of what we do. The use case for Cassandra is actually a spin-off project from one of our in-house technologies which we’ve used to manage our customers. We’ve got a fair number of customers who are on a self-serve scheme. To try and give them the best possible experience, we track what goes on with them and try to infer their behaviors, to figure out where they are and see if they might have any problems, or if we need to help them.
It was just over 12 months ago when we realised that we couldn’t progress with SQL. There’s no way that we would be able to get the scale out of it that we wanted.
The system that we used to do that initially worked fine internally, but we decided that we wanted to commercialise it and push it out to a much bigger audience, and obviously SQL Server doesn’t scale, either from a technology point of view or a cost point of view, for anything beyond relatively trivial amounts of data these days. That’s really where we ended up, and we started to look at NoSQL.
Cassandra was part of a whole range of different options that we investigated, but initially we were very much looking at MongoDB, and found it came unstuck very swiftly.
While the development interface for MongoDB is very pretty and very nice, under the hood it’s really not what we wanted. We did a lot of performance testing between MongoDB and Cassandra, not only straight-on query testing and those sorts of things, but also understanding how each worked in failure scenarios.
Architecturally, I didn’t like MongoDB. The way you have to set it up to have lots of different nodes is a real mess, very error-prone and very manual, and the way that it manages memory just looks incredibly messy. Whereas Cassandra seems to strike the right balance: it doesn’t feel too alien, although obviously under the hood it’s very, very different to what we are used to, and it gives us a relatively SQL-like syntax with CQL. In terms of management, getting it up and running, and data operations, it’s just been an absolute breeze, in truth.
Make sure that when you’re evaluating options you’ve got the same consistency levels between the tests. A lot of the comparisons you see of one database to another aren’t typically real comparisons: you’ve got one database with guaranteed durability and another with no durability guaranteed at all. Certainly, in the case of some of the ones we tested, as you started to fail nodes you would almost always lose data. We couldn’t get Cassandra to lose data without it telling us it was unable to save it.
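The consistency guarantee behind that experience comes down to simple arithmetic: a QUORUM write or read touches a majority of replicas, and whenever write acks plus read replicas exceed the replication factor, the read set must overlap the write set. A minimal sketch of that arithmetic (plain Python, purely illustrative, not a driver API):

```python
# Sketch of Cassandra-style quorum arithmetic (illustrative only).

def quorum(replication_factor):
    """A majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1

def read_sees_latest_write(write_acks, read_replicas, rf):
    """If W + R > RF, the replicas consulted on read must include
    at least one replica that acknowledged the write."""
    return write_acks + read_replicas > rf

rf = 3
w = quorum(rf)  # 2 replicas must acknowledge each write
r = quorum(rf)  # 2 replicas are consulted on each read
assert read_sees_latest_write(w, r, rf)      # QUORUM/QUORUM: reads see latest write
assert not read_sees_latest_write(1, 1, rf)  # ONE/ONE: a read may miss the write
```

This is also why a fair benchmark has to pin both systems to comparable W and R settings: a database acknowledging at the equivalent of W=1 will always look faster than one waiting for a majority.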
In other words, there was no situation where things weren’t written away without us knowing that we would have to retry. Even then, the incidence of that happening was so minuscule with the volumes we were playing with that we didn’t even have to think about it. Whereas with other providers it was very, very different: you just haven’t got any certainty that stuff will be saved away when you want it to be.
What’s been really nice with Cassandra is that, although it is obviously very, very different, it’s so clear about which nodes the data ends up on and how you sort things that, as long as you make the right design decisions and know how you’re going to do your searching and data modelling, the migration isn’t complicated at all. The query syntax is clean and very SQL-like, so you don’t really have to think too much about it as long as you’ve got an awareness of where your data is in the cluster.
We’ve had guys who, at the very early stages, got a little frustrated that they can’t just put WHERE clauses on anything, and then started trying to push queries through and relax all the filters. You have to be quite rigorous and say, “no, you can’t do that”. It doesn’t matter if you’ve got trivial amounts of data and trivial numbers of nodes, but if you’ve got a couple of hundred nodes in play then you’ll get yourself in a terrible mess in no time at all.
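The reason those arbitrary WHERE clauses are off the table is that each row lives on nodes chosen by hashing its partition key, so a query that doesn’t name the partition key would have to fan out to every node in the cluster. A toy model of that routing (plain Python; the key names are made up, and the hash is a stand-in for Cassandra’s token ring):

```python
# Toy model of partition-key routing: a row's home node is a pure function
# of its partition key, so efficient queries must name the partition key.
import hashlib

NODES = ["node-a", "node-b", "node-c"]

def node_for(partition_key):
    # Stand-in for the token ring: hash the key, map it to one node.
    digest = hashlib.md5(partition_key.encode()).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

# Query by partition key: exactly one node needs to be asked.
print(node_for("customer-42"))

# A query on a non-key column (e.g. "WHERE status = 'overdue'") gives the
# coordinator no way to pick a node, so every node would have to be scanned,
# which is why Cassandra refuses it without an explicit opt-in.
```

With three nodes a full scan is merely slow; with a couple of hundred nodes it is the “terrible mess” described above, which is why the rigour pays off early.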
You just have to be more conscious of things, but in a way that’s really good, because you’ve got such great control over consistency and durability that you can make those decisions. If you have a bit of data which you could just chuck away, where it doesn’t really matter if you lose one or two, you can push that out with really, really low consistency levels; and stuff that you’ve got to be really, really careful with you can just tune up. It’s just nice having that real clarity in the operation.
Bringing Cassandra to multiple availability zones across data centers was really easy. Our initial attempt at using it was spun up in AWS, so we split it across two regions. One was over in Ireland, which is obviously the closest European region to us, and then we stuck something over in the US West region as well, just to see how well the replication worked. Again, it was just super easy to get everything going.
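What makes that split easy is that each data center gets its own replication factor and its own slice of the ring. Here is a plain-Python sketch of that per-data-center placement, in the spirit of Cassandra’s NetworkTopologyStrategy (the region and node names are hypothetical):

```python
# Toy per-data-center replica placement, in the spirit of Cassandra's
# NetworkTopologyStrategy: each DC places its own replicas independently.

RING = {
    "eu-west": ["eu1", "eu2", "eu3"],  # hypothetical node names
    "us-west": ["us1", "us2", "us3"],
}

REPLICATION = {"eu-west": 3, "us-west": 2}  # replicas per data center

def replicas_for(partition_token):
    placed = []
    for dc, nodes in RING.items():
        rf = REPLICATION[dc]
        start = partition_token % len(nodes)
        # Walk this DC's ring from the token's position, taking rf nodes.
        placed += [nodes[(start + i) % len(nodes)] for i in range(rf)]
    return placed

print(replicas_for(7))  # every partition gets replicas in both regions
```

Because placement is per-DC, losing the link between regions leaves each side with a full local replica set, which is exactly what makes the cross-region experiment low-risk.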
Ironically, where we still continue to use SQL Server for the main core of our product, we’ve had to write our own replication infrastructure, because the stuff that was in SQL Server wasn’t smart enough to deal with long latencies and that kind of thing.
The eventual consistency model that we got with Cassandra just makes everything so much easier to understand. You’ve got a nice combination of clarity and transparency in the operations, and you’ve got all of those choices you can make in terms of what you want. When you want performance, you’ve got a very visible trade-off if you really want to start ramping things up, or you can just start adding nodes. The other solutions we looked at just hadn’t got that clarity, which is always going to be a problem.
Consistency & tuning
We’ve been exposed to eventual consistency a bit anyway, because of the way that we built our own replication system with SQL Server; that was always designed to work toward eventual consistency. There’s nothing actually that’s vastly different from what we’re used to. The big thing is being really aware of it, and being really aware of how your data is designed, to make sure that you’re working with the technology rather than against it. That means that sometimes, coming from a SQL background, you’ve got to do everything the other way around.
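The mechanics of eventual consistency are simpler than they sound: when replicas diverge (say, while a node was unreachable), the cell with the newest write timestamp wins when they are compared again. A toy last-write-wins reconciliation, in plain Python:

```python
# Toy last-write-wins reconciliation: divergent replica values are
# resolved by keeping the one with the newest write timestamp.

def reconcile(replica_values):
    """replica_values: (timestamp, value) pairs read from different replicas."""
    return max(replica_values)[1]  # newest timestamp wins

# Two replicas diverged while one node was unreachable:
stale = (100, "old address")
fresh = (250, "new address")
assert reconcile([stale, fresh]) == "new address"
```

The point is not the three lines of code but the predictability: once you know the rule, you can reason about exactly what a read will return after a partition heals, instead of working against the model.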
Whereas you’re taught quite rigorously to try and normalize everything, here you massively de-normalize, and you typically have a couple of copies of stuff here and there for different use cases. Which, ironically, we started to have to do anyway with SQL Server, just to try and rinse the best bit of performance out of things.
Again, there’s nothing that’s particularly different or scary about it; it’s really just forcing you to think at a really early stage about how it’s all going to work. Because if you don’t, if you just chuck stuff into your schema as you would have done with SQL and then think, “that’s fine, I can throw in indexes all over the place and move stuff around,” then you will get in a terrible mess.
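That query-first, de-normalized modelling can be sketched very simply: the same event is written more than once, each copy keyed by the question it has to answer, so reads never need a join. A plain-Python sketch (the table and column names are invented for illustration):

```python
# Query-first modelling sketch: one event, two de-normalized copies,
# each keyed by the query it serves. (Names are made up.)

events_by_customer = {}  # key: customer id -> "show this customer's activity"
events_by_day = {}       # key: date        -> "show everything from this day"

def record_event(customer_id, day, detail):
    # De-normalized: two writes now, zero joins at read time.
    events_by_customer.setdefault(customer_id, []).append((day, detail))
    events_by_day.setdefault(day, []).append((customer_id, detail))

record_event("cust-1", "2015-06-01", "lookup")
record_event("cust-1", "2015-06-02", "billing run")
record_event("cust-2", "2015-06-01", "lookup")

assert len(events_by_customer["cust-1"]) == 2  # per-customer view
assert len(events_by_day["2015-06-01"]) == 2   # per-day view
```

The cost is extra writes and storage; the payoff is that each query hits exactly one partition, which is the early-stage design thought the paragraph above is describing.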
We’ve got two kinds of data that we store away: the stuff that’s really important, typically billing-type data, and then logs and situational awareness data. For the counting and billing stuff, there’s a lot less of it, but you want to make sure it’s there, so you tune the consistency up to quorum level. The other stuff you just keep really light, so that as long as one of the nodes acknowledges it you can move on. Having that flexibility makes life so much easier, and you can really get some crazy performance out of it as soon as you start pushing stuff through.
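That split can be sketched as a per-write choice of how many acknowledgements to wait for. The sketch below is plain Python with invented names; with the real DataStax Python driver this choice is the consistency level you set on a statement:

```python
# Sketch of per-write consistency tuning: billing writes wait for a
# quorum of replica acks, log writes move on after a single ack.
# (Illustrative only; names are made up.)

RF = 3  # replication factor
ACKS_NEEDED = {"QUORUM": RF // 2 + 1, "ONE": 1}

def write_succeeded(kind, acks_received):
    level = "QUORUM" if kind == "billing" else "ONE"
    return acks_received >= ACKS_NEEDED[level]

assert write_succeeded("billing", 2)      # quorum of 3: safe to bill on
assert not write_succeeded("billing", 1)  # too few acks: retry the write
assert write_succeeded("log", 1)          # lightweight data: one ack, move on
```

The performance effect follows directly: log writes return as soon as the fastest replica answers, while billing writes pay the latency of the second-fastest, which is the trade-off being tuned in the paragraph above.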
Cassandra updates and monitoring
We try and keep on top of the versions at the moment, because we’re running through this migration from the true R&D stage into getting ready to launch to our beta group. That’s making things pretty interesting for us, but it also means that we’re trying to lock things down and make sure that we’ve got everything tied up.
It has been no problem at all to update: just a nice, simple rolling update that went through without any issues, and we’ve done quite a few. We’ve got a dev cluster in-house. That cluster’s small, it’s only got five nodes in it, but it’s enough to prove that all the data operations are fine and that it works. We’ve force-failed large elements of it and it comes back up absolutely fine. Again, you get that confidence that it’s going to do what you expect it to, which is much nicer. We’re starting to use Spark with Cassandra as well, which is all very new and a little bit scary, but the potential there is very, very cool indeed.
OpsCenter is awesome. Looking at what we’ve got with Microsoft SQL Server, you don’t get anything vaguely as useful. The fact that it’s there out of the can is just fantastic. It’s really good at telling you where you’ve got problems, and while that’s very rare, it’s also really, really simple to do the big cluster-wide operations which otherwise would be really time-consuming. Especially as we’re Windows guys, not Linux guys, so there’s a whole lot of learning there to get our heads around the whole Linux world, too. The whole thing seems to just fit together very nicely.
There’s an awesome community around Cassandra. When you do get stuck, there’s loads and loads of support there to help you out, and to try and understand where there are challenges to getting your head around modeling things differently. You’re not on your own, which you can be with some of the others as well.
There’s so much stuff on Planet Cassandra, loads of super helpful material, and the documentation is really good, way better than most things we’ve seen. The whole thing feels more mature, both the product and the environment around it, which makes it much, much easier to get everyone going. Obviously you’ve also got fantastic monitoring tools, so you know exactly what’s happening with your customers, and it makes it really easy to deploy. All of these things go into making it not a scary step up.
Ultimately Cassandra gives us a much stronger platform for growth. We know that we can start adding nodes left, right, and center if we need to without worrying much about it. We know that the performance of it is going to maintain the levels that we’re expecting. We know we won’t see any nastiness there as we start to scale, which is really important, too.
I think one of the great things about it is that there’s not a lot of fuss, it’s boring. There’s nothing about it that’s too scary or too techy or anything like this. It’s packaged incredibly well and just works, and it works reliably and scales sensibly. It’s not at all what you typically would expect from a lot of what’s out on the market, which is all super bleeding edge and really patchy and flaky. We haven’t had any bad experiences with it at all. We’ve been really, really happy.
We were reading an article with one of the head designers of Concorde, the supersonic plane. One of the complaints he got was from passengers who said there was no noise and no fuss, and who questioned whether they were really supersonic. The lead designer turned around and said, “yeah, that was the really hard part of the design.” Good design should be boring: you can actually get on and do the stuff that is valuable. You don’t want to be battling with the technology that’s supposed to support you; you want to carry on doing the things it enables, and generating the value it can help you generate.