September 3rd, 2013

By 

 

Dave Daeschler: Software Architect at InWorldz

Matt Pfeil: Co-Founder at DataStax

 

TL;DR: InWorldz is a massive-scale 3D virtual world.  It’s a very open-ended architecture where people can come in and sort of create their own games, create their own content, and sell virtual goods.

 

Cassandra is currently deployed and in use for InWorldz’s inventory cluster.  Cassandra was chosen because, for their particular use case, it guaranteed that their inventory would always be available and accessible.  They are currently running Cassandra 1.0.12.

 

InWorldz currently uses 4 nodes sitting on real hardware with 16GB of RAM.  It’s on spinning disks, SAS 15K drives and the data load is right around 100GB per node.  Everything is running on one-gigabit Ethernet. They’re seeing somewhere around 500 reads per second and 200 writes per second.

 

Hello Planet Cassandra, this is Matt Pfeil.  Today I’m joined by Dave Daeschler, Software Architect at InWorldz.  Dave, thanks for taking some time with us today.  To start things off, why don’t you share with us a little bit about what InWorldz does?

InWorldz is a massive-scale 3D virtual world.  It’s a very open-ended architecture where people can come in and sort of create their own games, create their own content, and sell virtual goods.  We actually provide 3D spaces for people to use and they can come in and sort of use them in any way that they want. Many people who use InWorldz want to create a game for other people to play or want to do role-play or maybe they’re just interested in sort of hanging out and having a little private space of their own.

 

We also have companies that use our product.  For example, we have a customer, DeMolay International, that’s currently piloting a program that teaches kids life skills through the use of a 3D virtual world.  It’s a very interesting product and, due to the complex nature of the architecture, we’re pretty much every host’s worst nightmare and every database’s worst nightmare; we have just about everything you could possibly not want: high CPU demand, high IO demand, high network demand, and that data all needs to be served up no matter how much of it there is.  Even if it’s 25,000 objects sitting on somebody’s virtual land, they want that to come up within seconds, not minutes.  We’re always pushing the boundaries on everything.

 

That’s very cool.  What’s the use case for Cassandra?

Cassandra right now is deployed and in use for our inventory cluster.  So when you’re inside this virtual world and you want to purchase something, those objects have to go somewhere and they have to be persistent; we can’t lose your data because when you spend money on a virtual good, it’s just as important to you as a physical good.

 

Cassandra was chosen because we liked the fact that, for our particular use case, we could guarantee that our users’ inventory would always be available and accessible.  I could also guarantee that our cluster would be able to scale out, because it’s going to do nothing but grow.  People don’t throw things out, right?

 

They buy stuff and they keep it; they’re not going to toss it out.  This is a forever-accumulating data set that will continue to grow.  When I took a look at Cassandra, one of the really nice things about it was that the clusters could be easily grown.  There was a standard way of doing it, and it was right there in the documentation.  Write performance is most important because the more people log in and use the cluster, the more we’re going to want to make sure that every transaction is recorded.  You don’t want to buy something and then have it not end up in your inventory because that means you’ve just wasted money.

 

We needed something with very high read performance, as well.  While I was looking through all the different options, Cassandra seemed to be the most mature.  At the time we had a failing MySQL infrastructure that just wasn’t able to keep up with both our write and read load.  Since we turned on Cassandra, it has been really great; we’ve had server outages with no impact on customers, and it’s been exactly what was promised thus far.

 

That’s great to hear.  What version did you start with?

In development, I started with 0.8; luckily, by the time we got ready for production, I was able to test 1.0.6.  Now we’re on 1.0.12.  We’re still on 1.0; right now, I’m in “If it ain’t broke, don’t fix it” mode.

 

That’s a very wise tack to take with things like databases, so that makes a lot of sense.  What can you share about the size of your deployment?

Our current production cluster is almost ready to be expanded, probably in the next few months.  As of right now, it’s 4 nodes sitting on real hardware with 16GB of RAM.  It’s on spinning disks, SAS 15K drives and the data load is right around 100GB per node.  Everything is running on one-gigabit Ethernet.  It’s all physical hardware that it’s sitting on right now.

 

That’s very cool.  What are your access rates like?

If I had to look at write and read speeds, we’re seeing somewhere around 500 reads per second and somewhere around 200 writes per second.

 

Very cool.  What’s the number one feature about Cassandra that’s really worked out well in your use case?

I think that would have to be ease of maintainability.  There are a set number of steps you have to do; you have to make sure you have repair running before GC grace and other things like that.  For the most part, I know that when I throw data into the Cassandra cluster, it’s going to be there and it’s going to be available.

 

Considering that I’m the primary guy to go to for the complex systems, I don’t like to get calls in the middle of the night because something isn’t working.  For us, it’s super important that the system just maintain itself for the most part and be very low maintenance. So far, Cassandra has been great with that.  

 

We’ve had hardware go down and, aside from the alerts that we got from Nagios, we didn’t have to worry about it too much until we got the next box up.  To me right now, that’s super important in the immediate term.  In the long term, the important part is that we’re not going to bottleneck the infrastructure, leaving me stuck with something that doesn’t work and having to completely redesign the app later on.  That was a huge consideration when initially starting with Cassandra; I wanted something that we could stick with, that would grow with the company, so that if we see an exponential increase, we could just supply the corresponding hardware and know that the cluster is going to behave properly.

 

Dave, I want to say thank you very much for your time today.  Is there anything else you would like to talk about before we sign off here?

We’re going to be expanding this cluster soon; it’s going from 4 to 8 machines.  We also have another use case for Cassandra, which we’ll be implementing soon.  With the information you gave me about DataStax Enterprise having integration with Solr, I plan on possibly utilizing that solution.  It would be for something I can’t go into detail about right now, but it’s going to be a really cool feature that will allow people to find, sell and purchase items in InWorldz.