Martijn Van Berkum: CTO at GX Software
TL;DR: GX Software is an independent, global software vendor and a market leader for online marketing software.
For BlueConic, they store everything in Cassandra for ease of use and ease of development, but the largest dataset in Cassandra by far is the profile database. They needed a storage engine other than a relational database, one that could handle a high volume of simultaneous reads and writes in real time without running into table locking, and that could scale up to 1 billion profiles or more.
They use Cassandra on AWS (Amazon Web Services) for production purposes. Customers can choose to host it themselves, although only a small percentage choose that option.
What does GX Software do?
We are an independent, global software vendor and a market leader for online marketing software. We provide a Web Content Management system called XperienCentral and an online, real-time, cross-channel engagement platform called BlueConic. We’ve been in business for 15 years and are active in Europe and the USA. Our headquarters is in the Netherlands. We have 150+ customers, most of them at the top of the market, and our software powers hundreds of high-traffic, high-profile websites.
How are you using Apache Cassandra?
Our WCM uses a combination of Apache Jackrabbit, an implementation of the Java Content Repository (JCR) specification, and a relational database. For our web content management system, this has worked out great. For BlueConic, we needed a linearly scalable, extremely large profile database, so we use Cassandra for that. For BlueConic we store everything in Cassandra for ease of use and ease of development, but the largest dataset in Cassandra by far is the profile database.
Could you give us some technical info about your Cassandra use case? (How many nodes, how much stored data, read/write rates)
After two years on the market, we currently have 50+ high-traffic customers (see http://www.blueconic.com/leading-brands-choose-BlueConic.htm). Each customer has its own keyspace, and the data belongs to the customer. Some customers have collected tens of millions of profiles. Across all customers, we have accumulated 120 million profiles, and that number is growing exponentially. Properties can be added to every profile dynamically, which compounds that growth. To index and aggregate numbers (for example, to calculate the number of profiles that have visited a specific website 3 or more times), we use Solr. Our architecture is fully multi-tenant. We have two clusters right now, one in the USA and one in Europe. Each cluster has 3 nodes, and we currently store approximately 260 GB per node.
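To make the multi-tenant layout and the kind of aggregate mentioned above more concrete, here is a minimal in-memory sketch. It assumes that each tenant's data lives in its own namespace (standing in for a keyspace), that a profile is a map of dynamically added properties, and that per-site visit counts are stored as a profile property. All names (`ProfileStore`, the `visits:<site>` convention, the tenant and site names) are hypothetical. In BlueConic this counting is done by Solr over indexed profiles, not by a scan like this.

```python
class ProfileStore:
    """Hypothetical in-memory model: one namespace per tenant, like a keyspace."""

    def __init__(self):
        # tenant -> profile_id -> {property_name: value}
        self._tenants = {}

    def _profile(self, tenant, profile_id):
        # Create the tenant namespace and the profile on first write.
        return self._tenants.setdefault(tenant, {}).setdefault(profile_id, {})

    def set_property(self, tenant, profile_id, name, value):
        # Properties are added dynamically; no schema change is needed.
        self._profile(tenant, profile_id)[name] = value

    def record_visit(self, tenant, profile_id, site):
        # Hypothetical convention: visit counts are stored as "visits:<site>".
        profile = self._profile(tenant, profile_id)
        key = f"visits:{site}"
        profile[key] = profile.get(key, 0) + 1

    def count_profiles_with_visits(self, tenant, site, minimum):
        # Full scan for illustration only; a search index such as Solr
        # answers this kind of question without touching every profile.
        key = f"visits:{site}"
        profiles = self._tenants.get(tenant, {}).values()
        return sum(1 for p in profiles if p.get(key, 0) >= minimum)


store = ProfileStore()
for _ in range(3):
    store.record_visit("acme", "p1", "example.com")
store.record_visit("acme", "p2", "example.com")
store.set_property("acme", "p2", "newsletter", True)
for _ in range(5):
    store.record_visit("other", "p9", "example.com")  # a different tenant

print(store.count_profiles_with_visits("acme", "example.com", 3))  # 1
```

Because each tenant is a separate namespace, profile `p9` in tenant `other` does not count toward the result for `acme`, even though it has more visits.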
What relational database technologies are you familiar with, and why didn’t you use them for this project?
We have extensive knowledge of most of the major relational databases, like MySQL, MS SQL and Oracle, and we use them for our WCM system. For BlueConic, however, we needed a different storage engine that could handle a high volume of simultaneous reads and writes in real time without running into table locking. We wanted the system to be able to scale up to 1 billion profiles or more. Furthermore, because BlueConic is used on so many websites concurrently, fail-over and consistent uptime are very important to us. Cassandra, with its linear scalability, real-time aspects and consistency model, proved to be a good fit for that goal.
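One reason concurrent writes in Cassandra do not contend for a table lock is that every column value (cell) carries a write timestamp. When versions conflict, they are reconciled by keeping the cell with the newest timestamp (last-write-wins). The sketch below models only that per-cell merge rule for a single row. It is a simplification: it keeps the first-seen cell on a timestamp tie and ignores deletions, whereas Cassandra has its own tie-breaking and tombstone rules. `Cell`, `merge_rows` and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    value: object
    timestamp: int  # write time; real Cassandra typically uses microseconds


def merge_rows(a, b):
    """Merge two versions of a row (column name -> Cell), cell by cell.

    Last-write-wins per column: keep the cell with the higher timestamp.
    On a tie, this sketch keeps the cell from `a` (a simplification).
    """
    merged = dict(a)
    for column, cell in b.items():
        current = merged.get(column)
        if current is None or cell.timestamp > current.timestamp:
            merged[column] = cell
    return merged


# Two clients update the same profile concurrently, landing on different replicas.
replica_1 = {"city": Cell("Utrecht", 100), "visits": Cell(3, 100)}
replica_2 = {"city": Cell("Boston", 105), "segment": Cell("sports", 102)}

profile = merge_rows(replica_1, replica_2)
print({k: c.value for k, c in profile.items()})
# {'city': 'Boston', 'visits': 3, 'segment': 'sports'}
```

Writes to different properties of the same profile never conflict, and a conflict on the same property is resolved deterministically by timestamp. Apart from ties, the merge result is the same in either order.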
Did you research other NoSQL solutions as well, and if so, why did you choose Apache Cassandra over them?
We looked at others, like HBase and MongoDB. However, our requirements (integration with Java, a simple deployment model, storage capable of an extremely high volume of simultaneous reads and writes, linear scalability and no single point of failure) led us to Cassandra.
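In Cassandra, having no single point of failure typically comes from replication combined with tunable consistency levels. The interview does not state a replication factor. Assuming a replication factor (RF) of 3, which is plausible for the 3-node clusters described earlier but is an assumption, a QUORUM operation needs floor(RF/2) + 1 = 2 replicas to respond. That means one replica (here, one node) can be down without blocking reads or writes. QUORUM reads combined with QUORUM writes also overlap (R + W > RF), so a read reaches at least one replica holding the latest acknowledged write. The helpers below only do that arithmetic. Their names are hypothetical.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that must respond for a QUORUM read or write."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that can be unavailable while QUORUM still succeeds."""
    return replication_factor - quorum(replication_factor)


def reads_overlap_writes(rf: int, read_replicas: int, write_replicas: int) -> bool:
    """True when every read replica set intersects every write replica set (R + W > RF)."""
    return read_replicas + write_replicas > rf


RF = 3  # assumed replication factor; not stated in the interview
print(quorum(RF), tolerated_failures(RF), reads_overlap_writes(RF, quorum(RF), quorum(RF)))
# 2 1 True
```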
Are you using C* in the cloud or in your own data center?
We use Cassandra on Amazon Web Services for production. Customers can choose to host it themselves, although only a small percentage choose that option. Self-hosting is actually quite easy: our whole stack is Java-based, so setup is fast, and there is no dependency on an external (relational) database.
Anything else that you’d like to add about Cassandra or the Cassandra community?
We find that Cassandra and the Cassandra community are maturing fast. In the beginning, when we started using Cassandra (3 years ago), we had some trouble because of the immaturity of the technology and documentation at that time. These days, there is a lot more knowledge available out there and the technology is much more mature. Great stuff!