Cvent builds a SaaS platform for event planning and management. So if you’re planning a conference and need to send out marketing materials, set up agendas, pick hotel venues, send out emails, set up a registration website, or send out surveys, then Cvent is what you’d want to use.
My role here as the enterprise architect is to enhance our architecture so that we can continue to scale our products for the next 10-20 years.
We currently use Apache Cassandra in a variety of ways.
- We use it as part of the DataStax Enterprise Solr offering to horizontally scale the search features that let customers run free-text searches across RFPs.
- We use it as a key/value store to offload most of the work from our OLTP (SQL Server) database in support of our Survey product. We also make heavy use of secondary indexes in this use case.
- We use it as a datastore for our in-house analytics engine, where we dump all of our analytical “counter” data. We then use Spark Streaming to aggregate this data and store the results back in Cassandra for further reporting.
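The analytics pipeline above rolls raw counter rows up into aggregates. The sketch below shows that roll-up step in plain Python, assuming each counter event carries a metric name, an epoch timestamp and an increment, and assuming one-minute buckets. The names `CounterEvent` and `rollup_per_minute` and the metric names are hypothetical; this is a stand-in for the kind of windowed aggregation a Spark Streaming job might perform, not Cvent’s actual job.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class CounterEvent(NamedTuple):
    # Hypothetical shape of one raw counter row.
    metric: str      # e.g. "survey.responses" (illustrative name)
    epoch_secs: int  # event time in seconds since the epoch
    delta: int       # increment recorded by the application


def rollup_per_minute(events: Iterable[CounterEvent]) -> Dict[Tuple[str, int], int]:
    """Sum counter increments into (metric, minute_bucket_start) totals.

    This mirrors the shape of a windowed aggregation over raw counter data;
    the resulting totals are what would be written back for reporting.
    """
    totals: Dict[Tuple[str, int], int] = defaultdict(int)
    for event in events:
        # Truncate the timestamp to the start of its minute.
        bucket = event.epoch_secs - (event.epoch_secs % 60)
        totals[(event.metric, bucket)] += event.delta
    return dict(totals)
```

For example, increments of 1 and 2 for `"survey.responses"` at seconds 120 and 150 both land in the bucket starting at 120 and sum to 3, while an increment at second 185 falls into the bucket starting at 180.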
In all three use cases, we’re using DataStax Enterprise.
The main reason we needed Cassandra was scalability. We wanted something that could handle very high traffic volumes with large spikes while letting us add nodes as we needed to scale out. As a SaaS event management product, many of our use cases see very large traffic spikes, while most of the day has only moderate volume. The other thing we liked about Cassandra was secondary indexes. They come in handy for our second use case, where we store a large JSON blob but then need to find the row key via a secondary index. This effectively turns our Cassandra cluster into a document database, and it’s worked out nicely.
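The “JSON blob plus secondary index” pattern described above can be sketched as a toy in-memory model: rows are keyed by a row key and hold a serialized JSON document, while a separate index maps one indexed column value back to the row keys that carry it. In Cassandra itself the index would be declared with CQL (`CREATE INDEX ... ON table (column)`) and maintained by the database; the class `DocumentTable` and the survey/respondent names below are hypothetical and only illustrate the two-step access path (index lookup, then fetch by row key).

```python
import json
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple


class DocumentTable:
    """Toy model of a table storing JSON blobs with one secondary index.

    Illustrative only: Cassandra maintains its secondary indexes on each
    node; here a plain dict stands in for the index.
    """

    def __init__(self) -> None:
        # row_key -> (indexed_value, serialized JSON document)
        self._rows: Dict[str, Tuple[str, str]] = {}
        # indexed_value -> set of row keys having that value
        self._index: Dict[str, Set[str]] = defaultdict(set)

    def put(self, row_key: str, indexed_value: str, document: dict) -> None:
        old = self._rows.get(row_key)
        if old is not None:
            # Remove the stale index entry if the row is being overwritten.
            self._index[old[0]].discard(row_key)
        self._rows[row_key] = (indexed_value, json.dumps(document))
        self._index[indexed_value].add(row_key)

    def get(self, row_key: str) -> Optional[dict]:
        # Primary-key read: a direct lookup by row key.
        row = self._rows.get(row_key)
        return json.loads(row[1]) if row is not None else None

    def find_by_index(self, indexed_value: str) -> List[dict]:
        # Step 1: resolve row keys through the secondary index.
        # Step 2: fetch and decode each JSON blob by its row key.
        keys = sorted(self._index.get(indexed_value, set()))
        return [self.get(key) for key in keys]
```

Storing the document as one opaque blob keeps writes simple, while the index supplies the one extra query path the application needs; any additional query path would need its own index or table.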
The evaluation process drew on both prior experience and our future business needs. We evaluated the following databases: MemSQL, NuoDB, Redis, Memcache, Membase, MongoDB, Couchbase, Postgres, and MySQL.
We also briefly looked at VoltDB, Riak, FoundationDB, and Aerospike. Ultimately, many of these databases could have solved our use cases with varying degrees of success. We chose Cassandra primarily because of its ease of setup, good documentation, price, and support from DataStax. I wanted a technology that had good commercial support and was easy to run operationally.
We’re currently running a 3-node DataStax Enterprise cluster on version 3.1.4, and we’re evaluating another 3-node cluster running DataStax Enterprise 4.5.1 for our analytics. We are not currently running a multi-data center deployment, since most of our data can be replicated from other data sources in the event of a catastrophic failure.
The performance of Cassandra has been excellent. In production we handle spikes of up to 5,000 reads/writes per second without issues, and during performance testing and evaluation we easily reached 20,000 operations per second. The second thing that was really nice was the ease of setup: “yum install dse”, and then there are lots of good blogs and documentation on properly tuning your production cluster.
The best way to get started is to download the software, install it, and start playing with the demos. I highly recommend using something like Vagrant so you can quickly iterate on a local cluster. Once everything is running, I like to start experimenting with the SDK client APIs that interface with the database, because this gives me a good feel for how it will work in real scenarios. I can also write up some quick performance tests and iterate as I figure out how the technology works.
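A quick performance test of the kind mentioned above can start as a small harness that fires a fixed number of operations from a pool of worker threads and reports operations per second. In the sketch below, `measure_throughput` is a hypothetical helper and `op` is a placeholder: in a real test it would wrap a client call, for example a write through the DataStax Python driver’s `session.execute`, whereas here any callable works. Concurrency matters because a single synchronous loop mostly measures round-trip latency rather than what the cluster can absorb.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def measure_throughput(op: Callable[[int], None], total_ops: int, workers: int = 8) -> float:
    """Run op(0) .. op(total_ops - 1) across a thread pool; return ops/sec.

    `op` is a placeholder for whatever client call is being evaluated
    (an assumption of this sketch, not a specific driver API).
    """
    if total_ops <= 0 or workers <= 0:
        raise ValueError("total_ops and workers must be positive")
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for every call and re-raises any exception from op.
        list(pool.map(op, range(total_ops)))
    elapsed = time.perf_counter() - start
    return total_ops / elapsed if elapsed > 0 else float("inf")
```

Running it with a trivial in-memory `op` first gives a baseline for the harness’s own overhead; swapping in real reads and writes, and varying `workers`, then shows how throughput changes as concurrency grows.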
When you’re evaluating NoSQL technology, you must understand both your read use cases and your write use cases. With many NoSQL technologies it’s easy to scale out write volume, but read performance suffers badly unless you query by primary key. Cassandra’s secondary indexes give us both the flexibility to add them later and the ability to query our data like a document database. This approach has given us great performance on both the read and the write side.
The community behind Cassandra has been great. The Cassandra mailing list is very active and everyone is very helpful. I’ve also gotten a lot of use out of the DataStax forums and support channels, Netflix blogs, eBay blogs, etc. There are tons of useful articles out there on Cassandra.