Apache Cassandra at LoopLogic
LoopLogic allows businesses, big and small, to cost-effectively and securely mix and share online presentations that feature rich media elements. With LoopLogic, users can record audio and video over slides; sync a video and slides into one presentation; add “chapters” to a video presentation; or upload a slide deck with new video or audio. They can also measure the performance of the content stored in their “channels” by tracking statistics, including who viewed their presentation, when and where they saw it, how long they watched it, and whether they are sharing it.
Stephane Legay, LoopLogic’s chief technology officer, is co-founder of the Phoenix, Arizona-based startup. At the end of 2011, LoopLogic was still in the beta phase of development, but was preparing to go live in early 2012. The site had about 1,000 users at the end of 2011, mostly in North America, but also in Europe and South America. Viewers, meanwhile, span the globe. Legay says most of LoopLogic’s initial users are marketers—from individual business owners to representatives of large companies, such as Microsoft. Some organizations, including the nonprofit Open Compliance and Ethics Group (OCEG), are using LoopLogic to support their staff and client training initiatives.
Once LoopLogic officially launches in 2012, businesses of all sizes, everywhere, will have the opportunity to take advantage of advanced analytics services. “When users put their rich media into our system, we will be able to provide deep analytics and integration with their backend systems, such as their mass email services and customer relationship management systems,” explains Legay. “We can then send out messages that will help them to create and nurture leads.” He adds that this type of offering requires the support of Apache Cassandra—“a robust, real-time analytics platform.”
LoopLogic initially relied on a MySQL database for analytics, but Legay soon realized the solution could not meet performance expectations, especially in terms of write throughput. “With MySQL, there was a delay between a call in to write and the time when the read would be available,” he says. “Obviously, this was going to be a problem, as we need to do a lot of real-time writes. For example, one entry of someone watching a video might create 1,000 different writes. And we want to be able to create reports quickly across various ‘dimensions,’ including geography, traffic sources and embed sites.”
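To make the write fan-out Legay describes more concrete, here is a minimal Python sketch of how a single view event could turn into many counter increments, one per reporting dimension and time bucket. The dimension names come from the quote above; the time buckets, field names, and key layout are assumptions made for illustration and do not reflect LoopLogic's actual schema.

```python
from collections import Counter

# Dimensions named in the article; the time buckets are hypothetical.
DIMENSIONS = ("geography", "traffic_source", "embed_site")
BUCKETS = ("hour", "day", "month")

def counter_keys(event):
    """Return one counter key per (dimension, time bucket) pair for a view event."""
    return [
        (event["presentation"], dim, event[dim], bucket, event[bucket])
        for dim in DIMENSIONS
        for bucket in BUCKETS
    ]

def record_view(counters, event):
    """Apply every increment produced by one view event; return the write count."""
    keys = counter_keys(event)
    for key in keys:
        counters[key] += 1
    return len(keys)

# Hypothetical event; all values are invented for the example.
counters = Counter()
event = {
    "presentation": "demo-deck",
    "geography": "US",
    "traffic_source": "email",
    "embed_site": "example.com",
    "hour": "2011-12-01T14",
    "day": "2011-12-01",
    "month": "2011-12",
}
writes = record_view(counters, event)
print(writes, "writes for one view event")
```

With only three dimensions and three time buckets, one event already produces nine writes. Adding more dimensions, finer buckets, or per-second viewing progress multiplies that number quickly, which is why write throughput matters so much for this kind of workload.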
In the summer of 2010, after pushing the MySQL solution to its limits, Legay decided it was time to explore NoSQL database options for LoopLogic. After quickly narrowing down choices to HBase (the Hadoop database) and Apache Cassandra, Legay decided to implement Cassandra—partly, he says, because HBase would be too difficult to set up. However, he says he also wanted a solution that would scale to terabytes of information, and that Cassandra more than fit the bill.
“When I first looked at Cassandra, I was impressed with how it can replicate and ‘magically’ scale across a cluster,” Legay explains. “I also liked that we could add nodes without worrying about configuration. When we actually started using Cassandra, we saw our write throughput (on a simple, one-core box used for testing) leap from 60 writes per second with MySQL to 5,000. We’ve generated about 75 GB of statistics data over the past year and it’s only growing, and we’re really happy about that.”
More Time for Innovation
The fact that Cassandra relieves LoopLogic’s IT resources of the time-consuming task of sharding was also a key factor in the decision to implement the solution. “Before Cassandra, we had to manually shard data through our relational databases and spin up new databases. It was becoming a maintenance nightmare,” Legay says. “Our users’ channels are repositories of content, and we needed to shard per channel. Each channel was allocated to a database; as the channels grew, we needed to spin up new databases. We also had to have a master database to map what channel went to what database. So if a few channels would suddenly burst to millions of views, that database would be kept very busy.” Cassandra’s built-in sharding, and the fact that the database provides no single point of failure, are features that together have made a significant difference in how Legay spends his work hours at LoopLogic.
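The contrast between the two approaches can be sketched in a few lines of Python. The first part mirrors the manual scheme described above, where a master table maps each channel to a database. The second part is a deliberately simplified hash ring in the spirit of consistent hashing. It is not Cassandra's actual partitioner, and the channel names, database names, node names, and row-key format are all hypothetical.

```python
import bisect
import hashlib

# Manual sharding: a master table (hypothetical contents) maps channels to databases.
channel_to_db = {"channel-a": "db1", "channel-b": "db1", "channel-c": "db2"}

def manual_db(channel):
    """Every row of a channel lands on the one database the master table names."""
    return channel_to_db[channel]

def token(key):
    """Hash a string to a large integer position on the ring."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")

class Ring:
    """Simplified hash ring: each node owns the keys up to its own token."""

    def __init__(self, nodes):
        self._ring = sorted((token(node), node) for node in nodes)
        self._tokens = [t for t, _ in self._ring]

    def owner(self, row_key):
        # First node token at or after the key's token, wrapping around at the end.
        index = bisect.bisect_left(self._tokens, token(row_key)) % len(self._ring)
        return self._ring[index][1]

ring = Ring(["node1", "node2", "node3"])
# Hypothetical row keys: one row per channel and day.
for day in ("2011-12-01", "2011-12-02", "2011-12-03"):
    row_key = "channel-a:" + day
    print(row_key, "manual:", manual_db("channel-a"), "ring:", ring.owner(row_key))
```

In the manual scheme, every row for a busy channel goes to the same database, and someone has to maintain the master table. In the ring sketch, placement is computed from the row key alone, and adding a node only reassigns the keys that fall into the new node's range.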
“I can now focus more on innovation instead of server maintenance,” he says. “Since implementing Cassandra, I spend barely an hour a week examining servers or repairing nodes. I just don’t have to worry about those things. Anytime I need new data, all I have to do is increase disk space or add a new node. Cassandra has reduced my maintenance time by about 10 times.” (Legay says he also has been pleased at how well Cassandra has been mapping with the Amazon EC2 stack.)
Best of Both Worlds
Apache Cassandra is an open-source solution, which Legay says is appealing to a startup like LoopLogic that cannot anticipate the trajectory of its post-launch growth. “However, staying with a solution that is open-source would be pretty dangerous for our company if we didn’t have expert support to lean on,” says Legay. “With Apache Cassandra, it’s like having the best of both worlds: We enjoy the low cost of entry, but also, the support we get from the community.”
Legay adds that DataStax’s Cassandra support services have been important in helping him to quickly resolve issues during implementation, such as when he’s encountered corrupt data. “We’ve been using DataStax’s support services for about a year and have been very impressed with their responsiveness. We’re now thinking about how the DataStax OpsCenter might be a useful tool for us as we grow.”
Legay says he has yet to find another “plug-n-play” solution that can deliver everything Apache Cassandra does: “When you look at the database world, and you want to meet a specific need or find a cloud-based solution that will let your data scale out indefinitely, Cassandra is pretty much the only real option,” he says. “With Cassandra, we have confidence that everything will just work magically.”