Skye Book: Lead Architect at Ultravisual
TL;DR: Ultravisual is a beautiful visual publishing platform that lets you create, curate and share collections of photos and videos.
Ultravisual wanted a way to increase the speed of requests for user feeds and Postgres wasn’t keeping up. Initially Redis was in consideration, but upon further review Cassandra became the “obvious solution” due to its ability to easily add capacity, minimal concern about data partitioning, and overall cost.
Cassandra runs on AWS EC2, configured with the region-aware snitch. They’re using medium-sized instances with spinning disks. With the performance strains taken care of, Ultravisual was able to get back to what matters most to them: making the content of Ultravisual a more informative and satisfying experience for the user.
Hello, Planet Cassandra. This is Brady Gentile, community manager at DataStax. Today, we have Skye Book with us. He is lead architect at a startup company called Ultravisual.
We’re really excited to hear about how you’re using Apache Cassandra. Could you give us a quick rundown on what Ultravisual does?
Ultravisual is a beautiful visual publishing platform that lets you create, curate and share collections of photos and videos.
Excellent. So, how does Apache Cassandra fall into the mix of the application that you’re building?
Our data structure is fairly complicated in that we allow users to interact with and post media to multiple collections, and then to reuse that media in new posts. Starting out, we were just using Postgres with tons of joins. It was really great for prototyping, but as soon as we really started to use it, even with between fifteen and twenty people, it became apparent that it wasn’t going to scale well with everything in an RDBMS. I haven’t been a fan of the NoSQL model in the past, but we found a good fit for it in storing users’ feeds, notification lists, and statistics. We first used it as a way to increase the speed of requests for user feeds, since the feed is the first thing you see when entering the app. By storing entire posts in columns on the user’s row, we were able to avoid most SQL queries and serve the full request straight from Cassandra. With the performance strains taken care of, we were able to get back to the work of making the content of the feed a more informative and satisfying experience for the user.
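As a rough illustration of the "entire posts in columns on the user's row" idea, here is a minimal in-memory sketch in Python. It is not Ultravisual's code and does not use a Cassandra driver; the `FeedStore` class, its method names, and the post fields are all hypothetical, chosen only to show why a feed read becomes a single row lookup once each follower's row holds full copies of the posts.

```python
import json
from collections import defaultdict


class FeedStore:
    """Hypothetical in-memory stand-in for a wide-row layout:
    one row per user, one column per post, value = full serialized post."""

    def __init__(self):
        # row key (user id) -> {column name (created_at, post id) -> JSON blob}
        self._rows = defaultdict(dict)

    def fan_out(self, post, follower_ids):
        """Copy the whole post into every follower's row at write time."""
        blob = json.dumps(post)
        column = (post["created_at"], post["id"])
        for user_id in follower_ids:
            self._rows[user_id][column] = blob

    def read_feed(self, user_id, limit=20):
        """Serve a feed from one row, newest first, with no joins."""
        row = self._rows.get(user_id, {})
        newest_columns = sorted(row, reverse=True)[:limit]
        return [json.loads(row[column]) for column in newest_columns]


store = FeedStore()
store.fan_out(
    {"id": 1, "created_at": 100, "author": "alice", "media": ["a.jpg"]},
    ["bob", "carol"],
)
store.fan_out(
    {"id": 2, "created_at": 200, "author": "carol", "media": ["b.mp4"]},
    ["bob"],
)
feed = store.read_feed("bob")
```

The trade-off in a layout like this is extra storage and write work (each post is duplicated per follower) in exchange for reads that touch only one row.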
So you kind of touched on this a bit. What was your motivation for using Cassandra? It sounded like you switched from Postgres. Are there any other technologies that you evaluated Cassandra against?
Postgres is still our primary data store and can be used to regenerate everything that’s in Cassandra. Cassandra has allowed us to turn our most complicated workloads, like traversing a social graph to find posts and activity relevant to you, into very fast lookups. As for other systems, I spent a lot of time looking at Redis, thinking that it might be the easier way to go. Once I had settled on our approach of storing full data blobs, Cassandra became the obvious solution, as adding capacity is simple and quick, with minimal concern about data partitioning. It doesn’t hurt either that disk storage is super cheap.
Excellent. Did you get a chance to check out the Instagram interview as well? Was that, maybe, a factor in your decision in choosing Cassandra?
Rick Branson did a long, extremely interesting presentation at Cassandra Summit this year that I saw on YouTube. There’s another very good one by Netflix talking about how they initially evaluated Cassandra and started to bring it into the fold. Hearing that large, successful companies have good experiences is always comforting, but I’d say our biggest factor in choosing was how well Cassandra actually solved our problem. As I said, it allowed us to focus more on “what” to show the user rather than “how”.
Would you be able to share some insight into what your deployment looks like right now?
It’s all in EC2, configured with the region-aware snitch. They’re medium-sized instances with spinning disks. It’s very cheap to run and the load is nearly zero. We’ll run out of disk storage before horsepower becomes an issue.
Interesting. Is there anything in future versions of Apache Cassandra that you’d like to see that would assist in the development of your application?
One thing we’re missing, and I don’t begrudge the project at all because I know it’s a hard problem with these systems, is a centralized incrementer; having one would be huge. There are column incrementers now, but having auto-incremented row IDs would be great. There are a few cases where a UUID doesn’t solve the issue: a short linker, for example, where the IDs look random but are actually based on an incremental value.
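To make the short-linker case concrete, here is a small Python sketch of IDs that look random but are derived from an incrementing counter. It is an assumption-laden illustration, not Ultravisual's scheme: the names `encode` and `decode`, the 32-bit range, the multiplier constant, and the base-62 alphabet are all hypothetical choices. The counter itself would still have to come from somewhere that can hand out sequential values, which is exactly the piece described as missing.

```python
import string

# Hypothetical parameters, chosen only for illustration.
ALPHABET = string.digits + string.ascii_letters  # 62 URL-safe characters
MODULUS = 2 ** 32                                # counters must be below this
MULTIPLIER = 2654435761                          # odd, so invertible mod 2**32
INVERSE = pow(MULTIPLIER, -1, MODULUS)           # modular inverse (Python 3.8+)


def encode(counter):
    """Scramble a sequential counter, then write it in base 62."""
    if not 0 <= counter < MODULUS:
        raise ValueError("counter out of range")
    scrambled = (counter * MULTIPLIER) % MODULUS
    if scrambled == 0:
        return ALPHABET[0]
    chars = []
    while scrambled:
        scrambled, remainder = divmod(scrambled, len(ALPHABET))
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


def decode(short_id):
    """Recover the original counter from a short ID."""
    scrambled = 0
    for ch in short_id:
        scrambled = scrambled * len(ALPHABET) + ALPHABET.index(ch)
    return (scrambled * INVERSE) % MODULUS


# Consecutive counters map to unrelated-looking IDs that still round-trip.
ids = [encode(n) for n in range(1, 6)]
```

Multiplying by an odd constant modulo a power of two is a one-to-one mapping, so no two counters collide. It only obscures the sequence; it is not a security measure, since anyone who knows the constant can reverse it.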
Do you have any experience with the Apache Cassandra community? It sounds like you’ve been involved in the mailing lists. You’ve spoken with Aaron Morton, and learned from Rick Branson. Have you been involved in any meetup groups, or the IRC, or anything like that?
Yeah, I’ve posted a few times to the mailing list, and Aaron is always helpful. I know there’s at least one user group in New York, though I haven’t been able to make it out to a meetup yet. Hopefully soon!
Skye, thank you so much for joining us today. That’s all the questions that I have for you. Is there anything else that you’d like to add before we sign out here?
I just think you guys are doing a great job; it’s great to see companies building their foundation on open source projects. Your documentation has definitely been a huge source of help to me, so thanks for that. We’re also hiring backend developers and DevOps people.
If what we’re doing sounds fun please shoot me an email: firstname.lastname@example.org