May 23rd, 2013

“We were also really impressed with the fact that no matter how hard we tried, we couldn’t break Cassandra…it was just indestructible.”

- Carl Yeksigian, Quantitative Strategist at BlueMountain Capital Management

 

 


 

Hi, everyone, this is Matt Pfeil, and I’m here with Carl Yeksigian, the quantitative strategist from BlueMountain Capital Management.  Carl, thanks for joining us today. I must say, you might have the most impressive title of anyone we’ve ever done one of these with.

 

So to kick things off, why don’t you tell us a little bit about what you guys do over at BlueMountain Capital?

 

Carl: We’re a hedge fund manager based in New York City.  We do all sorts of trades, mostly in the credit space, and we’ve been utilizing Cassandra for our market data specifically; we’re trying to capture a lot of data from the markets in real time, and use that to inform our trade decisions.

 

Can you talk a little bit more about the data that you’re actually storing? You mentioned that it’s market data, but at a slightly lower level, what does that look like?

 

Carl: What we are striving for is to capture every tick that comes in from the exchange. Every time a trade happens on an exchange, we want to capture that in our Cassandra system. We also capture a lot of extra data, such as the volume at which each trade occurs, the prices that we think the equities will move to, and where we think the ratings for certain credits will be.  We’re capturing a lot of the data that we have internally in the system as well.
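
As a rough illustration of what “capturing every tick” can look like in Cassandra, here is a minimal sketch of a tick table using the DataStax Python driver. The keyspace, table, and column names are hypothetical and chosen for the example; they are not BlueMountain’s actual schema.

```python
# Minimal, hypothetical sketch of a tick-capture table (not BlueMountain's
# actual schema). Requires the DataStax Python driver: pip install cassandra-driver
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])      # assumes a local test node
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS marketdata
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")

# One partition per (symbol, trading day) keeps partitions bounded; ticks are
# clustered by exchange timestamp so a day of ticks is stored in time order.
# (A real feed would also need a sequence number to disambiguate ticks that
# land in the same millisecond.)
session.execute("""
    CREATE TABLE IF NOT EXISTS marketdata.ticks (
        symbol    text,        -- e.g. 'AAPL'
        trade_day text,        -- e.g. '2013-05-23', bounds the partition
        ts        timestamp,   -- exchange timestamp of the tick
        price     double,      -- traded price
        volume    bigint,      -- traded volume
        PRIMARY KEY ((symbol, trade_day), ts)
    ) WITH CLUSTERING ORDER BY (ts ASC)
""")
```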

 

For everyone who’s not familiar with Wall Street and the financial world, can you share some insight into how many equity markets exist, how many equities exist altogether, and what average volumes look like across the entire market?

 

Carl: Sure.  Right now, we’re not capturing all of the data.  There are probably 20 or 30 major exchanges in the world, and each one of those will have hundreds of primary equities.  For example, NASDAQ has something on the order of 2,000 equities that are pretty actively traded. The volume that you get from each of those equities can be up to a few million ticks per single day; if you also look at things like equity options, then there’s a lot more variety in that space.

 

For example, Apple will have maybe 200 options that trade on the Apple equity; that explodes the number of ticks that you get for a single equity, because you get a tick on every single one of those options.  Usually, those ticks are pretty closely correlated with movements in the underlying, so you receive many, many ticks in the options for every tick you receive in the underlying equity.

 

That makes sense.  In a nutshell, it sounds like an extremely large number of things happening at any given point.

 

Carl: During the Flash Crash, there were about 5 million ticks per second.  It was a huge volume of data that had to be ingested very quickly into the system.

 

That’s a lot, and that makes sense.  What was your primary motivation for utilizing Cassandra, and what technologies was it evaluated against?

 

Carl: The primary motivation for Cassandra was the fact that it does pretty well at horizontal scaling for both reads and writes.  We have a large cluster here that runs all the time and is requesting market data all the time.  Our users are not actually end users; they’re the machines in between, so we needed a system that can handle a very large amount of read and write volume at the same time, sustained throughout the day.  That was the principal problem we were looking at.  The other technologies that we were evaluating were also large-scale distributed databases.

 

Our current system is a bunch of flat files, which worked really well for the use cases that we used to have, but now that we’re moving into much larger volumes of data, we needed something that was much more scalable.

 

Awesome.  What advice would you have for someone who was just getting started with Cassandra?

 

Carl: You really need to focus on the types of queries that you’re going to ask, and on the data model that you want to use, so that you can store the data the way that you’re going to want to ask for it later.

 

We had a long period where we were trying a bunch of different data models, and none of them were really working.  We eventually found one that really works for the type of data that we wanted to store.
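
To make that advice concrete, here is a hedged sketch of the query-first idea using the hypothetical marketdata.ticks table from the earlier example: because the partition key is (symbol, trade_day) and rows are clustered by timestamp, the question “give me all AAPL ticks in this time window” becomes a single-partition slice rather than a scatter across the cluster. The table and names are illustrative, not the actual BlueMountain model.

```python
# Query-first sketch against the hypothetical marketdata.ticks table defined
# in the earlier example. The point: store the data in the shape of the read
# you will ask for later.
from datetime import datetime
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("marketdata")

# Ingest side: one prepared INSERT per tick.
insert_tick = session.prepare(
    "INSERT INTO ticks (symbol, trade_day, ts, price, volume) "
    "VALUES (?, ?, ?, ?, ?)"
)
session.execute(insert_tick,
                ("AAPL", "2013-05-23", datetime(2013, 5, 23, 14, 30, 0),
                 442.78, 100))

# Read side: a time-window query that touches exactly one partition, because
# it pins both partition key columns and slices on the clustering key.
window = session.prepare(
    "SELECT ts, price, volume FROM ticks "
    "WHERE symbol = ? AND trade_day = ? AND ts >= ? AND ts < ?"
)
for row in session.execute(window,
                           ("AAPL", "2013-05-23",
                            datetime(2013, 5, 23, 14, 30),
                            datetime(2013, 5, 23, 15, 0))):
    print(row.ts, row.price, row.volume)
```

A model like this accepts some write-time duplication (if the same ticks need to be read by other dimensions, they get written to additional tables) in exchange for reads that stay predictable under the kind of sustained load Carl describes.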

 

I would highly recommend that anyone who’s getting started check out the documentation on DataStax.  Also, if you have questions about things like data models, definitely talk to the community.  One of the strengths of an open source project is the community itself, and there are a lot of people who want to help, whether it’s on IRC or the Apache Cassandra mailing list, so that’s always a resource.

Carl, I really want to thank you for your time today.  Is there anything else you’d like to add that excites you, either about something coming up with Cassandra or at BlueMountain Capital?

 

Carl: In the next release, there are a lot of features that we think will really help us iron out some of the issues that we have and also reach higher scale, because there’s been a lot of focus on improving the performance of both reads and writes, and that will definitely help us a lot.

 

Awesome.  Thanks again for your time, and for all of our listeners, this is another example of how Cassandra is used in the real world, specifically in financial services. Thanks again, Carl.

 

Carl: Thank you.
