December 11th, 2013

By

Chris Mendes: CTO at Sirca

Chris Logan: Development Manager at Sirca

Franc Carter: Systems Architect at Sirca

Anthony Grasso: Project Lead at Sirca

Brady Gentile: Community Manager at DataStax

 

TL;DR: Sirca’s reason for existence was first to provide financial market data to universities in order to facilitate financial research, second to provide data that supports public policy formulation, and third to generally facilitate the public good through the use of that data.

 

Apache Cassandra is, essentially, going to take over one of Sirca’s databases: a very large MySQL database that stores metadata about tradable entities such as stocks.

 

The data they’re putting into Cassandra is end-of-day pricing information, corporate actions information (such as mergers, acquisitions, and renames), and reference information about different stocks all around the world. Sirca has around 10,000,000 instruments.

 

Sirca is hosting Cassandra in their own data center; it’s a small, 3-node cluster at the moment. They put ~60GB of RAM in each box and are currently using about 3TB of storage per node. They’d like to add more nodes to that as the data size increases.

 

Hello Planet Cassandra users, this is Brady Gentile, Community Manager at DataStax. Today we have part of the Sirca team here with us, Chris Mendes, CEO of Sirca Tech; Chris Logan, Development Manager; Franc Carter, Systems Architect; and Anthony Grasso, Project Lead. Thank you guys so much for joining us today.

CM:  Fantastic, and thank you.

To kick things off, could you tell us a little about what Sirca does?

CM: Sirca is a company that’s not often heard about in the marketplace. It’s a not-for-profit organization owned by our members, and the members are 37 universities in Australia and New Zealand. Our historical raison d’être was first to provide financial market data to those universities in order to enable financial research, second to provide data to enable public policy formulation, and third to generally facilitate the public good through the use of that data. Even though we are a not-for-profit, we’re also involved in commercial activity and are the power behind the Thomson Reuters Tick History product.

And how does Apache Cassandra fit into the mix of what you’re doing at Sirca?

AG:  Apache Cassandra, essentially, is going to take over one of our databases; a very large MySQL database for storing metadata about tradable entities like stocks etc.

So Cassandra has replaced MySQL for you? Could you tell us which aspects of your application it has replaced?

AG:  Essentially, we’re using it to store time-series data. It’s going to be one of the components used to source information that’s presented to clients. It really is just going to sit behind the scenes and do a fair bit of the data storage for us.

 

CM: It’s probably worth elaborating a little on that: the data we’re putting into it is end-of-day pricing information, corporate actions information (such as mergers, acquisitions, and renames), and reference information about different stocks all around the world.

 

FC:  We have around 10,000,000 “instruments” which we gather data about in that database.

That sounds like a lot of data that you’re loading into Cassandra. Was that part of your motivation for choosing Cassandra, and why did you think Cassandra would be a good replacement for MySQL?

FC:  We were running into scalability concerns with MySQL, so we had to scale the database in some way. After some thinking, we decided to change tactics: we could see that our relational model carried extra overhead that we didn’t need from a business perspective, and that some other model could potentially scale much better. That was what motivated us to change from relational to NoSQL. The reason we picked Cassandra is that we had a testing phase where we looked at all of the other database models out there that were vaguely reasonable. We did a fair bit of reading, understanding, and assessing our options, and Cassandra came out with the most ticks in the boxes. We then did a small proof of concept, which indicated that we had made the appropriate choice.

Excellent. What were some of the other databases that went head-to-head with Cassandra?

FC:  I actually looked at pretty much everything you could think of. A lot of them got ruled out as not conceptually correct, but some of the ones that were in there were Riak, Redis, and MongoDB. Then we had a look at using Hadoop, but we needed a database that was low-maintenance. We even looked at Tokyo Cabinet; it was very wide-ranging. Riak and MongoDB were the ones that were left higher up the list.

 

CM:  I think it might be worth adding that we had also done a project using Oracle RAC, which was quite disastrous, primarily because of the amount of effort you need to tweak and fine-tune Oracle. It reinforced our view that we should only go with a relational database if it is truly necessary, and in this case it’s not.

Would you be able to share some insights into what your deployment looks like?

AG:  Currently, we’re hosting it ourselves in our own data center; it’s just a small, 3-node cluster at the moment. We put ~60GB of RAM in each box and currently we’re using about 3TB of storage per node. We envision that this may go up because there’s still some more data we need to add into the database. Each node is running about 8 cores at the moment, which is sort of the recommended number of cores per node in a Cassandra cluster; any more than that and you don’t get the benefits. We anticipate it will grow in the next year or so, and we’d like to add more nodes as the data size increases.

Excellent. For future versions of Apache Cassandra, are there any features you’d like to see that would benefit your specific use case?

AG:  One thing that I had an opportunity to investigate a while back was management tools for Apache Cassandra. I do know that DataStax offers OpsCenter, which is actually quite a usable tool for monitoring and managing, but it would also be good to have tools to manage deployments, where you could use OpsCenter to do things like rolling restarts, weekly repairs and so on. It may be good to have some sort of management system that sits beside Cassandra to help out with that. As for specific features of Cassandra, there’s nothing really that comes to mind at this point; it seems to serve our needs quite well.

The most recently released version, OpsCenter 4.0, includes a repair service. You recently hosted a meet-up for the Sydney Cassandra user group; we really appreciate your participation in Cassandra community initiatives, by the way. What other experience do you have with the Cassandra community, whether physical or virtual?

AG:  I’ll start from the virtual point of view: the activity from the Cassandra community is quite good, and they’re very approachable. We’ve posted questions on the mailing list before, and people responded in less than a day. It’s a very active community and everyone there is always helpful, which is really good, and I think it will only grow from there.

 

From a physical point of view: the first meet-up we went to was hosted by Strmr; there were a lot of people there and they all seemed to be quite interested in what Cassandra was about. There was generally a good feeling about Cassandra as a product. At the meet-up that we hosted, there were a lot of people who knew a bit more about Cassandra and were genuinely interested in ways they could use it and how to deploy it. I think the community is growing and there is certainly a lot of interest in its usage.

 

It’s really great to hear that you’ve had a good experience so far. I know that Aaron Morton presented down there when you hosted; he’s a great resource on the mailing list, and I know he’s very active there.

AG:  Yeah, he is. He’s a really great guy to work with, as well, and he’s a beacon of light when it comes to Cassandra. Robert Coli is very helpful, too; he has a lot of good points on how to use Cassandra and how to deploy it sensibly. When you’ve got guys like that on the mailing list, it makes deployment and getting support a lot easier.

Guys, I think that’s all the questions that I have for you today; I really appreciate your insights.