October 2nd, 2013

Andrew Bruno: General Manager of Operations at LeaseEagle

Brady Gentile: Community Manager at DataStax


Hello, Planet Cassandra users. Today we have Andrew Bruno, General Manager of Operations at LeaseEagle, joining us. Andrew, thank you so much for joining us; we're really excited to hear about how you're using Apache Cassandra at LeaseEagle. To start things off, could you tell us a little bit about what LeaseEagle does?

Sure, Brady, thanks for inviting us along. LeaseEagle is a cloud-based business solution for any organization that needs to manage the lifecycle of physical locations. Typically these are retail and hospitality organizations operating across the country or around the world, as well as corporate offices with many sites.


So, for example, you may have a location in San Mateo, where you are, one in Boston, and one in LA. Let's say you're a franchise café with thousands of locations throughout the world, and you need to manage those locations from the head office in terms of landlord lease management. How much rent are you paying per month? What are the CPI increases? What franchise agreements do you have? What critical dates do you need to be aware of?


On top of that, we handle things like CPI increases and currency differences across countries. The sales data for each location is also managed within LeaseEagle. Where LeaseEagle really starts to show its power is when you run business intelligence: when you compare the performance of one location against another. This is where Cassandra comes in.


That's excellent. I can imagine there's a lot of data out there about the lease agreements of large companies. I'm actually imagining Starbucks: there's a Starbucks on every corner, so I bet it would be beneficial for HQ to have a program such as LeaseEagle in place to manage the leases for those locations.


How are you using Apache Cassandra at LeaseEagle?

Well, I mentioned business intelligence before. One of our strengths since day one has been giving everyone, from lease administrators all the way up to the CEO, the flexibility to be in control: to access their data wherever they are, whenever they want. Now think about a database of hundreds of highly normalized tables that follow good database practices. How could we do this?


We started in 2005, and back then views were the approach we took. We created database views that pulled the data we needed from many databases into one place, and then we could query those views. We built a UI, think of a UI full of checkboxes, where anyone could select the fields they wanted, calculations like sum, average, and totals, and some filter capabilities. You could run that against the views and get your data.


It worked fine in the early days, but as LeaseEagle grew and customers wanted more and more functionality and features, we started to hit a bit of a brick wall. Soon we realized, "Hey, we need to redo this. We need to rethink how we're going to offer this continuous flexibility." That was one of our selling points: you don't need a DBA to access your data. You are in control of your data. You can run any report you want, whenever you want.


We saw the NoSQL movement, almost a religion, start to emerge, so we looked at NoSQL seriously. We were hearing some really, really good stories about scalability, redundancy, and performance, and that's when we came across Cassandra. We first saw Cassandra in 2010. We looked into it and thought, "Wow, this is awesome." We kicked off an R&D project and started experimenting with Cassandra, and whatever we gave it, it performed well. We threw thousands and thousands of rows at it, then millions and millions of rows, and it just kept behaving really well. We could keep adding nodes, and both its write and read capabilities were excellent. That's when we flicked the switch to Cassandra, and that was one of the reasons why.


That’s excellent.  Was there a specific reason why you picked Cassandra over other technologies?  Did you evaluate it against any other databases at the time?

We looked at MongoDB and Cassandra. Cassandra shone instantly because it's written in Java, and we are a Java house. The community was strong, the updates were frequent, and the documentation was really clear, which was important for us. So we were able to learn Cassandra very, very quickly. That's a big point for us. We're a small engineering team, so being able to pick up this new framework and have a prototype running within a week was a big selling point for us.


That's excellent. It sounds like you've had a really good experience getting started with Cassandra. About your transition: I know you said you were a SQL shop before, is that right?

We're still very much a SQL shop. We're running both. SQL is still our asset repository; the database hasn't changed, and we still store all our data in it. What we use Cassandra for is reporting. We have a task engine framework: whenever any data is saved within LeaseEagle, it is still saved to the database, and then the save spawns jobs that update Cassandra in the background. We did this so we could introduce Cassandra easily and expand on it gradually. Our existing database was rock solid and still maintained all the data; what we needed was really fast reporting, and we couldn't get that from the database. So we started pushing jobs of data out to the Cassandra schema.


We have four schemas that are now populated continuously. The beautiful thing was that we could roll out Cassandra and then run a whole batch of scripts that populated the Cassandra schemas. So our users had access to Cassandra straight away, but the UI and all our code for saving user data against the database, following MVC patterns and so on, stayed exactly the same. All we did was add monitors on the save action that create these jobs in the background.
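The write path described in the last two answers, where the relational database stays authoritative and each save spawns a background job that updates Cassandra, can be sketched roughly as follows. This is a minimal Java sketch under stated assumptions, not LeaseEagle's actual task engine: `LeaseRecord`, `PrimaryStore`, `ReportStore`, `save`, `backfill`, and `drain` are hypothetical names, in-memory maps stand in for the SQL database and a Cassandra reporting table, and a single-threaded `ExecutorService` stands in for the background job queue. No Cassandra driver is used.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: in-memory maps stand in for the SQL database and a
// Cassandra reporting table. Nothing here is a real LeaseEagle or driver API.
public class ReportSyncSketch {

    // Hypothetical saved item, e.g. monthly rent for one location.
    record LeaseRecord(String locationId, String field, double value) {}

    // Stand-in for the relational database, which stays the source of truth.
    static final class PrimaryStore {
        final Map<String, LeaseRecord> rows = new ConcurrentHashMap<>();

        void save(LeaseRecord r) {
            rows.put(r.locationId() + ":" + r.field(), r);
        }
    }

    // Stand-in for a denormalized reporting table: field -> (location -> value).
    static final class ReportStore {
        final Map<String, Map<String, Double>> byField = new ConcurrentHashMap<>();

        void upsert(LeaseRecord r) {
            byField.computeIfAbsent(r.field(), k -> new ConcurrentHashMap<>())
                   .put(r.locationId(), r.value());
        }

        // Report query: reads one denormalized entry instead of joining normalized tables.
        double total(String field) {
            return byField.getOrDefault(field, Map.of()).values().stream()
                          .mapToDouble(Double::doubleValue).sum();
        }
    }

    private final PrimaryStore primary;
    private final ReportStore reports;
    // Single-threaded queue standing in for the background task engine.
    private final ExecutorService jobs = Executors.newSingleThreadExecutor();

    ReportSyncSketch(PrimaryStore primary, ReportStore reports) {
        this.primary = primary;
        this.reports = reports;
    }

    // Save path: synchronous write to the primary store, then a queued reporting job.
    void save(LeaseRecord r) {
        primary.save(r);
        jobs.submit(() -> reports.upsert(r));
    }

    // One-off backfill, like running scripts to populate the reporting store at rollout.
    void backfill() {
        for (LeaseRecord r : primary.rows.values()) {
            jobs.submit(() -> reports.upsert(r));
        }
    }

    // Stop accepting jobs and wait for queued ones so results can be inspected.
    void drain() throws InterruptedException {
        jobs.shutdown();
        jobs.awaitTermination(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        PrimaryStore primary = new PrimaryStore();
        primary.save(new LeaseRecord("san-mateo", "rent", 3000.0)); // existed before rollout
        ReportStore reports = new ReportStore();
        ReportSyncSketch sync = new ReportSyncSketch(primary, reports);
        sync.backfill();
        sync.save(new LeaseRecord("boston", "rent", 4200.0));
        sync.save(new LeaseRecord("la", "rent", 5100.0));
        sync.drain();
        System.out.println(reports.total("rent")); // prints 12300.0
    }
}
```

In this sketch the primary write completes before the reporting job runs, so reports are only eventually consistent with the database; the benefit is that the existing save path needs nothing more than one extra `submit` call.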


Would you be able to share with us some insight on what your deployment looks like?

At the moment we have two data centers. In one data center we have a host running a couple of VMware virtual machines on SSDs, which run two Cassandra nodes. In the other data center, another host runs three Cassandra nodes.


Are you by chance using multi-data center replication?

We're not, actually, at the moment. We want to look at that, and it's the next project we're going to kick off. In the early days we took a fairly conservative approach to Cassandra, keeping the database as our primary data store and Cassandra as secondary. But now, as we grow more and more, we want to start using Cassandra to store some primary data, and to do that we want to make sure our redundancy across data centers is in place.


Excellent. For future versions of Apache Cassandra, is there anything you'd like to see that would benefit how you're using it?

I've got to say the community has done an excellent job so far. The documentation has been fantastic. I want to make sure future versions remain backward compatible with existing data sets, so we avoid having to reinvest. You've kept a pretty good track record on that, but I want to reinforce that it stays in place. It's also important to keep focusing on performance and scalability.


One thing that may be beneficial is the ability to have subsets of data. We've had to re-engineer our schema a little so we can have subsets of data within fields, and that works fine because we've built around it. So adding something for subsetting data, plus more filtering capability so you can do a bit more of the 'WHERE clause' style of querying we do in relational databases, would be helpful.


Very good. You mentioned that the Apache Cassandra community has been very responsive so far. Do you have any other details about your experience with the community?

It's just been a very positive experience so far. I know our engineers have posted questions on the forums and gotten really good responses. It's a really great community in terms of peers supporting each other in a similar industry. So, well done.