Li Gao, Director of Engineering at Captora
Captora is a company that builds a SaaS platform for inbound marketing acceleration. The suite of products we have built helps marketers discover leads faster and more cost-efficiently. We have quite a few tools to help marketers achieve that. On our website you can find plenty of tools, tutorials, and even some case studies, which give a good introduction to what we are doing and what products we offer.
As one of the founding engineers here, my role is to drive innovation within the engineering team and to scale the backend team and the infrastructure.
Log data with Cassandra
We use Cassandra as one of our main backend data storage systems. It’s mainly used for storing our clickstream log data and some output from our Elastic MapReduce jobs. There are quite a few reasons we’re using Cassandra. From my past experience, Cassandra is very good at write-heavy data storage. To give you an idea, we are currently approaching multiple millions of records written to this database per day.
Cassandra gives us the performance we need and provides a cost-efficient way for us to record our vast log data. We also researched a few alternatives, such as MongoDB and a few relational databases, before we decided on Cassandra.
Coming from a relational background
In my previous positions I used Cassandra as part of our real-time monitoring platform. I am quite familiar with Cassandra 1.0, but here at Captora we started with Cassandra 2.0, which is very different from 1.0.
Most of our developers come from a MySQL background. We use CQL 3.0, and our developers find it very intuitive to learn and to develop programs against.
On the surface, CQL 3 looks very similar to MySQL. That is a good thing for our developers, as it lowers the bar to adoption across the team. However, we still need to invest in unlearning relational models and in learning the distributed nature of schema design in CQL 3. My advice would be that a public, shared, up-to-date technical guide for folks coming from a SQL background would be very helpful.
With CQL 3, an efficient design of the primary key and the indexes is extremely important for efficient use of the system. I would advise anyone coming from a MySQL background to study and experiment with different designs of primary keys and secondary indexes before committing to a final schema.
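To make the point about primary key design a little more concrete, here is a minimal sketch of one possible layout for clickstream data. It is not Captora's actual schema: the table name click_events, its columns, and the helper functions are all hypothetical, and the Python part is only a toy model of how a compound partition key groups rows, not a Cassandra client.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical CQL 3 table for clickstream events (all names are illustrative).
# PRIMARY KEY ((site_id, day), event_time, event_id):
#   (site_id, day)        -> partition key: decides where the row is stored
#   event_time, event_id  -> clustering columns: sort order inside a partition
CREATE_TABLE_CQL = """
CREATE TABLE IF NOT EXISTS click_events (
    site_id    text,
    day        text,
    event_time timestamp,
    event_id   timeuuid,
    url        text,
    PRIMARY KEY ((site_id, day), event_time, event_id)
) WITH CLUSTERING ORDER BY (event_time DESC, event_id DESC);
"""


def partition_key(site_id, event_time):
    """Bucket events by site and UTC day so a partition does not grow forever."""
    day = event_time.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return (site_id, day)


def group_by_partition(events):
    """Toy model: group (site_id, event_time, url) tuples by partition key."""
    partitions = defaultdict(list)
    for site_id, event_time, url in events:
        partitions[partition_key(site_id, event_time)].append((event_time, url))
    for rows in partitions.values():
        rows.sort(reverse=True)  # mimic CLUSTERING ORDER BY event_time DESC
    return dict(partitions)
```

With a layout like this, a query that supplies both site_id and day reads from a single partition and gets rows back already ordered by time. A query that filters only on url would not fit this key, and that is the kind of case where a secondary index or a second table keyed differently is worth experimenting with before settling on a schema.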