July 7th, 2013

“With Cassandra, we were able to meet the customer’s requirements for high-speed ingestion, real-time querying, and geographical data distribution with near-real-time data replication, giving them a choice of technologies and exceeding their expectations of working with open source options.”

Yue Cathy Chang Sr. Director of Business Development at Impetus

Aashu Mahajan Senior Technical Architect at Impetus

David Leimbrock




Cathy, can you tell us about what Impetus does?

Cathy: Certainly. Impetus is a software solutions and services company. With more than 1,400 engineers, we’ve been architecting and implementing big data solutions since 2007. In addition to big data, our technology domains include carrier-grade large systems, test and performance engineering, and enterprise mobility. With customers spanning financial services, healthcare, manufacturing, telco, digital media and more, we are considered pioneers in distributed software engineering with vertical and functional expertise.


How does Impetus incorporate Apache Cassandra into its mix of services?

Cathy: Apache Cassandra is one of many technologies Impetus brings to customers. We pride ourselves on standing behind each objective recommendation we give, because our innovation lab tests each technology of interest in actual environments so we can ensure the customer’s implementation succeeds. One example is a highly scalable distributed architecture that Impetus designed and implemented for a large telecom company. With Cassandra, we were able to meet the customer’s requirements for high-speed ingestion, real-time querying, and geographical data distribution with near-real-time data replication, giving them a choice of technologies and exceeding their expectations of working with open source options.


Being in the consulting industry, I can imagine you’re always looking to capitalize on supporting technologies that are gaining popularity in the mainstream. What do you foresee for the future of Apache Cassandra and other NoSQL solutions?

Aashu: NoSQL and other big data technologies have opened the floodgates to processing data at a scale that was not possible before. The industry is reaching a stage where more companies are moving from evaluation to major production implementations, and we believe this will continue at an even faster pace. NoSQL solutions provide what could not be done by conventional means. After the past few years, we understand better what works well with NoSQL and what does not. The major NoSQL vendors have been delivering advances that help their products integrate better with the other major enterprise components of the architecture, such as search.


People view the adoption of database solutions that have been around for a while, such as Oracle, as ‘safe’. What advice would you give someone who is considering adopting a NoSQL solution, such as Cassandra, but is worried about the future of these relatively newer solutions?

Aashu: I definitely believe they need not worry about the “future”. These solutions are here to stay. The more important question is, “Are they making the right technology choice?” Each major NoSQL technology has its own strengths and weaknesses, and picking the “right” one that fits the requirements should be the major consideration. I always tell my clients that NoSQL is not a silver bullet. It can be a crucial part of their architecture, and their focus should be on ensuring that the overall architecture meets the needs of the solution.


I’ve heard that one of your engineers has written a book on Apache Cassandra. Could you tell us a little about this book, where to find it, and whether there will be updates as newer versions of Cassandra are released?

Cathy: Yes, Vivek Mishra wrote “Instant Apache Cassandra for Developers Starter”, a quick reference guide published by Packt Publishing. It is available at http://www.packtpub.com/apache-cassandra-for-developers/book. Per Vivek, this book will walk you through how to:


1. Install and configure Cassandra

2. Understand Cassandra’s internal and storage architecture to create scalable big data solutions

3. Manage and configure Cassandra with the Cassandra Query Language

4. Scale and optimize Cassandra

5. Explore secondary indexes and composite columns

6. Discover Cassandra’s Java APIs


The book was released on March 30th, 2013, so it’s recent. We’ll have to get back to you on version upgrade plans.


What kind of mistakes do you commonly see out in the field and what advice do you have for someone who is trying to put together a big data solution from scratch?

Aashu: The most common mistake is validating performance alone without looking at the complete picture. These systems scale well for most use cases, but little consideration is given to what it will take to go from a POC to an actual production implementation. My suggestion is to give serious consideration to operations and management, and to assess that aspect during the evaluation as well. The other major mistake is making a decision based on a very small part of the problem statement. One needs to understand the constraints better, and this can be done by defining the right evaluation process.


What are your thoughts on the Apache Cassandra community, whether it be virtual or physical?

Cathy: The Apache Cassandra community provides an excellent forum where developers get support and accelerate innovation, with projects and code evolving through community cooperation. We look forward to collaborating with the individuals and companies that make up the community.


Anything else that you’d like to add?

Cathy: I’d like to invite the audience to view “Data as Competitive Advantage in Manufacturing” by Rich Hammel of Brocade and Vivek Ganesan of Impetus Technologies. The video showcases how big data is changing manufacturing, the essential ingredients for success in greenfield big data projects, and what it’s like to be obsessed with quality. Thanks!