November 22nd, 2013

“We needed something that scaled up and also allowed us to write more information and control that information.”

-Ivo Jesus, Tech Lead at Portugal Telecom

Paula Ferreira, Manager at Portugal Telecom

Fábio Costa, Developer at Portugal Telecom

Ivo Jesus, Tech Lead at Portugal Telecom


TL;DR: Portugal Telecom is a global telecommunications operator and the national leader in all sectors in which it operates.


Every time someone calls their call center, logs into their website, or goes into a physical store, Portugal Telecom receives data from these events. They are using Cassandra to store all of these events and also to store the metrics that they gather from them.


For this current project, Portugal Telecom has one data center and they’re using 2 Cassandra nodes. The amount of data they are storing right now is around 5GB, but they have over 1.5 billion keys and receive more than 1 million events per day.


Hi everyone, this is Matt Pfeil, and today I’m joined by Fábio, Paula, and Ivo from Portugal Telecom. I want to thank you three for joining us. Why don’t we start off with you telling us a little about what Portugal Telecom does, as well as what your roles are there?

PF: Hi, I am Paula, and I’m a manager at PT. Portugal Telecom is a global telecommunications operator. The company’s activity covers every segment of the telecommunications sector: fixed, mobile, multimedia, data and corporate solutions. We were originally founded in Portugal but we are now a global enterprise and we are expanding to other countries. As an example, in the Brazilian market, PT is present in Oi, the largest telecommunications operator in South America.



And how do you use Cassandra?

IJ: We decided to use Cassandra because our team specifically monitors all user touch points with Portugal Telecom. What I mean by this is that we monitor our call centers, interactive receiver channels, web applications, and much more. We monitor every aspect of the customer care experience. Because of this, we receive a lot of data: every time someone calls our call center, logs into our website, or goes into a physical store, we receive data from these events. We are using Cassandra to store all of these events and also to store the metrics we gather from them.


This data is afterwards consumed by other applications that build infographics such as bar graphs, tables, and those kinds of things. These infographics are then used by internal teams at Portugal Telecom to improve our relationship with clients, improve response times, and improve the overall quality of service.
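The flow described above (raw events stored, then metrics derived from them for charts) can be illustrated with a minimal in-memory sketch. Everything here is an assumption made for illustration: the `TouchpointEvent` shape, the channel names, and the `daily_counts` helper are hypothetical, and a plain Python list stands in for Cassandra. The interview does not describe the team's actual schema or code.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TouchpointEvent:
    # Hypothetical event shape; the real schema is not given in the interview.
    channel: str          # e.g. "call_center", "website", "store"
    customer_id: str
    occurred_at: datetime


def daily_counts(events):
    """Count events per (channel, ISO date), a metric a bar graph could plot."""
    return Counter((e.channel, e.occurred_at.date().isoformat()) for e in events)


# A plain list stands in for the event store in this sketch.
events = [
    TouchpointEvent("call_center", "c1", datetime(2013, 11, 20, 9, 15)),
    TouchpointEvent("website", "c2", datetime(2013, 11, 20, 10, 0)),
    TouchpointEvent("call_center", "c3", datetime(2013, 11, 20, 17, 45)),
    TouchpointEvent("store", "c1", datetime(2013, 11, 21, 12, 30)),
]

metrics = daily_counts(events)
for (channel, day), count in sorted(metrics.items()):
    print(f"{day} {channel}: {count}")
```

In a real deployment the raw events and the aggregated counts would presumably live in separate tables, so that chart-building applications read only the small, pre-aggregated metrics.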


PF: We are focused on providing a positive customer experience across all channels of customer care. We aggregate the information received from these different channels, so we have extensive knowledge of how we interact with our customers, and we use this knowledge to improve the experience.


It sounds like you’re collecting a very large amount of data, at many different data points. How big is your cluster and/or how much data are you storing?

FC: For this current project, we only have one data center and we’re using 2 Cassandra nodes. We’re using virtual machines, so we assume the disks are virtual as well. The amount of data we are storing right now is only around 5GB, but we have over 1.5 billion keys and we receive more than 1 million events per day. We expect this number to grow a lot in the next few months and, as it grows, we expect our cluster to grow as well, with more nodes joining the ring.

IJ: This is just for starters, because our team has a lot of projects and not all of them use Cassandra yet. When all of our projects start using Cassandra, we believe this number will explode; we have to collect a lot of data and events, and we have to store those events for a fairly long period of time in order to correlate them with one another.


Like Paula said, we’re just starting to fully utilize Cassandra. We believe that in the near future, we’ll have much more information. As Fábio stated, we are using 5GB and 1.5 billion keys; these numbers will easily quadruple in the near future.
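As a rough back-of-the-envelope check on the figures quoted above, the short sketch below converts the stated event rate to an average per-second rate and applies the "quadruple" estimate. The input numbers come from the interview; the variable names are illustrative, and the even spread of events across the day is a simplifying assumption.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures quoted in the interview (all approximate lower bounds).
stored_gb = 5
stored_keys = 1_500_000_000
events_per_day = 1_000_000
growth_factor = 4  # "these numbers will easily quadruple"

# Average ingest rate, assuming events are spread evenly over the day.
avg_events_per_second = events_per_day / SECONDS_PER_DAY

projected_gb = stored_gb * growth_factor
projected_keys = stored_keys * growth_factor

print(f"average ingest rate: {avg_events_per_second:.2f} events/s")
print(f"projected storage:   {projected_gb} GB")
print(f"projected keys:      {projected_keys:,}")
```

Real traffic is unlikely to be uniform, so peak rates during business hours would be higher than this average.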


PF:  This is for just these markets. If you want to try these in another market with a larger number of clients, these will really explode.


IJ: Portugal Telecom is now expanding to Brazil, which is a market that is roughly 25 times larger than ours, so we believe with the help of Cassandra we’ll store huge loads of information.


That’s great. Out of curiosity, what led you to use Cassandra in the first place?

IJ: We were originally using MySQL databases. When we started the application that monitors call centers only, we started collecting data and we realized that no regular SQL database would be enough because the number of events arriving was too large. We got to a point where we were only storing part of the information that we needed to store, and already some of the tables were growing by 5 million entries a week, so it wasn’t enough. We needed something that scaled up and also allowed us to write more information and control that information.


Fábio was one of the main people studying all the database possibilities we could choose from and, in the end, he chose Cassandra.


FC: Well, our main motivation was basically asking: “Which database is going to scale with the stability that we need?” We had lots of writes and fewer reads, and with Cassandra having a great write rate, we found it was a great technology for our performance issue. Also, the data we’re gathering is sometimes unstructured, and Cassandra’s dynamic structure feature is helpful in this respect. Finally, Cassandra has great documentation and a large community with open doors and many developers answering questions online. We tried other NoSQL technologies such as MongoDB, but at the time we thought Cassandra was better for our problem.


IJ:  I think our decision was also based on the companies today that are using Cassandra. What got us into using Cassandra initially was the fact that, for instance, eBay uses Cassandra. So we thought “if eBay uses Cassandra, it’s probably a good thing.” If nobody reputable was using Cassandra, we probably would be a little bit suspicious; you know when a large company uses something and it works, it probably works in other scenarios.


That makes a lot of sense and, to your point, Cassandra is used at many very large companies who are trusting their business to it. As you mentioned, eBay uses it, but companies like Netflix and even Comcast use it here in the United States as well. It has proven time and time again to be very reliable. In closing, would you like to share anything about your Cassandra community experiences?

FC: Our community experience with Cassandra has been in the IRC channel and also in the Apache Cassandra mailing lists: the developer mailing list and the user’s mailing list.

IJ: We are new to the community, and we haven’t used Cassandra for that long yet. None of us had worked with Cassandra before and we don’t have anyone who is a specialist in Cassandra, so we are all learning as we develop. Anytime we can, we go to Stack Overflow and other online Q&A forums and answer questions if we know the answer.