Juarez Bochi, Senior Software Engineer at Globo
"The main problem with this [Redis] solution is that it does not have high availability. If the Redis server failed, we would switch to another Redis instance, but we would lose all the video recorded. Thankfully, this did not happen during any World Cup match (although we would not mind deleting those 7 goals from Germany)."

Globo.com is the Internet arm of Grupo Globo, the largest media conglomerate in Latin America. We recently broadcast the 2014 FIFA World Cup to more than 450,000 concurrent users.

Our new player, called Clappr, supports HLS, a protocol that is also supported by the two major mobile platforms, Android and iOS. This is a huge advantage, because we can stream to web and mobile clients with a single protocol, keeping our architecture simple and efficient. Basically, we use an encoder (mostly Elemental) to encode the video signal in H.264 and ingest the stream into our segmenter (EvoStream) over the RTMP protocol. EvoStream writes the HLS video chunks and playlists to disk, and because HLS is HTTP-based, we can use a couple of geographically distributed layers of Nginx servers to deliver and cache the video files.

Redis rewind

One of the new features of our Live Video Streaming platform, developed for the World Cup, is the DVR, or Digital Video Recording. We keep the last couple of hours of the video stream recorded, so the user can pause or seek back to watch that goal he missed. For the World Cup, we only had two simultaneous matches, so we could easily keep the streams in memory. To add this feature, we created a Python application that moves the HLS video segments into Redis and then scripted Nginx with Lua to get the segments from Redis and to generate HLS playlists dynamically.
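
The actual application is not reproduced in this post, but the idea is simple enough to sketch in Python. In this hedged approximation (the key names, the two-hour window, and the redis-py client are illustrative assumptions, not the real code), each segment is stored with a TTL and a sorted set keeps an ordered index from which a playlist can be rebuilt on demand:

```python
import time
import redis  # assumes the redis-py client

DVR_WINDOW = 2 * 60 * 60  # keep roughly the last two hours of video

r = redis.StrictRedis(host="localhost", port=6379)

def store_segment(stream, name, payload):
    now = time.time()
    # The chunk itself expires on its own once it leaves the DVR window.
    r.setex("%s:chunk:%s" % (stream, name), DVR_WINDOW, payload)
    # Index entry, scored by arrival time, so segments can be listed in order.
    r.zadd("%s:index" % stream, {name: now})
    r.zremrangebyscore("%s:index" % stream, 0, now - DVR_WINDOW)

def playlist(stream, duration, target=10):
    # Rebuild an HLS media playlist covering the last `duration` seconds.
    names = r.zrangebyscore("%s:index" % stream, time.time() - duration, "+inf")
    lines = ["#EXTM3U", "#EXT-X-TARGETDURATION:%d" % target]
    for name in names:
        lines += ["#EXTINF:%d," % target, name.decode()]
    return "\n".join(lines)
```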

The main problem with this solution is that it does not have high availability. If the Redis server failed (which is rare, but happens), we would switch to another Redis instance, but we would lose all of the recorded video. Thankfully, this did not happen during any World Cup match (although we would not mind deleting those 7 goals from Germany). The other issue is that this solution did not scale well to more streams. Since each stream requires around 10GB per recorded hour, we could not record much more than two or three streams with a single Redis instance.

Brazil recently held presidential and gubernatorial elections, and we wanted to stream all 27 of the governors’ debates, which would occur simultaneously, with DVR functionality. Since our Redis solution would not scale, we decided to give Cassandra a try.

It turned out to be relatively easy to modify our Python software to post the files to Cassandra instead of Redis using the DataStax Python driver. The hardest problem was handling the huge amount of file events the application received. We ended up creating multiple Python processes and using async execution. Another issue was that we did not know how we would extract the blobs from Cassandra and deliver them with Nginx, since we could not find any Cassandra drivers available for Lua or Nginx. We thought about developing a Python application, but we are huge fans of Lua and Nginx, so we decided to go ahead and develop our own driver for Lua: lua-resty-cassandra; yes, it’s open source! Once it was ready, it became pretty easy to port our Lua scripts from Redis to Cassandra.
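
The driver makes the async execution straightforward. Below is a rough sketch of that ingestion pattern; the keyspace, table, and column names are illustrative assumptions, not the actual production schema:

```python
import time
from cassandra.cluster import Cluster
from cassandra.util import uuid_from_time

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("dvr")  # hypothetical keyspace

insert = session.prepare(
    "INSERT INTO chunks (stream, id, data) VALUES (?, ?, ?)"
)

def store_segments(stream, segments):
    # Fire all writes without blocking, then wait; several processes would run
    # loops like this in parallel, as mentioned above.
    futures = [
        session.execute_async(insert, (stream, uuid_from_time(time.time()), data))
        for data in segments
    ]
    for f in futures:
        f.result()  # surfaces any write error
```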

Cassandra anti-patterns & compaction

Our solution worked, but Cassandra’s response time increased a lot after a few hours. After a certain point, clients would start to time out and video playback stopped. It took us a couple of weeks to realize that we had implemented a known Cassandra anti-pattern: queues and queue-like datasets.

Fortunately, we could fix the problem with a few simple changes:

We denormalized our data, using different tables for chunks (since each video chunk can be up to 2MB) and chunk indexes (which are stored as rows and can contain a few thousand columns).

We changed the compaction strategy to LeveledCompactionStrategy, which works better when you have to do frequent compactions.
We set durable_writes to False, since most of our data is ephemeral.

Finally, and most importantly, since we knew the maximum size a playlist could have, we could specify the start column (filtering with id > minTimeuuid(now - playlist_duration)), as suggested on DataStax’s blog; a sketch of these changes together appears below. This really mitigated the effect of tombstones on reads.
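
Put together, the changes look roughly like the following sketch with the DataStax Python driver. The keyspace, table, and column names are illustrative assumptions, and the driver’s min_uuid_from_time stands in for the CQL minTimeuuid() call mentioned above:

```python
import time
from cassandra.cluster import Cluster
from cassandra.util import min_uuid_from_time

cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

# durable_writes is a keyspace-level setting; disabling it is fine for ephemeral data.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS dvr
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}
    AND durable_writes = false
""")

# Denormalized tables: big blobs in one, per-stream index rows in another.
# Both use leveled compaction because the data churns constantly.
session.execute("""
    CREATE TABLE IF NOT EXISTS dvr.chunks (
        stream text, id timeuuid, data blob,
        PRIMARY KEY (stream, id)
    ) WITH compaction = {'class': 'LeveledCompactionStrategy'}
""")
session.execute("""
    CREATE TABLE IF NOT EXISTS dvr.chunk_index (
        stream text, id timeuuid, chunk_name text,
        PRIMARY KEY (stream, id)
    ) WITH compaction = {'class': 'LeveledCompactionStrategy'}
""")

# Reads start at the oldest point of the playlist window, so the scan never
# walks over tombstones left by segments that have already expired.
PLAYLIST_DURATION = 2 * 60 * 60
start = min_uuid_from_time(time.time() - PLAYLIST_DURATION)
rows = session.execute(
    "SELECT id, chunk_name FROM dvr.chunk_index WHERE stream = %s AND id > %s",
    ("debate-01", start),
)
```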

After these changes, we were able to achieve latency on the order of 10 ms at the 99th percentile. The debates were held, and we streamed all 27 feeds, each with two hours of recording, with no major problems. We are really happy with Cassandra now.

Cassandra Summit 2014: Fuzzy Matching at Scale

October 24, 2014

By Ken Krugler

In the last few months I’ve given two different talks about scalable fuzzy matching.

The first was at Strata in San Jose, titled Similarity at Scale. In that talk I focused mostly on techniques for doing fuzzy matching (or joins) between large data sets, primarily via Cascading workflows.

More recently I presented at Cassandra Summit 2014, on Fuzzy Entity Matching. This was a different take on the same issue, where the focus was ad hoc queries to match one target against a large corpus. The approach I covered in depth was to use Solr queries to create a reduced set of candidates, after which you could apply typical “match distance” heuristics to re-score/re-rank the results.
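
To make that two-phase approach concrete, here is a small hedged sketch in Python; the Solr core, field name, and URL are assumptions about a particular setup, and difflib merely stands in for whatever match-distance heuristic you actually prefer:

```python
import difflib
import requests

SOLR = "http://localhost:8983/solr/entities/select"  # hypothetical core

def fuzzy_match(target, rows=50):
    # Phase 1: recall. Let Solr's tokenization produce a small candidate set
    # instead of comparing the target against the whole corpus.
    resp = requests.get(SOLR, params={
        "q": "name:(%s)" % target, "rows": rows, "wt": "json",
    }).json()
    candidates = [doc["name"] for doc in resp["response"]["docs"]]

    # Phase 2: precision. Re-score the few candidates with a match-distance
    # heuristic and re-rank.
    scored = [
        (difflib.SequenceMatcher(None, target.lower(), c.lower()).ratio(), c)
        for c in candidates
    ]
    return sorted(scored, reverse=True)
```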

The video for this second talk is freely available (thanks, DataStax!) and you can watch me lead off with an “uhm” right here:

“Fuzzy Matching at Scale” was created by Ken Krugler, President of Scale Unlimited.

Notes from Cassandra Summit: The Rise of Top Line IT

October 23, 2014

By 

You can always tell a great tech event from two things: the Q&A in the sessions, and the side conversations about real-world uses. (A lousy tech event has fancy giveaways and gimmicks and same-as-last-year demos, but that’s another story.) By all the standards I know, Cassandra Summit was a great event.

A lot has changed since open source was just a way to make your company IT environment work better, with high-quality replacements for existing product categories. Today, open source projects such as Cassandra are changing the categories from the user side, something like what Doc Searls was on to back in 2004, when he talked about DIY IT. Instead of just making a more flexible, more cost-effective entry in an existing product category, today’s open source is carving out whole new ones. And the developers are coming from all kinds of companies, not just cool startups.

And the big win from the new generation of open source? It’s transforming the IT function from just a cost center, the department that doesn’t matter, into a way to build new things. But moving from the old cost center IT to the new IT that moves the top line means we all have to be ruthless in eliminating the old busywork. The software is out there to let the machines do their own routine babysitting, so we had better use it. We were all impressed with how Cassandra handled a recent emergency cloud reboot, and that kind of resiliency is key to how we can make the new generation come together.

Q&A on OSv

We got a lot of great questions at our session on OSv and afterward. OSv is a new OS for VMs in the cloud, with none of the complexities of an old bare-metal OS. No local config files, no permissions, just the basics that you need to run a single Java or POSIX application at maximum speed. Building a complete VM? That’s easy to do on any platform, and takes only nine seconds out of your day. And, yes, we can bring every JMX management object out to its own REST endpoint with Jolokia. This means that managing the whole virtual appliance (application, JVM, OS) is all in one place, with no need to mess around with arcane OS-level tools.
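
For instance, with the Jolokia bridge in place, any JMX attribute is one HTTP GET away. The host, port, and MBean names below are assumptions about one particular setup, but the URL shape follows Jolokia’s standard read protocol:

```python
import requests

# Hypothetical OSv guest running Cassandra, with Jolokia listening on its
# default port 8778.
base = "http://cassandra-vm:8778/jolokia/read/"

# Read a JVM heap attribute and a Cassandra metrics MBean over plain REST,
# instead of reaching for jconsole or OS-level tools inside the guest.
print(requests.get(base + "java.lang:type=Memory/HeapMemoryUsage").json())
print(requests.get(
    base + "org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Latency"
).json())
```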

All in all, an OSv virtual machine gives you more Cassandra throughput, along with the all-important benefit of freeing up your time as an IT professional to focus on creating new value, not just keeping the old stuff up. Cassandra is full of great computer science that makes it resilient and scalable. OSv is full of great computer science that gives it high throughput and low overhead on any cloud, public or private. Together, they take a lot of the old, time-consuming work off of your to-do list and free you up to build the new, valuable stuff. It’s never been a better time to be in IT, and I’m not just saying it because of the groovy Cassandra Summit backpack. See you next year.

Down with Tweaking! Removing Tunable Complexity for Cassandra Performance and Administrator

Company: Cloudius Systems

Abstract: The need for performance tuning of the JVM and OS is making administrators the bottleneck for Cassandra deployments, especially in virtual environments. Over the past two years, the OSv project has profiled tuning-sensitive applications with a special focus on Cassandra. Today, many of the important bottlenecks for NoSQL applications are tunable on a conventional OS, but do not require tuning in the OSv environment. OSv gives Cassandra a simpler environment, set up to run one application in a single address space. This talk will cover how to use OSv to improve performance in key areas such as JVM memory allocation and network throughput, without loading up your to-do list with difficult tuning tasks.

Greg Cooper, Senior Software Engineer at Iovation
"Iovation picked Cassandra to be able to scale their services out and grow them linearly both in terms of scale and predictable cost growth... With Oracle we had stability issues once the data sets got really huge and that’s been great with Cassandra."

Iovation is a service that other companies incorporate into their businesses to help detect fraud; it works in real time to find fraudulent accounts. You probably use Iovation’s services indirectly through our customers’ websites. As the device reputation authority, we’re able to expose and predict trustworthiness from a consumer’s interactions across the broad online landscape, including retail, financial services, telecommunications, social networking, logistics, dating, gaming, and gambling.

After migrating to Apache Cassandra from Oracle in 2013, Iovation has increased traffic volume by 600% with faster processing times – all at an 8X cost savings.

Oracle pain, Cassandra gain

I’ve been at iovation about three years, and just before I arrived they integrated Cassandra 0.6 into the mix to handle some of the real-time traffic for a lot of the device recognition applications. Since then, we’ve used it in a couple of our other services, and we now have several clusters running Cassandra 1.1.6. Right now we’re using it primarily for our real-time services and fast responses, and it’s been great.

Iovation was primarily an Oracle relational database shop before they got into Cassandra, prior to my joining. Iovation picked Cassandra to be able to scale its services out and grow them linearly, both in capacity and in predictable cost. In addition, one of the motivators for replacing Oracle was ease of maintenance and growth. With Oracle we had stability issues once the data sets got really huge; stability has been great with Cassandra.

Multi-data center & vnodes

We have three of our own data centers, so it’s really a private-cloud kind of scenario: three data centers with data replicated in each place. We have three clusters in production. One of them is a 24-node cluster, with eight nodes in each data center, for our reputation service; then we have a 12-node cluster, with four nodes in each data center, for one of our other services; and we have another, a little smaller one, for velocity-based things, with two nodes in each data center. I think they’re all on Cassandra 1.1.6, and we’re pretty excited to dig into 1.2.

In 1.2, one of the big things is the ability to grow the cluster without having to double it. Right now we have this paradigm where, if we want to grow a cluster, the easiest thing is to double it, just so the token assignments work out and we don’t have to move our data around. With virtual nodes, added in 1.2, we’re excited to be able to grow a cluster piecemeal, by single nodes or a couple of machines, rather than doubling it. That’s one very exciting thing.

Trust in Cassandra

It’s really nice to see the stability get better and better with each release. Every time a new release comes out, I’m really impressed with the stability and simplicity it introduces over the previous versions. It’s been great.

 

Read more in TechRepublic’s “Big data success: iovation drops Oracle for Apache Cassandra, sees 600% volume growth.”

Escaping From Disco-Era Data Modeling

October 21, 2014

By 

On StackOverflow, I have seen Cassandra used in a lot of strange ways – particularly when it comes to secondary indexes. I believe much of the confusion that exists is due to the majority of new Cassandra users having their understanding of database indexing grounded in experience with relational databases. They tend to approach a Cassandra data model with the goal of trying to find the most efficient way to store the data, and then add secondary indexes in a vain attempt to satisfy their potential query patterns.

 

This approach usually manifests itself in questions like this:

Here is my schema:

Why doesn’t my SELECT query work?

Or (for the sake of argument) replace the “ORDER BY” clause with “ALLOW FILTERING” and ask:

Why is my SELECT query so slow?

This model is representative of what you would commonly see in a relational database. It does a great job of storing the data in a logical way, keyed with a unique identifier (messageid) and clustered by a date/time column. Great, that is, until you want to query it by anything other than each row’s unique identifier. So the next logical leap is made, and the next thing you know, three or four (or more) secondary indexes are created on this column family (table).
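
The schema and query from the original question are not reproduced here, but based on the description above, a hypothetical reconstruction (all names and types are illustrative) looks something like this:

```python
from cassandra.cluster import Cluster
from cassandra import InvalidRequest

session = Cluster(["127.0.0.1"]).connect("chat")  # hypothetical keyspace

# Storage-first model: keyed by the unique messageid, clustered by time,
# then patched up with secondary indexes for every other access path.
session.execute("""
    CREATE TABLE IF NOT EXISTS messages (
        messageid uuid, messagedatetime timestamp,
        fromjid text, tojid text, body text,
        PRIMARY KEY (messageid, messagedatetime)
    )
""")
session.execute("CREATE INDEX IF NOT EXISTS ON messages (fromjid)")
session.execute("CREATE INDEX IF NOT EXISTS ON messages (tojid)")

# The query everyone actually wants is by conversation, not by messageid.
try:
    session.execute("""
        SELECT * FROM messages
        WHERE fromjid = 'alice@example.com' AND tojid = 'bob@example.com'
        ORDER BY messagedatetime
    """)
except InvalidRequest as e:
    # Rejected: ORDER BY needs the partition key restricted. Swapping the
    # ORDER BY for ALLOW FILTERING (as in the post) makes it run, just slowly.
    print(e)
```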

Secondary indexes in Cassandra are not meant to be used as a “magic bullet,” allowing a column family to be queried by multiple high-cardinality keys.  Indexes on high-cardinality columns (columns with a large number of potential values) perform poorly, as many seeks will be required to filter a large number of rows down to a small number of results.  Likewise, indexes on extremely low-cardinality columns (booleans or columns with a small number of values, such as gender) perform badly, due to the large number of rows that will be returned.  Also, secondary indexes do not perform well in large clusters, as result sets are assembled from multiple nodes (the more nodes you add, the more network time you introduce into the equation).

There are other important points concerning secondary index use, such as the conditions under which they are intended to be used.  These (and those mentioned above) are detailed in a DataStax document titled When To Use An Index.  When helping others with Cassandra data modeling, this is the document that I will most often refer people to.  Anyone considering using secondary indexes should read that document thoroughly.

Cassandra queries work best when given a specific, precise row key. Queries against additional, differently keyed column families populated with the same (redundant) data will perform faster than queries that use a secondary index. Therefore, the proper way to solve this problem is to create a column family to match each query. This is known as query-based modeling. Here is one way to design a column family to suit the aforementioned query:

With this solution, the data will be partitioned on fromjid and tojid, and clustered by messagedatetime.  Messageid is added to the PRIMARY KEY definition to ensure uniqueness.  This will allow the user’s query to be keyed on fromjid and tojid, while still allowing an ORDER BY on the clustering column messagedatetime.
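
As a sketch of that design (again with illustrative names and types, using the same hypothetical keyspace as above):

```python
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("chat")  # hypothetical keyspace

# Query-first model: the conversation (fromjid, tojid) is the partition key,
# messagedatetime clusters rows inside it, and messageid guarantees uniqueness.
session.execute("""
    CREATE TABLE IF NOT EXISTS messages_by_conversation (
        fromjid text, tojid text,
        messagedatetime timestamp, messageid uuid, body text,
        PRIMARY KEY ((fromjid, tojid), messagedatetime, messageid)
    )
""")

# The same conversation query now hits a single partition and comes back
# already sorted by the clustering column; no secondary index involved.
rows = session.execute("""
    SELECT messagedatetime, body FROM messages_by_conversation
    WHERE fromjid = 'alice@example.com' AND tojid = 'bob@example.com'
    ORDER BY messagedatetime DESC
""")
```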

 

Of course, it’s hard to change the way people think within the bounds of a single conversation.  Here is a typical response to the above solution:

I know my original design somewhat violates the whole NoSQL concept, but it saves a lot of memory.

That is a very 1970s way of thinking. Relational database theory originated at a time when disk space was expensive. In 1975, some vendors were selling disk space at a staggering eleven thousand dollars per megabyte (depending on the vendor and model). Even in 1980, if you wanted to buy a gigabyte’s worth of storage space, you could still expect to spend around a million dollars (McCallum, 2014). Today (2014), you can buy a terabyte drive for sixty bucks. Disk space is cheap; operation time is the expensive part. And overuse of secondary indexes will increase your operation time.

Therefore, in Cassandra, you should take a query-based modeling approach. Essentially, model your column families according to how it makes sense to query your data (Patel, 2012). This is a departure from relational data modeling, where tables are built according to how it makes sense to store the data. Often, query-based modeling results in storage of redundant data (and sometimes data that is not dependent on its primary row key)…and that’s okay.

In summary, we have managed to free ourselves from bell-bottoms, disco, and shag carpeting. Now it is also time that we free ourselves from the notion that our data models need to be normalized to the max, as well as from the assumption that all data storage technologies are engineered to work with normalized structures. With Cassandra, it is essential to model your data to suit your query patterns. If you do that properly, then you shouldn’t even need a secondary index…let alone three or four of them.

____

References

DataStax (2014).  When To Use An Index.  Retrieved from: http://www.datastax.com/documentation/cql/3.1/cql/ddl/ddl_when_use_index_c.html on 09/29/2014.

McCallum, J. (2014).  Disk Drive Prices (1955-2014).  Retrieved from: http://www.jcmit.com/diskprice.htm on 09/29/2014.

Patel, J. (2012).  Cassandra Data Modeling Best Practices, Part 1.  Retrieved from: http://www.ebaytechblog.com/2012/07/16/cassandra-data-modeling-best-practices-part-1 on 10/03/2014.

 
