Blog



Cassandra 2.0 Support for DataStax C# Driver

April 16, 2014


We’re glad to release today the first beta of our C# driver 2.0, which supports Apache Cassandra 2.0 and DataStax Enterprise 4.0 while remaining fully compatible with Cassandra 1.2 and the DSE versions based on it. This driver is intended to be aligned with the feature set of our Java driver 2.0.

In practice this means that C# developers can now enjoy the features introduced by Cassandra 2.0 and its version 2 native protocol, such as lightweight transactions and batching of prepared statements.

We have several other improvements and changes to come in the coming weeks as we iterate through several beta versions:

  • A Task-based API (a sketch of what this could look like follows this list)
  • Using interfaces instead of classes in the API, to make it easier to mock every part of the driver
  • Automatic paging, a feature introduced in Cassandra 2.0; it is not part of this first beta but will be included in the next one
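
To give a flavor of the planned Task-based API, here is a minimal sketch of what an asynchronous query could look like; the method names (ExecuteAsync in particular) and the keyspace and table are illustrative assumptions, not the final API:

    using System;
    using System.Threading.Tasks;
    using Cassandra; // DataStax C# driver

    class TaskApiDemo
    {
        static void Main()
        {
            RunAsync().Wait();
        }

        static async Task RunAsync()
        {
            // Connect to a local node; "demo" is a hypothetical keyspace.
            var cluster = Cluster.Builder()
                                 .AddContactPoint("127.0.0.1")
                                 .Build();
            var session = cluster.Connect("demo");

            // The Task-based call: the query executes asynchronously
            // and the resulting rows are awaited instead of blocking.
            var rows = await session.ExecuteAsync(
                new SimpleStatement("SELECT user_name FROM users"));

            foreach (var row in rows)
                Console.WriteLine(row.GetValue<string>("user_name"));

            cluster.Shutdown();
        }
    }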

This new C# Driver 2.0.0-beta1 is now available on NuGet. Feel free to give it a try!

 


UPDATE: Cassandra Migration Yields Insane (10x) Performance Improvements at Rekko

April 16, 2014


“Cassandra Migration Yields Insane Performance Improvements” was created by Robert Thanh Parker, CEO and Founder at Rekko.

Team,

Wanted to let you guys know that I posted the message on our FB page. It’s amazing stuff and something the team has worked incredibly hard on both refactoring our code and architecting and optimizing our infrastructure over Cassandra. More importantly, this update solves two remaining issues:

(1) From time to time, we’d see spikes in load time which, while unusual, resulted from a key update lock in Mongo. The new infrastructure removes this 100%, and performance has been completely smooth since the update. Average server response times are less than 5ms for our most complex campaign delivery requests. Previously they were closer to 50ms, with spikes that were much higher.

(2) Data manageability. We collect inordinate amounts of data (much more than most analytics providers), which complicates managing visitors as we scale and makes performance an ongoing challenge. The new infrastructure largely solves this, and more importantly, at a cost structure that will allow us to keep delivering increasingly sophisticated technology at lower costs. That is a key tenet of our long-term vision of bringing our technology to every piece of online real estate in the world. Big step there.

[New Relic response-time graph, April 15]

Here’s the quick post:

New Update: New Rekko Big Data Engine updated. Core services go live and we’re 10x faster overnight!

Speed. Speed. Speed.
We were fast before, but now we’ve concluded a major infrastructure refactoring. Our vision is to make big data personalization accessible and automated for small and medium-sized businesses. This is a huge and crucial step toward our goal.

Lowering the Cost/Customer
Driving down the cost of providing enterprise-level technology and services so that SMBs can EASILY leverage them is the most essential step in taking this technology mainstream. The first HD plasma TV I saw cost $29,999. The one I just bought cost significantly less. Our first Rekko customers paid $42k/month; our new ones pay a world less.

The Migration.
After a period of running Cassandra DB simultaneously with Mongo DB, the team completed the majority of our migration last night – we’re now completely live on Cassandra. While there are some small portions of infrastructure that will continue to use Mongo, almost everything material is now migrated.

The Results.
To summarize, the slowest of our response times on Cassandra (for real-time profiling and campaign delivery) average more than 10x better than the fastest we had with the Mongo DB code and infrastructure. We’re now able to intelligently deliver dynamic, tailored content to a visitor’s browser in less than 1/200th of a second. Consistently, and without spikes.

This is our last infrastructure step prior to rolling out…

#GoBigorGoHome

Some days, we just smile because we know the hard work pushed us three steps forward. Today is one of those days…

Best,
Parker
Founder, Rekko

If you’re interested in learning more about migrating from MongoDB to Apache Cassandra, visit the MongoDB to Cassandra Migration page for resources and how-to’s.


Gilt Hackathon Dives into Apache Cassandra with DigBigData and DataStax

April 14, 2014

By Lauri Apple, Technology Evangelism Specialist at Gilt

In late March Gilt’s Dublin team partnered with Dublin-based consultancy DigBigData to offer a free Cassandra workshop near our Dublin office. Twenty-five technologists from Gilt and other area companies came together for a full day of hands-on learning, experimentation and fun, taught by DigBigData’s Niall Milton (an official MVP for Apache Cassandra). Gilt’s Cassandra workshop was part of our free tech education initiative, through which we offer full-day tech courses at no charge to both Gilt and non-Gilt technologists. Since launching this program in June 2013, we’ve offered classes in Scala, Hadoop, R, Machine Learning and other topics of interest to us, and more courses are on the way. Nearly half of our Dublin team signed up for the Cassandra workshop, while other attendees came from Workday, Dun & Bradstreet and other companies.

Gilt currently doesn’t use Cassandra in production, but as NoSQL enthusiasts and open source advocates we’re quite interested in learning more about how it works. Several workshop attendees had prior experience with older versions of Cassandra and wanted a quick refresher. Others had very minimal experience, or had read the Dynamo and BigTable papers but had never tried Cassandra itself. Because everyone in the class was an experienced technologist, however, getting started posed very few problems.

“The biggest challenge for me was switching to working with a column-based database, having always worked with traditional row-based databases,” says Gilt Lead Software Engineer John Kenny. Adds Emerson Loureiro, another Gilt engineer: “I had no prior experience with Cassandra itself, but was familiar with most of the concepts behind it, so getting started was quite OK. To me it was more about looking at data modeling from a different perspective.”

After giving an introduction to Cassandra, Milton split the course into six teams who then set to work on building a variety of applications. Over the course of the day, teams asked lots of questions about performance, replication, fault tolerance, and other nuts-and-bolts aspects of Cassandra. By workshop’s end, the teams had created several exciting projects, including a CPU temperature monitor, a tweet sentiment analyzer, a multi-player, web-based game, and BigChat—a SnapChat-inspired service.

Though some of the students said they’d have benefited from more time to develop their projects, others were pleased with the end results of their work. “I think it was a nice use case for Cassandra,” says Emerson about the course. “It gave me the opportunity to stress the bits we had learned in the course and to get some more hands-on experience.”


Jonathan Ellis, Apache Cassandra Chair, discusses the history of Apache Cassandra

April 14, 2014



Deploy a Twitter clone with a single command

April 11, 2014


“Deploying a Twitter clone with a single command” was created by James Horey.

In this post I’ll show you how to deploy Twissandra, an open-source Twitter clone, using Ferry. Twissandra is a simple Django application that uses Cassandra to store tweets and user information. Normally Twissandra assumes that you already have Cassandra installed on your local machine. While installing and running Cassandra locally isn’t too difficult, it’s a lot simpler and faster to use Ferry. Plus, Ferry is designed to manage multiple application stacks and supports several backend technologies, including Hadoop and Open MPI. That means that when you’re ready to experiment with a Hadoop application, you’ll be able to start right away without having to learn how to configure yet another backend.

If you’re interested in watching a short video version of this post, head over here. You’ll still want to read the rest of the post since it contains important information on what’s going on. Because our demo is going to use Ferry, I’m going to assume that you have Ferry installed and working. If not, please go ahead and install it using the instructions here. While this tutorial may be useful even if you don’t have Ferry installed, it’s a lot more fun if you do.

Now to get started we’ll need to download the Twissandra application. Type these commands into your prompt:
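
The application lives in a public examples repository; the URL below is an assumption, so check the Ferry documentation if it has moved:

    $ git clone https://github.com/opencore/cassandra-examples.git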

If you were to run ls cassandra-examples/twissandra, you would see the two important files we need. The first is the Dockerfile. This file tells Ferry how to build the Twissandra application. For more information on Dockerfiles head over here. The second file is called twissandra.yaml. This file defines the application stack required to run the application. Before examining what’s in these files, let’s start Twissandra. Type the following into your prompt:
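
The invocation is roughly the following; the exact arguments are an assumption (the -b flag, discussed at the end of this post, tells start to build the Dockerfile):

    $ ferry start cassandra-examples/twissandra -b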

The start command builds the Twissandra Dockerfile and provisions the backend Cassandra cluster. This process usually takes a few seconds. You’ll know it’s done when it prints out the unique ID of your application. Afterwards, you can find the IP address of your Twissandra application by typing:
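
Here <app-id> stands for the unique ID that the start command printed:

    $ ferry inspect <app-id>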

This command will print out detailed information regarding your application stack. For now, we just need the internal_ip value of our connector. It should look something like this:
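
The output below is abridged and illustrative; the line to look for is the internal_ip entry under the connector (your address will differ):

    { "connectors": [ { "personality": "twissandra",
                        "internal_ip": "10.1.0.3" } ] }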

Paste that IP into your web browser, and you should be greeted with a Twissandra web interface.

[Screenshot: the Twissandra web interface]

Congratulations, you’ve now deployed a working Cassandra cluster connected to our Twitter clone!

As of right now, you can only access the website on the machine you’ve used to start the application. I plan on supporting port redirection in the near future so follow our progress on Github and Twitter.

Now that we have Twissandra up and running, let’s take a step back and examine the two files that define our application. First, if you take a look at twissandra.yaml, you’ll see something like this:
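
This sketch follows the general shape of a Ferry stack file; the exact keys are assumptions based on Ferry’s documented format:

    backend:
       - storage:
            personality: "cassandra"
            instances: 1
    connectors:
       - personality: "twissandra"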

This configuration file tells Ferry to create a single-node Cassandra cluster and a single Twissandra client that connects to that cluster.

Before proceeding further, there is one thing I’d like to mention. Ferry runs everything in distributed mode. That means we could increase the instances count on our Cassandra cluster, and Ferry would happily oblige. Ferry uses Docker underneath, so the overhead of running multiple nodes is very small. Why would we want to do this? Cassandra enables developers to choose from various “partitioning” strategies (i.e., how data gets split over multiple machines). This, in turn, has a huge effect on the overall performance of your application. As part of the data modeling process, we can use Ferry to stand up multiple clusters, each with a different partitioning strategy. While we won’t do that today, this is something that I plan on writing about in the future.

Now back to our YAML file. Since we’re using a custom client, we’ll need to provide a Dockerfile that tells Ferry how to build that client. If you open the Dockerfile, you’ll see something like this:
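
Here is a minimal sketch of that Dockerfile; the base image name and package list are assumptions, while the runscript path is spelled out later in this post:

    FROM ferry/cassandra-client

    # Install the extra packages our client uses (assumed list).
    RUN apt-get update && apt-get install -y python-pip git
    RUN pip install django

    # Download the Twissandra source code.
    RUN git clone https://github.com/twissandra/twissandra.git /service/twissandra

    # Register the startup script with Ferry's "start" event.
    ADD ./twissandra.sh /service/runscripts/start/twissandra.sh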

The first line is pretty important:
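
    FROM ferry/cassandra-client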

It tells Ferry that this client is based off of the official Cassandra client. The Cassandra client is included with Ferry and consists of a basic Ubuntu 12.04 image with a few Cassandra-specific packages. The other lines in the Dockerfile install additional packages that our client uses, and download the Twissandra source code.

The last line in the Dockerfile is also important:
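
    ADD ./twissandra.sh /service/runscripts/start/twissandra.sh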

That line basically tells Ferry that the script twissandra.sh should be added to the image and placed in the /service/runscripts/start directory. This directory is special because it contains all the scripts that should be executed when the connector starts. Ferry supports additional “events”, including stop, restart, and test. Each of these events is defined by its respective runscript directory (/service/runscripts/stop, etc.). By placing an executable script in the relevant directory, you ensure the script is executed when that event occurs.

If you take a look at twissandra.sh, you’ll see a command like the following near the bottom (Twissandra is a Django application, so starting the server boils down to Django’s built-in runserver; the exact arguments are an assumption):
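
    python manage.py runserver 0.0.0.0:80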

That means that when the connector starts, the first thing it will do is start the Twissandra server.

And that’s it! As you can see, the Dockerfile is fairly short and straightforward. The first time Ferry starts this application, it will build the Dockerfile and save it as an image (examples/twissandra). If you start the application again, it should go a bit faster since the image will be cached. Of course you can stop the application and start it again later. If you do, all your tweets should be saved. Once you’re done experimenting with the application, you can remove the application and everything will go away.

Because our entire application is defined by the application YAML file and the client Dockerfile, it’s super simple to make your own changes and to share those changes with others. For now, other people will still need to be able to build your Dockerfile (by using the -b flag during start), but I plan on supporting image uploads in the future via Docker registries. If this feature is important to you, let me know.
