May 8th, 2013

Read Yuan’s Next Great Data Developer Blog.


Some words about Why Cassandra
My web development experience goes back 10 years, and I have been using MySQL out of habit. The first time I read about Cassandra was last year, when I was developing a website for a bio-tech company called Antigene. Much like a social network, they store an extremely large amount of antibody data in their database. The site does not need complex, deeply relational searches, only light selection queries. Once the data grows really big, queries against a traditional SQL relational database become really slow.

That was when I started keeping an eye on NoSQL. Thanks to the wonderful project opportunity offered by DataStax, I finally made up my mind to switch to Cassandra. At first it gave me a hard time with data modeling and database structure design. Thinking of it as a dictionary rather than a table gave me a much clearer picture of Cassandra's advantages.
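To make the "dictionary, not a table" idea concrete, here is a minimal sketch in plain Python. It does not talk to Cassandra at all; it only mimics the shape of a column family as a dictionary of dictionaries. The row keys, column names, values, and helper names are all made up for illustration.

```python
# A column family pictured as a dictionary of dictionaries:
#   row key -> {column name -> value}
# All keys and values below are hypothetical sample data.
antibodies = {
    "ab-0001": {"name": "anti-X", "host": "rabbit", "isotype": "IgG"},
    "ab-0002": {"name": "anti-Y", "host": "mouse"},  # rows need not share columns
}


def get_row(column_family, row_key):
    """Return all columns of one row, or an empty dict if the row is missing."""
    return column_family.get(row_key, {})


def get_column(column_family, row_key, column, default=None):
    """Light 'selection' lookup: one row key, one column name."""
    return get_row(column_family, row_key).get(column, default)


def insert(column_family, row_key, columns):
    """Add or overwrite columns on a row, creating the row if needed."""
    column_family.setdefault(row_key, {}).update(columns)


insert(antibodies, "ab-0002", {"isotype": "IgM"})
print(get_column(antibodies, "ab-0002", "isotype"))
```

The point of the picture is that a lookup always starts from a row key, so the data has to be modeled around the queries you plan to run rather than around joins.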

Then I began to taste the sweetness of Cassandra, and I am still finding more fun along the way.

Some words about my summary of progress
I haven’t uploaded much to GitHub yet. Instead, I am working on my local machine and Amazon EC2. Starting from my most familiar ground, I first developed a web application for SmartGift (almost done, of course using Cassandra, thanks to phpcassa) and then moved on to the native app (the one sitting on my Samsung Note Pad, which still needs much polishing).

Rethinking the PHP Cassandra Connection
Previously I used phpcassa to let PHP connect directly to Cassandra. Now a question about stateless PHP has caught my attention (quoted below):

How to solve the pain of “stateless” PHP with Cassandra?

PHP by itself is stateless: a request comes in, the web server starts a thread, parses the PHP, and returns the result to the browser. Done. Of course there are “states” in PHP with sessions, but those exist to keep things in line for the client, not the server.


When you use a distributed backend like Cassandra, you can feel confident even if a couple of nodes fail. If your replication level is high enough, there will always be a replica of your data left somewhere.


The challenge is to combine the two. When you use a Java application to render web pages, it can keep track of the state of the backend Cassandra nodes. With PHP that is pretty hard, and most of the options on the PHP side will be slow, or just completely stupid if you think about it.


From my point of view, the best option is to use a TCP proxy to forward the requests from PHP to the available backends.
So I am testing HAProxy now.
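
To illustrate what the proxy takes off PHP's hands, here is a rough Python sketch of the basic idea behind a TCP health check: try to open a connection to each node and forward to the first one that answers. This is not HAProxy's actual implementation, only the concept. The node addresses are hypothetical, the function names are my own, and 9160 is assumed as the Thrift port that phpcassa talks to.

```python
import socket

# Hypothetical Cassandra nodes; 9160 is assumed to be the Thrift port.
NODES = [("10.0.0.1", 9160), ("10.0.0.2", 9160), ("10.0.0.3", 9160)]


def tcp_alive(host, port, timeout=0.5):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_backend(nodes, is_alive=tcp_alive):
    """Return the first reachable (host, port) pair, or None if all are down.

    is_alive is injectable so the selection logic can be tried without
    a real cluster.
    """
    for host, port in nodes:
        if is_alive(host, port):
            return (host, port)
    return None


# Simulate the first node being down, without touching the network.
down = {("10.0.0.1", 9160)}
print(pick_backend(NODES, is_alive=lambda h, p: (h, p) not in down))
```

Doing this check inside every PHP request would cost a connection attempt per node each time, which is exactly the slowness the quote complains about. A proxy runs the checks once in the background, and PHP only ever connects to the proxy's single address.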