February 28th, 2013

You may have read the blog of my good friend Nathan. He blogs about a lot of cool stuff and he does a much better job adding meme images like this one. Nathan uses Chef as his configuration management and deployment tool while I use Puppet. About once a week I seriously consider switching to Chef because his company, Outbrain, and my company, m6d, seem to use almost the exact same software stack: Tomcat, Cassandra, MySQL, Hadoop, Hive, etc. He and I could probably do half the work we do now just by sharing templates/recipes/modules for installing stuff.

As it turns out, we are both now deploying Apache Kafka. For fun, I have challenged Nathan to a sysadmin grudge match. The winner will be decided by who gets higher on Hacker News, who gets more blog comments like “awesome”, or who can get the coolest repost like highscalability.org, cnet, or whatever. The winner will be proclaimed King Bro-min of NYC ad tech startups.

The battle will be a series of blogs in a back and forth format. I am going to strike first with m6d’s Kafka RPM and spec file. As you know, turning an Apache project into an RPM is the #1 way to get a multi-million dollar funded big data startup, yet here I give you my secret sauce for free.

The entire package is found at https://github.com/edwardcapriolo/kafka-rpm

You may find that some things about this package seem different from other packages, but these things were done for a specific reason.

First, this package installs to /opt/kafka, not /usr/lib/kafka, /etc/kafka, bla bla. I understand why JPackage and some of the other packagers lay out Java things according to the LSB standards, but for me this never made sense. Too many symlinks and too much hacking to make things work. Most Java projects are designed to run from a single directory, so we “kept it real”.

The package provides LSB standard init scripts for starting Kafka as well as starting Zookeeper. Daemon-izing Java stuff is always a pain in the arse, but start and stop work. (This is more than I can say for most init scripts.)

You may have noticed something about the spec file (the main file that drives the RPM build). This spec does not create any system users, and it does not provide any default properties files for configuration. This was done on purpose. In our deployment Puppet creates our users. When an RPM creates the user and you do not take care, the kafka user may end up with different ids on different systems (which is a pet peeve of mine).
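To give a rough idea of what pinning the ids looks like on the Puppet side, here is a minimal sketch. It is not lifted from our module: the uid/gid value 490, the shell, and the home directory are placeholder assumptions, so pick whatever fits your own id scheme.

```puppet
# Hypothetical sketch, not the actual m6d module.
# The id 490, the shell, and the home path are assumed values for illustration.
group { 'kafka':
  ensure => present,
  gid    => 490,
}

user { 'kafka':
  ensure  => present,
  uid     => 490,            # same uid on every node, no surprises
  gid     => 'kafka',
  home    => '/opt/kafka',   # matches the single-directory install
  shell   => '/sbin/nologin',
  require => Group['kafka'],
}
```

Because the ids are declared in one place, every node that applies the manifest gets the same kafka user, which is exactly what letting each RPM install pick its own id does not guarantee.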

Also, you will see that we intentionally left the configuration parts out of the RPM. This is because our Puppet module uses templates to build the configurations dynamically based on parameters. The Puppet module will be the focus of the next blog post, and trust me, when you see it in action your mind will be blown.

For round one in the bro-min battle I come in with a conservative approach, looking to feel out the competition.