Fork me on GitHub

tutorials.github.com

Decentralized polyglot tutorials

Creating a Production Storm Cluster

Storm is a free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Storm is simple, can be used with any programming language, and is a lot of fun to use!

This tutorial will help you set up a production storm cluster from scratch.

Assumptions

We assume you have two machines that you can ssh into as the deploy user, and that user has sudo privleges. We’ll call these machines storm and zookeeper. It’s ok if you only have one machine, this tutorial will handle that case as well.

Java

We need to install the JDK (which includes the JRE). Oracle requires you accept the license agreement, so I prefer to download this locally and then scp the file to my host. To download the JDK, go to the jdk download page, accept the license agreement, and download the file called jdk-7u5-linux-x64.rpm. Then copy it to the deploy user’s home directory using scp:

scp -C jdk-7u5-linux-x64.rpm deploy@zookeeper:/home/deploy
scp -C jdk-7u5-linux-x64.rpm deploy@storm:/home/deploy

Then we install it on each machine (and setup our JAVA_HOME and PATH to find all the java binaries):

ssh zookeeper
sudo rpm -Uvh jdk-7u5-linux-x64.rpm
echo "export JAVA_HOME=/usr/java/default \
export PATH=$PATH:$JAVA_HOME/bin:$HOME/bin" > ~/.bash_profile
logout

ssh storm
sudo rpm -Uvh jdk-7u5-linux-x64.rpm
echo "export JAVA_HOME=/usr/java/default \
export PATH=$PATH:$JAVA_HOME/bin:$HOME/bin" > ~/.bash_profile
logout

Zookeeper

Installing Dependencies

First we need to install some dependencies and setup a place to keep our source files:

ssh deploy@zookeeper
mkdir -p ~/src
sudo yum install -y libtool libuuid-devel gcc-c++ make

Installing Zookeeper

Now we’re ready to install zookeeper:

cd ~/src
wget http://mirrors.axint.net/apache/zookeeper/zookeeper-3.4.3/zookeeper-3.4.3.tar.gz
tar xzf zookeeper-3.4.3.tar.gz

Running Zookeeper

Zookeeper is configured by a file located in conf/zookeper.conf. We’ll use this as our zookeeper config when we start the zookeeper server:

~/src/zookeeper-3.4.3/bin/zkServer.sh start ~/src/zookeeper-3.4.3/conf/zoo_sample.cfg

Storm

Now that zookeeper is running, we can setup our storm servers. First we need to install native dependencies:

Installing Dependencies

ssh deploy@zookeeper
sudo yum install -y git libtool libuuid-devel gcc-c++ make
mkdir -p ~/src /tmp/storm

Installing ZeroMQ

cd ~/src
wget http://download.zeromq.org/zeromq-2.1.7.tar.gz
tar xzf zeromq-2.1.7.tar.gz
cd zeromq-2.1.7
./configure
make
sudo make install

Installing JZMQ

cd ~/src
git clone https://github.com/nathanmarz/jzmq.git
cd jzmq
./autogen.sh
./configure
make
sudo make install

Installing Storm

cd ~/src
wget https://github.com/downloads/nathanmarz/storm/storm-0.7.0.zip
unzip storm-0.7.0.zip

Configuring Storm

We need to point storm to the correct zookeeper servers, as well as setup other config options. For now, we’ll just modify the conf/storm.yaml file to look like this:

storm.local.dir: "/tmp/storm"
storm.zookeeper.servers:
  - "zookeeper"
nimbus.host: "localhost"

You’ll want to put some real values in there depending on the size of your cluster, and you’ll definitely want to change /tmp/storm to something more persistent, but this sufices for a demo.

Running Storm

In order to run storm, you need both nimbus and supervisor processes running. We can run them with nohup:

nohup ~/src/storm-0.7.0/bin/storm nimbus &
nohup ~/src/storm-0.7.0/bin/storm supervisor &

Now it should be ready to process topologies!

Running the Storm UI

Storm comes with a nifty dashboard to view cluster stats. To run it, just do:

nohup ~/src/storm-0.7.0/bin/storm ui &