Creating a Production Storm Cluster
Storm is a free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Storm is simple, can be used with any programming language, and is a lot of fun to use!
This tutorial will help you set up a production storm cluster from scratch.
We assume you have two machines that you can ssh into as the deploy user, and that this user has sudo privileges. We’ll call these machines storm and zookeeper. It’s ok if you only have one machine; this tutorial handles that case as well.
We need to install the JDK (which includes the JRE). Oracle requires you to accept the license agreement, so I prefer to download the file locally and then scp it to each host. To download the JDK, go to the jdk download page, accept the license agreement, and download the file called jdk-7u5-linux-x64.rpm. Then copy it to the deploy user’s home directory on each machine using:
scp -C jdk-7u5-linux-x64.rpm deploy@zookeeper:/home/deploy
scp -C jdk-7u5-linux-x64.rpm deploy@storm:/home/deploy
Then we install it on each machine (and set up our PATH to find all the java binaries):
ssh deploy@zookeeper
sudo rpm -Uvh jdk-7u5-linux-x64.rpm
echo 'export JAVA_HOME=/usr/java/default' >> ~/.bash_profile
echo 'export PATH=$PATH:$JAVA_HOME/bin:$HOME/bin' >> ~/.bash_profile
logout
ssh deploy@storm
sudo rpm -Uvh jdk-7u5-linux-x64.rpm
echo 'export JAVA_HOME=/usr/java/default' >> ~/.bash_profile
echo 'export PATH=$PATH:$JAVA_HOME/bin:$HOME/bin' >> ~/.bash_profile
logout
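If you run the profile step more than once, it is easy to end up with duplicated or clobbered entries in ~/.bash_profile. Here is a minimal sketch of one way to make that step repeatable. The append_line_once helper is a hypothetical name of my own, not part of any tool used in this tutorial.

```shell
# Hypothetical helper: append a line to a file only if that exact line
# is not already present, so the step can be re-run safely.
append_line_once() {
  line=$1
  file=$2
  touch "$file"
  # -x matches the whole line, -F treats the pattern as a fixed string
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example (run on each machine). Single quotes keep $PATH and $JAVA_HOME
# unexpanded until the profile is actually sourced:
#   append_line_once 'export JAVA_HOME=/usr/java/default' ~/.bash_profile
#   append_line_once 'export PATH=$PATH:$JAVA_HOME/bin:$HOME/bin' ~/.bash_profile
```

The single quotes matter here: with double quotes the shell would expand $JAVA_HOME when the command runs, before the variable has been set.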
First we need to install some dependencies and set up a place to keep our source files:
ssh deploy@zookeeper
mkdir -p ~/src
sudo yum install -y libtool libuuid-devel gcc-c++ make
Now we’re ready to install zookeeper:
cd ~/src
wget http://mirrors.axint.net/apache/zookeeper/zookeeper-3.4.3/zookeeper-3.4.3.tar.gz
tar xzf zookeeper-3.4.3.tar.gz
Zookeeper is configured by a file in its conf directory. The distribution ships a sample, conf/zoo_sample.cfg, and we’ll use that as our zookeeper config when we start the zookeeper server:
~/src/zookeeper-3.4.3/bin/zkServer.sh start ~/src/zookeeper-3.4.3/conf/zoo_sample.cfg
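The sample config keeps zookeeper's data under /tmp, which is fine for a demo but will not survive a cleanup of that directory. Below is a sketch of writing your own config with a different data directory. The write_zoo_cfg function and the /var/lib/zookeeper path are assumptions of mine; the other values mirror what zoo_sample.cfg ships with.

```shell
# Hypothetical helper: write a standalone zookeeper config to cfg_path,
# storing snapshots under data_dir instead of /tmp.
write_zoo_cfg() {
  cfg_path=$1
  data_dir=$2
  cat > "$cfg_path" <<EOF
tickTime=2000
initLimit=10
syncLimit=5
dataDir=$data_dir
clientPort=2181
EOF
}

# Example (the data directory is an assumed location; it must exist and
# be writable by the deploy user):
#   write_zoo_cfg ~/src/zookeeper-3.4.3/conf/zoo.cfg /var/lib/zookeeper
#   ~/src/zookeeper-3.4.3/bin/zkServer.sh start ~/src/zookeeper-3.4.3/conf/zoo.cfg
```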
Now that zookeeper is running, we can set up our storm servers. First we need to install native dependencies:
ssh deploy@storm
sudo yum install -y git libtool libuuid-devel gcc-c++ make
mkdir -p ~/src /tmp/storm
cd ~/src
wget http://download.zeromq.org/zeromq-2.1.7.tar.gz
tar xzf zeromq-2.1.7.tar.gz
cd zeromq-2.1.7
./configure
make
sudo make install

cd ~/src
git clone https://github.com/nathanmarz/jzmq.git
cd jzmq
./autogen.sh
./configure
make
sudo make install

cd ~/src
wget https://github.com/downloads/nathanmarz/storm/storm-0.7.0.zip
unzip storm-0.7.0.zip
We need to point storm at the correct zookeeper servers, as well as set up other config options. For now, we’ll just modify the conf/storm.yaml file to look like this:
storm.local.dir: "/tmp/storm"
storm.zookeeper.servers:
  - "zookeeper"
nimbus.host: "localhost"
You’ll want to put some real values in there depending on the size of your cluster, and you’ll definitely want to change /tmp/storm to something more persistent, but this suffices for a demo.
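When you do move past the demo values, it can help to generate the file rather than edit it by hand on every machine. This is a rough sketch under that assumption. The write_storm_yaml function and the example paths are hypothetical; only the three keys shown in this tutorial are written.

```shell
# Hypothetical helper: write a minimal storm.yaml containing only the
# three settings used in this tutorial.
write_storm_yaml() {
  yaml_path=$1
  local_dir=$2
  zk_host=$3
  nimbus_host=$4
  cat > "$yaml_path" <<EOF
storm.local.dir: "$local_dir"
storm.zookeeper.servers:
  - "$zk_host"
nimbus.host: "$nimbus_host"
EOF
}

# Example (the /var/lib/storm path is an assumed, more persistent location):
#   mkdir -p /var/lib/storm
#   write_storm_yaml ~/src/storm-0.7.0/conf/storm.yaml /var/lib/storm zookeeper localhost
```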
In order to run storm, you need both nimbus and supervisor processes running. We can run them with
nohup ~/src/storm-0.7.0/bin/storm nimbus &
nohup ~/src/storm-0.7.0/bin/storm supervisor &
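By default nohup sends output to a nohup.out file in the current directory, so daemons started from the same place share one file. If you would rather keep their output apart, something like the following may help. The start_in_background function and the log file names are my own invention, not something storm provides.

```shell
# Hypothetical helper: run a command under nohup in the background,
# send its stdout and stderr to log_file, and print the PID.
start_in_background() {
  log_file=$1
  shift
  nohup "$@" > "$log_file" 2>&1 &
  printf '%s\n' "$!"
}

# Example (log file names are assumptions):
#   start_in_background ~/nimbus.out ~/src/storm-0.7.0/bin/storm nimbus
#   start_in_background ~/supervisor.out ~/src/storm-0.7.0/bin/storm supervisor
```

The printed PID is handy if you later need to stop a daemon with kill.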
Now it should be ready to process topologies!
Running the Storm UI
Storm comes with a nifty dashboard to view cluster stats. To run it, just do:
nohup ~/src/storm-0.7.0/bin/storm ui &
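To confirm the dashboard actually came up, you can check whether its port accepts connections. The sketch below assumes the UI is listening on its default ui.port of 8080 and that your shell is bash, since the /dev/tcp path is a bash feature. The port_open name is hypothetical.

```shell
# Hypothetical helper: succeed if a TCP connection to host:port can be
# opened. Relies on bash's /dev/tcp redirection, so it needs bash.
port_open() {
  host=$1
  port=$2
  (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null
}

# Example (8080 is assumed to be the UI port):
#   if port_open storm 8080; then echo "UI is up"; else echo "UI is not reachable"; fi
```

The same check works for zookeeper's client port, 2181, if you want to verify that server too.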