Hadoop Installation Overview – Part2 – Pseudo Distribution
This is part 2 of the Hadoop installation document. In part1 we covered installing Hadoop with the NameNode and DataNode on a single VM environment.
In Pseudo Distribution, we will install 2 VM instances, where one machine will act as NameNode+DataNode whereas the other will act as a DataNode only.
Initial Setup
Goal – Create two Ubuntu 64-bit VMs, each one having the Hadoop setup from part1
Step1 – Start with the VM you already have in part1
Step2 – There are two ways we can clone the VM
- Full Clone – Creates an entire copy of the VM. Best way to create the first VM (which will act as the NameNode)
- Linked Clone – Does not create a full copy, but builds on top of the source VM. Best way to create the second VM, the DataNode
Step a. Create a Full Clone of the VM we have from part1 and name it as the namenode. (To verify, open the hostname file: sudo gedit /etc/hostname)
Step b. Create a Linked Clone of the Full Clone VM and name it datanode1.
Note – We create Linked Clones only to save space.
Setup Overview
This distribution will have 2 different VMs: one will act as NameNode+DataNode and the other will act as a DataNode only.
The following are the sample IP addresses for this reference installation.
Host Name | IpAddr |
nameNode | |
datanode1 | |
Note – Make sure the IP address of your VM is in the 192. range, otherwise your NameNode and DataNodes will not come up. I faced this issue and had to struggle to get the IP address into the above range.
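A quick way to check the address on each VM before going further is sketched below. The helper name `in_192_range` is made up for this example, and the sample addresses in the usage comments are placeholders, not the ones used in this installation.

```shell
# Hypothetical helper: succeeds when the given IPv4 address starts with "192."
in_192_range() {
    case "$1" in
        192.*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Usage on each VM (Ubuntu): test the first address reported by hostname -I
# ip=$(hostname -I | awk '{print $1}')
# if in_192_range "$ip"; then echo "ok: $ip"; else echo "not in 192. range: $ip"; fi
```

If the check fails, fix the VM's network settings first; the later steps all depend on the two machines reaching each other by these addresses.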
Installation Overview
Before we dive into the details, here is a summary of the steps
Step1 – Update the Conf Files
core-site.xml, hdfs-site.xml, mapred-site.xml, masters, slaves
Step2 – Generate SSH Keys on Name Node and Copy it to datanodes
Using ssh-keygen, create new keys and then copy them over to the datanodes.
Step3 – Edit the hosts file
/etc/hosts file should have the ip and name of the respective machine.
Step4 – Make sure .bashrc file is good as per Part1
Step5 – Start hadoop Cluster from namenode
start-dfs.sh, start-mapred.sh
Step6 – Finally, Verify everything is working
Go to the nameNode and verify it is working. The IP address is that of the namenode.
Installation Details
Step0 – Verify the hostname file
/etc/hostname - should have entries for
nameNode1 # in the Namenode machine
dataNode1 # in the dataNode machine
Restart the machine if the hostname is updated
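To double-check the hostname after cloning, something like the following can be used. It is only a sketch; the helper name `hostname_matches` is made up for this example.

```shell
# Hypothetical helper: succeeds when the name stored in the hostname file
# (default /etc/hostname) equals the expected name.
hostname_matches() {
    expected=$1
    file=${2:-/etc/hostname}
    # Strip the trailing newline and any stray whitespace before comparing
    actual=$(tr -d '[:space:]' < "$file")
    [ "$actual" = "$expected" ]
}

# Usage:
# hostname_matches nameNode1 && echo "hostname ok"    # on the NameNode machine
# hostname_matches dataNode1 && echo "hostname ok"    # on the DataNode machine
```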
Step1 – Update Conf Files On NameNode and dataNode1
Note – ipAddr represents the NameNode's IP address
masters (Only on the NameNode)
slaves (on namenode and datanode)
On NameNode Machine
On DataNode Machine
- The masters and slaves files are how you know which machine is the NameNode and which are the DataNodes.
- The slaves file has entries for the IP addresses of all DataNodes. If you add the NameNode's IP address to it, the NameNode machine will also act as a DataNode.
- All the machines will have an entry with the IP and name of the machine: nameNode (on the NameNode machine), dataNode1 (on the DataNode machine).
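For the IP-and-name entries (the /etc/hosts step in the overview), a small sketch of producing the lines is shown here. The helper name `hosts_entry` is hypothetical and the addresses in the usage comments are placeholders; use the ones your VMs actually have.

```shell
# Hypothetical helper: print one /etc/hosts style line ("<ip> <name>")
hosts_entry() {
    printf '%s %s\n' "$1" "$2"
}

# Usage with placeholder addresses (replace with your own):
# hosts_entry 192.168.1.10 nameNode  | sudo tee -a /etc/hosts
# hosts_entry 192.168.1.11 dataNode1 | sudo tee -a /etc/hosts
```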
Summary Table
File | namenode | datanode1 |
core-site.xml | ip of namenode | ip of namenode |
mapred-site.xml | ip of namenode | ip of namenode |
hdfs-site.xml | 2 | 2 |
masters | namenode ipaddr | na |
slaves | namenode ipaddr + datanode ip addr | ip of datanode |
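The table above can be turned into files roughly as follows. This is a sketch under assumptions, not the exact files from this installation: `fs.default.name`, `mapred.job.tracker` and `dfs.replication` are the usual Hadoop 1.x property names, but the ports 9000 and 9001, the reading of "2" as the replication factor, the helper names, and the sample IP addresses are all my assumptions.

```shell
# Write a single-property Hadoop site file.
# $1 = target file, $2 = property name, $3 = property value
write_site_file() {
    cat > "$1" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>$2</name>
    <value>$3</value>
  </property>
</configuration>
EOF
}

# Hypothetical helper that fills a conf directory according to the table.
# $1 = conf directory, $2 = NameNode IP, $3 = DataNode IP,
# $4 = role of this machine: "namenode" or "datanode"
write_conf() {
    conf_dir=$1
    namenode_ip=$2
    datanode_ip=$3
    role=$4

    # Both machines point at the NameNode (ports are assumed values)
    write_site_file "$conf_dir/core-site.xml" fs.default.name "hdfs://$namenode_ip:9000"
    write_site_file "$conf_dir/mapred-site.xml" mapred.job.tracker "$namenode_ip:9001"
    # Assumption: the "2" in the table is the replication factor
    write_site_file "$conf_dir/hdfs-site.xml" dfs.replication 2

    if [ "$role" = "namenode" ]; then
        # masters exists only on the NameNode; slaves lists both machines
        printf '%s\n' "$namenode_ip" > "$conf_dir/masters"
        printf '%s\n' "$namenode_ip" "$datanode_ip" > "$conf_dir/slaves"
    else
        printf '%s\n' "$datanode_ip" > "$conf_dir/slaves"
    fi
}

# Usage with placeholder addresses:
# write_conf "$HADOOP_PREFIX/conf" 192.168.1.10 192.168.1.11 namenode   # on the NameNode
# write_conf "$HADOOP_PREFIX/conf" 192.168.1.10 192.168.1.11 datanode   # on the DataNode
```

Back up the existing conf directory before trying this, since the helper overwrites the files it touches.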
Step2 – SSH keygen for namenode-datanode communication
Note – Before starting the process, clean the ~/.ssh folders on both the datanode and namenode machines
$ ssh-keygen
#copy all .pub file to
$ ssh-copy-id -i ~/.ssh/id_rsa.pub user1@
$ ssh-copy-id -i ~/.ssh/id_rsa.pub user2@
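The key setup can also be scripted. Below is a minimal sketch, assuming an RSA key at ~/.ssh/id_rsa with an empty passphrase; the helper names `run` and `setup_keys`, the `DRY_RUN` switch, and the user@host targets in the usage comment are made up for this example. With `DRY_RUN=1` the commands are only printed, so you can review them first.

```shell
# Run a command, or just print it when DRY_RUN=1
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: generate a key once, then install it on each target.
# Arguments: one or more user@host targets
setup_keys() {
    key="$HOME/.ssh/id_rsa"
    # Create the key pair only if it does not exist yet (empty passphrase)
    [ -f "$key" ] || run ssh-keygen -t rsa -N "" -f "$key"
    # Append the public key to authorized_keys on every target
    for target in "$@"; do
        run ssh-copy-id -i "$key.pub" "$target"
    done
}

# Usage on the NameNode, with placeholder targets:
# DRY_RUN=1 setup_keys user1@192.168.1.10 user1@192.168.1.11
```

Afterwards, `ssh user@host` from the namenode should log in without asking for a password; if it still prompts, the start scripts in Step5 will prompt as well.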
Step3 – On Namenode – format hadoop
$ hadoop namenode -format
Step4 – .bashrc
export HADOOP_PREFIX=~/softwares/hadoop-1.2.1
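Part1 is the reference for .bashrc. As a reminder of why it matters here: the hadoop command and the start scripts are found only if Hadoop's bin directory is on PATH. A minimal sketch, assuming the install location above; the PATH line is my assumption about what Part1 sets up.

```shell
# Install location from this step
export HADOOP_PREFIX="$HOME/softwares/hadoop-1.2.1"
# Assumed addition: makes hadoop, start-dfs.sh and start-mapred.sh callable by name
export PATH="$HADOOP_PREFIX/bin:$PATH"
```

Run `source ~/.bashrc` (or open a new terminal) on both machines after editing the file.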
Step5 – Crank it all up, on the NameNode machine
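The overview lists start-dfs.sh and start-mapred.sh for this step. A minimal sketch of running them in order, assuming both scripts are on PATH; the wrapper name `start_cluster` is made up for this example.

```shell
# Hypothetical wrapper: start HDFS first, then MapReduce
start_cluster() {
    for script in start-dfs.sh start-mapred.sh; do
        # Stop early with a clear message if a script cannot be found
        if ! command -v "$script" >/dev/null 2>&1; then
            echo "missing on PATH: $script" >&2
            return 1
        fi
    done
    # Start MapReduce only if the HDFS daemons started without error
    start-dfs.sh && start-mapred.sh
}

# Usage, on the NameNode machine only:
# start_cluster
```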
Step6 – Verify if all is working
If you did all the steps diligently and did not make any minor mistakes, everything should fall into place.
$ jps # on each machine, to see if the namenode, jobtracker, and tasktracker are running
# if this is working then go to namenode and open firefox
and go to
This should show the hadoop summary.
If you see the above page, yes, everything has been configured properly! HURRAY!!
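If something is not running, the jps check can be made less error-prone with a small filter. This is a sketch: the helper name `missing_daemons` is made up, and the daemon lists in the usage comments are the ones I would expect for Hadoop 1.x with this layout.

```shell
# Hypothetical helper: read jps output on stdin and print every expected
# daemon name (given as arguments) that is not listed.
missing_daemons() {
    listing=$(cat)
    for daemon in "$@"; do
        # jps prints "<pid> <name>" per line; compare the name column exactly
        # so that NameNode is not confused with SecondaryNameNode
        echo "$listing" | awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }' \
            || echo "$daemon"
    done
}

# Usage:
# jps | missing_daemons NameNode DataNode JobTracker TaskTracker   # on the NameNode machine
# jps | missing_daemons DataNode TaskTracker                       # on the DataNode machine
```

No output means every expected daemon was found; any name printed is a daemon to look up in the Hadoop logs.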
Published in 2016