Hadoop Installation Overview – Part 2 – Pseudo Distribution
Overview
This is Part 2 of the Hadoop installation document. In Part 1 we covered installing Hadoop with the NameNode and DataNode on a single-VM environment.
In Pseudo Distribution, we will install 2 VM instances: one machine will act as a NameNode + DataNode, while the other will act as a DataNode only.
Initial Setup
Goal – Create two Ubuntu 64-bit VMs, each with the Hadoop setup from Part 1.
Steps
Step 1 – Start with the VM you already have from Part 1.
Step 2 – There are two ways we can clone the VM:
- Full Clone – Creates a complete copy of the VM. Best way to create the first VM (which will act as the NameNode).
- Linked Clone – Does not create a full copy, but builds on top of the original. Best way to create the second DataNode.
Step a. Create a Full Clone of the VM from Part 1 and name it nameNode. (To verify, open the hostname file with sudo gedit /etc/hostname.)
Step b. Create a Linked Clone of the Full Clone VM and name it datanode1.
Note – we create Linked Clones only to save disk space.
Setup Overview
This distribution will have 2 different VMs: one will act as a NameNode + DataNode and the other will act as a DataNode only.
Following are the sample IP addresses for this reference installation.
| Host Name | IP Address |
|---|---|
| nameNode | 192.168.146.129 |
| datanode1 | 192.168.146.130 |
Note – Make sure the IP address of your VM is in the 192.168.x.x range, otherwise your NameNode and DataNodes will not come up. I faced this issue and had to struggle to get the IP addresses into the above range.
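To check which IP each VM actually got, here is a quick sketch (the interface name eth0 is an assumption; yours may differ):
$ ifconfig eth0
# look for "inet addr:192.168.146.x" in the output on each VM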
Installation Overview
Before we dive into the details, this is the summary of steps.
Step 1 – Update the Conf Files
core-site.xml, hdfs-site.xml, mapred-site.xml, masters, slaves
Step 2 – Generate SSH Keys on the NameNode and Copy Them to the DataNodes
Using ssh-keygen, create new keys and then copy them over to the DataNodes.
Step 3 – Edit the hosts File
The /etc/hosts file should have the IP and hostname of the respective machine.
Step 4 – Make Sure the .bashrc File Is Set Up as per Part 1
Step 5 – Start the Hadoop Cluster from the NameNode
start-dfs.sh, start-mapred.sh
Step 6 – Finally, Verify Everything Is Working
Go to the NameNode and verify that 192.168.146.129:50070 is working. The IP address is that of the NameNode.
Installation Details
Step 0 – Verify the hostname File
/etc/hostname should contain:
nameNode # on the NameNode machine
datanode1 # on the DataNode machine
Restart the machine if the hostname is updated.
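A quick way to double-check this on each VM (assuming the Ubuntu setup from Part 1):
$ cat /etc/hostname         # should print nameNode on the NameNode VM, datanode1 on the DataNode VM
$ sudo gedit /etc/hostname  # edit it if it does not match, then restart
$ sudo reboot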
Step 1 – Update the Conf Files on the NameNode and datanode1
core-site.xml
<property>
  <name>fs.default.name</name>
  <value>hdfs://192.168.146.129:8020</value>
</property>
Note – the IP address here is the NameNode's IP address.
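For reference, a minimal sketch of the complete file, assuming the Hadoop 1.2.1 layout from Part 1 (conf files under $HADOOP_PREFIX/conf):
$ cat $HADOOP_PREFIX/conf/core-site.xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.146.129:8020</value>
  </property>
</configuration>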
mapred-site.xml
<property>
  <name>mapred.job.tracker</name>
  <value>192.168.146.129:8021</value>
</property>
hdfs-site.xml
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
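Similarly, minimal sketches of the other two complete files (same assumptions as above):
$ cat $HADOOP_PREFIX/conf/mapred-site.xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>192.168.146.129:8021</value>
  </property>
</configuration>

$ cat $HADOOP_PREFIX/conf/hdfs-site.xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>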
masters (Only on the NameNode)
192.168.146.129
slaves (on the NameNode and the DataNode)
On NameNode Machine
192.168.146.129
192.168.146.130
On DataNode Machine
192.168.146.130
- The masters and slaves files are how the cluster knows which machine is the NameNode and which machines are the DataNodes.
- The slaves file has entries for the IP addresses of all DataNodes; since we also add the NameNode's IP address there, the NameNode machine runs a DataNode as well.
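To verify, this is what the two files should look like on each machine (paths again assume the Part 1 install under $HADOOP_PREFIX):
# on the NameNode machine
$ cat $HADOOP_PREFIX/conf/masters
192.168.146.129
$ cat $HADOOP_PREFIX/conf/slaves
192.168.146.129
192.168.146.130

# on the DataNode machine
$ cat $HADOOP_PREFIX/conf/slaves
192.168.146.130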
/etc/hosts
Each machine will have an entry with its own IP and hostname.
192.168.146.129 nameNode (on the NameNode machine)
192.168.146.130 datanode1 (on the DataNode machine)
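A quick sketch of how to add the entries (run the matching line on each machine):
# on the NameNode machine
$ echo "192.168.146.129 nameNode" | sudo tee -a /etc/hosts
# on the DataNode machine
$ echo "192.168.146.130 datanode1" | sudo tee -a /etc/hosts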
Summary Table
| File | namenode | datanode1 |
|---|---|---|
| core-site.xml | IP of NameNode | IP of NameNode |
| mapred-site.xml | IP of NameNode | IP of NameNode |
| hdfs-site.xml | 2 (replication) | 2 (replication) |
| masters | NameNode IP | n/a |
| slaves | NameNode IP + DataNode IP | DataNode IP |
Step 2 – SSH Keygen for NameNode-DataNode Communication
Note – before starting the process, clean the ~/.ssh folders on both the DataNode and NameNode machines.
$ ssh-keygen
# copy the public key over to each machine
$ ssh-copy-id -i ~/.ssh/id_rsa.pub user1@192.168.146.129
$ ssh-copy-id -i ~/.ssh/id_rsa.pub user2@192.168.146.130
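To confirm the keys were copied correctly, try logging in from the NameNode; neither command should prompt for a password (user1/user2 are the same usernames used above):
$ ssh user2@192.168.146.130 hostname   # should print datanode1 without asking for a password
$ ssh user1@192.168.146.129 hostname   # should print nameNode without asking for a password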
Step 3 – On the NameNode – Format HDFS
hadoop namenode -format
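If the format succeeds, the new name directory should exist. Here is a quick sanity check, assuming you kept the default hadoop.tmp.dir (/tmp/hadoop-<user>) from Part 1:
$ ls /tmp/hadoop-$USER/dfs/name/current   # should list fsimage, edits, and VERSION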
Step 4 – .bashrc
export HADOOP_PREFIX=~/softwares/hadoop-1.2.1
export PATH=$PATH:$HADOOP_PREFIX/bin
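Reload the file and confirm the hadoop command is on the PATH on both machines (paths assume the Part 1 layout):
$ source ~/.bashrc
$ which hadoop      # should point into ~/softwares/hadoop-1.2.1/bin
$ hadoop version    # should report Hadoop 1.2.1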
Step 5 – Crank It All Up, on the NameNode Machine
start-dfs.sh
start-mapred.sh
Step 6 – Verify Everything Is Working
If you followed all the steps diligently and did not make any minor mistakes, everything should fall into place.
$ jps # run on each machine to check that the Hadoop daemons (NameNode, DataNode, JobTracker, TaskTracker) are running
If jps looks good, go to the NameNode machine, open Firefox, and browse to 192.168.146.129:50070.
This should show the Hadoop NameNode summary page.
If you see the above page, yes, everything has been configured properly! HURRAY!!
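As one more optional sanity check, the cluster report run from the NameNode should show both machines as live DataNodes:
$ hadoop dfsadmin -report   # look for 2 available/live DataNodes in the report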
Published in 2016