Hadoop Installation Overview – Part2 – Sudo Distribution

Overview

This is part 2 of the Hadoop installation guide. Part 1 covered installing Hadoop with a NameNode and a DataNode in a single VM environment.
In the Sudo Distribution, we will set up two VM instances: one machine acts as NameNode + DataNode, while the other acts as a DataNode only.

Initial Setup

Goal – Create two 64-bit Ubuntu VMs, each with the Hadoop setup from part 1
Steps
Step1 – Start with the VM you already have from part 1
Step2 – Clone the VM. There are two ways to clone a VM:
  • Full Clone – Creates a complete copy of the VM. Best way to create the first VM (which will act as the NameNode).
  • Linked Clone – Does not create a full copy, but builds on top of an existing VM. Best way to create the second VM (the DataNode).
Step a. Create a Full Clone of the VM from part 1 and name it namenode. (To verify the name, run sudo gedit /etc/hostname.)
Step b. Create a Linked Clone of the Full Clone VM and name it datanode1.
Note – We are creating a Linked Clone only to save disk space.

Setup Overview

This distribution has 2 different VMs: one acts as NameNode + DataNode and the other acts as a DataNode only.
The following are the sample IP addresses for this reference installation.
Host Name    IpAddr
nameNode     192.168.146.129
datanode1    192.168.146.130
Note – Make sure the IP address of each VM is in the 192. range shown above, otherwise your NameNode and DataNodes will not come up. I faced this issue and had to struggle to get the IP addresses into the above range.
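
A quick way to check the range on each VM is a small shell helper like the one below. This is only a sketch: the function name ip_in_range is my own, and the default prefix 192.168.146. is an assumption taken from the sample addresses used in the configuration files in this post, so change it to match your VM network.

```shell
#!/usr/bin/env bash
# Sketch: report whether an IP address starts with an expected prefix.
# ip_in_range is a hypothetical helper name, and the default prefix
# "192.168.146." is an assumption; adjust it for your own VM network.

ip_in_range() {
    local ip="$1" prefix="${2:-192.168.146.}"
    case "$ip" in
        "$prefix"*) return 0 ;;   # address starts with the prefix
        *)          return 1 ;;   # anything else is out of range
    esac
}

# Example: check every address that `hostname -I` reports on this VM.
# for ip in $(hostname -I); do
#     if ip_in_range "$ip"; then echo "$ip is in range"; else echo "$ip is NOT in range"; fi
# done
```

Uncomment the loop and run it on both VMs before going further; if an address is reported as out of range, fix the VM network settings first.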

Installation Overview

Before we dive into the details, here is a summary of the steps.
Step1 – Update the conf files
core-site.xml, hdfs-site.xml, mapred-site.xml, masters, slaves
Step2 – Generate SSH keys on the NameNode and copy them to the DataNodes
Use ssh-keygen to create new keys, then copy them over to the DataNodes.
Step3 – Edit the hosts file
The /etc/hosts file should have the IP address and name of the respective machine.
Step4 – Make sure the .bashrc file is set up as per part 1
Step5 – Start the Hadoop cluster from the NameNode
start-dfs.sh, start-mapred.sh
Step6 – Finally, verify everything is working
On the NameNode, verify that 192.168.146.129:50070 is working. The IP address is that of the NameNode.

Installation Details

Step0 – Verify the hostname file

/etc/hostname should contain
nameNode1  # on the NameNode machine
dataNode1  # on the DataNode machine
Restart the machine if the hostname was updated

Step1 – Update Conf Files On NameNode and dataNode1

core-site.xml
<property><name>fs.default.name</name><value>hdfs://192.168.146.129:8020</value></property>
Note – the IP address is the NameNode's IP address
mapred-site.xml
<property><name>mapred.job.tracker</name><value>192.168.146.129:8021</value></property>
hdfs-site.xml
<property><name>dfs.replication</name>
<value>2</value>
</property>
masters (Only on the NameNode)
192.168.146.129
slaves (on namenode and datanode)
On NameNode Machine
192.168.146.129
192.168.146.130
On DataNode Machine
192.168.146.130
  • The masters and slaves files are how Hadoop knows which machine is the NameNode and which are the DataNodes.
  • The slaves file lists the IP addresses of all the DataNodes. If you add the NameNode's IP address to it, the NameNode machine also runs a DataNode.
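
The property fragments above have to sit inside a <configuration> root element in each file. Here is a minimal sketch that writes such a file from the shell. The helper name write_site_file is my own invention (it is not a Hadoop tool), it overwrites the target file, and it handles a single property per file, which is all this setup needs. The key fs.default.name and the hdfs:// prefix are the standard Hadoop 1.x form for the NameNode address.

```shell
#!/usr/bin/env bash
# Sketch: write a one-property Hadoop site file.
# write_site_file is a hypothetical helper, not part of Hadoop.
# WARNING: it overwrites the target file.

write_site_file() {
    local file="$1" name="$2" value="$3"
    cat > "$file" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>${name}</name>
    <value>${value}</value>
  </property>
</configuration>
EOF
}

# Usage (Hadoop 1.x keeps its config files under $HADOOP_PREFIX/conf):
# conf="$HADOOP_PREFIX/conf"
# write_site_file "$conf/core-site.xml"   fs.default.name    hdfs://192.168.146.129:8020
# write_site_file "$conf/mapred-site.xml" mapred.job.tracker 192.168.146.129:8021
# write_site_file "$conf/hdfs-site.xml"   dfs.replication    2
```

Run the same three calls on both machines, since the XML files are identical on the NameNode and the DataNode; only masters and slaves differ.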

/etc/hosts

Each machine needs an entry with its IP address and hostname.
192.168.146.129  nameNode (on the NameNode machine)
192.168.146.130  dataNode1 (on the DataNode machine)
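
If you edit the hosts file more than once, it is easy to end up with duplicate lines. The sketch below adds an entry only when it is not already there. The helper name ensure_host_entry is my own, and the sample IP address and hostname are the ones used in this post; substitute your own.

```shell
#!/usr/bin/env bash
# Sketch: append "ip<TAB>name" to a hosts file unless that pair already exists.
# ensure_host_entry is a hypothetical helper name.

ensure_host_entry() {
    local file="$1" ip="$2" name="$3"
    # awk exits 0 only if some line has exactly this IP and this first hostname.
    if ! awk -v ip="$ip" -v name="$name" \
            '$1 == ip && $2 == name { found = 1 } END { exit !found }' "$file"; then
        printf '%s\t%s\n' "$ip" "$name" >> "$file"
    fi
}

# Usage: /etc/hosts is writable only by root, so run the script as root
# (for example with sudo) when you target the real file.
# ensure_host_entry /etc/hosts 192.168.146.129 nameNode
```

Calling it twice with the same arguments leaves a single line in the file, so it is safe to rerun.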
Summary Table
File               namenode                             datanode1
core-site.xml      ip of namenode                       ip of namenode
mapred-site.xml    ip of namenode                       ip of namenode
hdfs-site.xml      2                                    2
masters            namenode ipaddr                      na
slaves             namenode ipaddr + datanode ipaddr    ip of datanode

Step2 – SSH keygen for namenode-datanode communication

*Note – before starting the process, clean the ~/.ssh folders on both the datanode and namenode machines
    $ ssh-keygen
    # copy the public key to each machine
    $ ssh-copy-id -i ~/.ssh/id_rsa.pub user1@192.168.146.129
    $ ssh-copy-id -i ~/.ssh/id_rsa.pub user2@192.168.146.130
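
Before starting the cluster, it is worth confirming that key-based login really works, because the start scripts use ssh to reach each machine. Here is a small sketch of one way to check. The function name check_passwordless_ssh is my own; BatchMode and ConnectTimeout are standard OpenSSH options.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if ssh can log in to the target without a password.
# check_passwordless_ssh is a hypothetical helper name.

check_passwordless_ssh() {
    local target="$1"
    # BatchMode=yes makes ssh fail instead of prompting for a password;
    # ConnectTimeout=5 stops it from hanging on an unreachable host.
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$target" true
}

# Usage, from the NameNode machine (user names as in the commands above):
# for target in user1@192.168.146.129 user2@192.168.146.130; do
#     if check_passwordless_ssh "$target"; then
#         echo "$target: ok"
#     else
#         echo "$target: key login failed"
#     fi
# done
```

If a target reports a failure, rerun ssh-copy-id for that machine before moving on.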

Step3 – On Namenode – format hadoop

     hadoop namenode -format

Step4 – .bashrc

    export HADOOP_PREFIX=~/softwares/hadoop-1.2.1
    export PATH=$PATH:$HADOOP_PREFIX/bin

Step5 – Crank it all up, on the NameNode machine

    start-dfs.sh
    start-mapred.sh

Step6 – Verify that everything is working

If you followed all the steps diligently and did not make any minor mistakes, everything should fall into place.
    $ jps    # on each machine, to see whether the NameNode, JobTracker and TaskTracker are running
If this is working, go to the NameNode machine, open Firefox
and go to 192.168.146.129:50070
This should show the Hadoop summary page.
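
Reading the jps output by eye gets tedious, so here is a sketch of a helper that lists which expected daemons are missing. The function name missing_daemons is my own. The daemon lists in the usage lines are an assumption based on the setup in this post (the NameNode machine also runs a DataNode and a TaskTracker); adjust them if your layout differs.

```shell
#!/usr/bin/env bash
# Sketch: read `jps` output on stdin and print every expected daemon
# that is not running. missing_daemons is a hypothetical helper name.

missing_daemons() {
    local jps_output daemon
    jps_output="$(cat)"
    for daemon in "$@"; do
        # jps prints "<pid> <ClassName>", so compare the whole second field.
        # This keeps SecondaryNameNode from being mistaken for NameNode.
        if ! printf '%s\n' "$jps_output" |
                awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "$daemon"
        fi
    done
}

# Usage on the NameNode machine:
# jps | missing_daemons NameNode DataNode JobTracker TaskTracker
# Usage on the DataNode machine:
# jps | missing_daemons DataNode TaskTracker
```

No output means every daemon you listed is up. Any name that is printed points you to the log file to look at first.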

If you see that page, then yes, everything has been configured properly! HURRAY!!
Published in 2016
