Hadoop Installation Overview – Part 2 – Pseudo Distribution


Overview

This is Part 2 of the Hadoop installation document. In Part 1 we covered installing Hadoop with the NameNode and DataNode on a single-VM environment.
In the Pseudo Distribution, we will install 2 VM instances, where one machine will act as a NameNode+DataNode and the other will act as a DataNode only.

Initial Setup

Goal – Create two Ubuntu 64-bit VMs, each with the Hadoop setup from Part 1
Steps
Step 1 – Start with the VM you already have from Part 1
Step 2 – There are two ways we can clone the VM
  • Full Clone – Creates an entire copy of the VM. Best way to create the first VM (which will act as the NameNode)
  • Linked Clone – Does not create a full copy, but builds on top of the original. Best way to create the second DataNode
Step a. Create a full clone of the VM from Part 1 and name it namenode. (To verify, open the hostname file: sudo gedit /etc/hostname)
Step b. Create a linked clone of the full-clone VM and name it datanode1.
Note – we create linked clones only to save space.

Setup Overview

This distribution will have 2 different VMs: one will act as a NameNode+DataNode and the other will act as a DataNode only.
Following are the sample IP addresses for this reference installation.
Host Name   | IP Address
nameNode    | 192.168.146.129
datanode1   | 192.168.146.130
Note – Make sure the IP addresses of your VMs are in the 192.168.x.x range shown above; otherwise your NameNode and DataNodes will not come up. I faced this issue and had to struggle to get the IP addresses into that range.
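To confirm which address each VM actually received (interface names such as eth0 or ens33 depend on your Ubuntu version), run:

    $ ip addr show    # or: ifconfig
    $ hostname -I     # prints this machine's IP address(es)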

Installation Overview

Before we dive into the details, this is the summary of steps:
Step 1 – Update the conf files
core-site.xml, hdfs-site.xml, mapred-site.xml, masters, slaves
Step 2 – Generate SSH keys on the NameNode and copy them to the DataNodes
Using ssh-keygen, create new keys and then copy them over to the DataNodes.
Step 3 – Edit the hosts file
The /etc/hosts file should have the IP and name of the respective machines.
Step 4 – Make sure the .bashrc file is set up as per Part 1
Step 5 – Start the Hadoop cluster from the namenode
start-dfs.sh, start-mapred.sh
Step 6 – Finally, verify everything is working
Go to the nameNode and verify that 192.168.146.129:50070 is working. The IP address is that of the namenode.

Installation Details

Step 0 – Verify the hostname file

/etc/hostname should contain
nameNode   # on the NameNode machine
dataNode1  # on the DataNode machine
Restart the machine if the hostname was updated.
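A quick way to verify (and, if needed, set) the hostname on each clone; hostnamectl is available on newer Ubuntu releases, on older ones just edit /etc/hostname and reboot:

    $ cat /etc/hostname                          # verify the current hostname
    $ sudo hostnamectl set-hostname nameNode     # on the NameNode machine
    $ sudo hostnamectl set-hostname dataNode1    # on the DataNode machine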

Step 1 – Update Conf Files on the NameNode and dataNode1

core-site.xml
<property>
  <name>fs.default.name</name>
  <value>hdfs://192.168.146.129:8020</value>
</property>
Note – the IP address is the NameNode's IP address
mapred-site.xml
<property>
  <name>mapred.job.tracker</name>
  <value>192.168.146.129:8021</value>
</property>
hdfs-site.xml
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
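Each of these property blocks goes inside the <configuration> element of its file under the Hadoop conf directory. As a reference, here is a minimal sketch of the full core-site.xml (the path assumes the Hadoop 1.2.1 layout from Part 1):

    <?xml version="1.0"?>
    <!-- ~/softwares/hadoop-1.2.1/conf/core-site.xml -->
    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://192.168.146.129:8020</value>
      </property>
    </configuration>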
masters (Only on the NameNode)
192.168.146.129
slaves (on namenode and datanode)
On NameNode Machine
192.168.146.129
192.168.146.130
On DataNode Machine
192.168.146.130
  • The masters and slaves files are how Hadoop knows which machine acts as the NameNode and which act as DataNodes.
  • The slaves file lists the IP addresses of all DataNodes; because we add the NameNode's IP address to its slaves file, the NameNode machine also runs a DataNode.

/etc/hosts

All the machines should have entries with the IP address and name of each machine:
192.168.146.129  nameNode
192.168.146.130  dataNode1
Put both entries on both the namenode and datanode machines so each can resolve the other by name.
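A quick sanity check that name resolution works both ways:

    $ ping -c 3 dataNode1   # from the namenode machine
    $ ping -c 3 nameNode    # from the datanode machine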
Summary Table
File            | namenode                          | datanode1
core-site.xml   | ip of namenode                    | ip of namenode
mapred-site.xml | ip of namenode                    | ip of namenode
hdfs-site.xml   | 2                                 | 2
masters         | namenode ipaddr                   | n/a
slaves          | namenode ipaddr + datanode ipaddr | ip of datanode

Step 2 – SSH keygen for namenode-datanode communication

*Note – before starting the process, clean the ~/.ssh folders on both the datanode and namenode machines
    $ ssh-keygen -t rsa
    # copy the public key to both machines
    $ ssh-copy-id -i ~/.ssh/id_rsa.pub user1@192.168.146.129
    $ ssh-copy-id -i ~/.ssh/id_rsa.pub user2@192.168.146.130
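After copying the keys, confirm from the namenode that password-less login works (user1/user2 are the sample accounts from the commands above; substitute your own):

    $ ssh user2@192.168.146.130   # should log in without asking for a password
    $ exit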

Step 3 – On the NameNode – format HDFS

     hadoop namenode -format

Step 4 – .bashrc

    export HADOOP_PREFIX=~/softwares/hadoop-1.2.1
    export PATH=$PATH:$HADOOP_PREFIX/bin
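After editing .bashrc, reload it and confirm the hadoop command is on the PATH (a quick sanity check):

    $ source ~/.bashrc
    $ hadoop version    # should report Hadoop 1.2.1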

Step 5 – Crank it all up, on the NameNode machine

start-dfs.sh
start-mapred.sh     
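The matching scripts bring the cluster back down when you are done (run these from the namenode as well):

    $ stop-mapred.sh
    $ stop-dfs.sh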

Step 6 – Verify that all is working

If you did all the steps diligently and did not make any minor mistakes, everything should fall into place.
$ jps   # on each machine, to check that the expected daemons (NameNode, DataNode, JobTracker, TaskTracker) are running
# if this is working, go to the namenode, open Firefox
and go to 192.168.146.129:50070
This should show the Hadoop summary page.
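You can also confirm from the command line that both DataNodes have registered with the NameNode (dfsadmin is part of the Hadoop 1.x CLI):

    $ hadoop dfsadmin -report   # should list 2 live datanodes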

If you see the above page, yes, everything has been configured properly! HURRAY!!
Published in 2016
