Hadoop Installation Overview – Part1 – Single Node

Overview

This is part 1 of the Hadoop installation guide. It covers installing Hadoop 1.2.1 on a single VM.
Both the NameNode and the DataNode run on the same VM.

Assumption

This document assumes that

Installation Steps

Step1

Install VMware Workstation, as mentioned above.

Step2

Install Ubuntu in a VM, as mentioned above.
**Note: make sure you select the 32-bit or 64-bit Ubuntu image to match your Windows version.

Step3

Install the OS updates for Ubuntu.
Open a terminal, refresh the package lists and then install the available updates:
$ sudo apt-get update
$ sudo apt-get upgrade

Step4

Install OpenJDK 6. Hadoop is written in Java, so it needs a JDK.
$ sudo apt-get install openjdk-6-jdk
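If you are unsure which directory to use for JAVA_HOME in Step9, you can resolve it from the javac symlink that the package installs. This is a minimal sketch; the helper name jdk_home is hypothetical, and it assumes GNU readlink (standard on Ubuntu).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the JDK home directory that a javac binary belongs to.
# Usage: jdk_home [path-to-javac]   (defaults to the javac found on PATH)
jdk_home() {
  local javac_path="${1:-$(command -v javac)}"
  if [ -z "$javac_path" ]; then
    echo "javac not found; is the JDK installed?" >&2
    return 1
  fi
  # Follow the symlink chain, e.g. /usr/bin/javac -> /etc/alternatives/javac -> <jdk>/bin/javac
  local real
  real="$(readlink -f "$javac_path")" || return 1
  # Strip the trailing /bin/javac to get the JDK root
  dirname "$(dirname "$real")"
}
```

Running `jdk_home` after Step4 should print the directory to put in JAVA_HOME; `java -version` is a quick way to confirm the JDK itself works.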

Step5

Install Eclipse.
From the UI, open the Ubuntu Software Center and install Eclipse.

Step6

Install the OpenSSH server. The Hadoop start scripts use SSH to launch the daemons.
$ sudo apt-get install openssh-server

Step7

Download the Hadoop 1.2.1 release tarball, hadoop-1.2.1.tar.gz.

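Step8

Go to the terminal, copy the tarball to your home directory and extract it there:

    $ cp hadoop-1.2.1.tar.gz ~    # run this from the directory you downloaded it to
    $ cd ~
    $ tar -xzvf hadoop-1.2.1.tar.gz    # creates ~/hadoop-1.2.1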
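Before setting HADOOP_PREFIX in Step9, it can help to confirm which directory the tarball unpacks to. A small sketch using tar's list mode; tarball_top_dir is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the top-level directory (or directories) inside a .tar.gz
tarball_top_dir() {
  # -t lists members without extracting, -z handles gzip, -f names the archive.
  # Drop a leading "./", keep the first path component and de-duplicate.
  tar -tzf "$1" | sed 's|^\./||' | cut -d/ -f1 | sort -u
}
```

For example, `tarball_top_dir hadoop-1.2.1.tar.gz` should print a single name, which for this release is expected to be hadoop-1.2.1.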

Step9

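Open ~/.bashrc and add the following settings:
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64    # check that this points to your JDK (the directory name differs on 32-bit systems, e.g. java-6-openjdk-i386)
export HADOOP_PREFIX=$HOME/hadoop-1.2.1    # the directory extracted in Step8
export PATH=$PATH:$HADOOP_PREFIX/bin
Then reload the file with: $ source ~/.bashrc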
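If you repeat the setup, appending the same exports again clutters ~/.bashrc. A minimal sketch that adds the Step9 block only once; it assumes the amd64 OpenJDK 6 path and a Hadoop directory of $HOME/hadoop-1.2.1, and both append_hadoop_env and the marker comment are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append the Step9 exports to an rc file only if they are not there yet.
# Assumes OpenJDK 6 (amd64 path) and Hadoop extracted to $HOME/hadoop-1.2.1.
append_hadoop_env() {
  local rcfile="${1:-$HOME/.bashrc}"
  local marker="# >>> hadoop 1.2.1 settings >>>"
  # Skip if a previous run already added the block
  if grep -qxF "$marker" "$rcfile" 2>/dev/null; then
    return 0
  fi
  # Quoted 'EOF' keeps $HOME and $PATH literal, so they expand each time a shell starts
  cat >> "$rcfile" <<'EOF'
# >>> hadoop 1.2.1 settings >>>
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64
export HADOOP_PREFIX=$HOME/hadoop-1.2.1
export PATH=$PATH:$HADOOP_PREFIX/bin
EOF
}
```

Run `append_hadoop_env` once and then `source ~/.bashrc`; running it again leaves the file unchanged.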

Step10 – CONFIGURATION FILES

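Edit the following configuration files in the conf folder of your Hadoop installation ($HADOOP_PREFIX/conf). Put each <property> block inside the file's existing <configuration> element.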
  • core-site.xml
      $ gedit core-site.xml
      <property>
          <name>fs.default.name</name>
          <value>hdfs://localhost:8020</value>
      </property>
    
  • hdfs-site.xml
      $ gedit hdfs-site.xml
      <property>
          <name>dfs.replication</name>
          <value>1</value>
      </property>
      <property>
          <name>dfs.permissions</name>
          <value>false</value>
      </property>
    
  • mapred-site.xml
      $ gedit mapred-site.xml
      <property>
          <name>mapred.job.tracker</name>
          <value>localhost:8021</value>
      </property>
    
  • /etc/hosts
    Map the machine's IP address to its hostname in the hosts file (use your own IP address, as shown by ifconfig, and your own hostname):
    $ ifconfig
    $ sudo gedit /etc/hosts
    127.0.0.1       localhost
    192.168.1.129   mynode1
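The three XML edits can also be scripted. This is a sketch that writes complete files, with each property inside the <configuration> root element and the property names spelled as Hadoop 1.x expects (fs.default.name, dfs.permissions). write_hadoop_conf is a hypothetical helper, and it overwrites the three *-site.xml files. It also sets JAVA_HOME in conf/hadoop-env.sh, which the Apache Hadoop 1.x single-node guide asks for, because daemons started over SSH may not see the exports in ~/.bashrc; the JDK path there is an assumption (amd64 OpenJDK 6).

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the Step10 Hadoop 1.x config files in one go.
# Usage: write_hadoop_conf [conf-dir]   (defaults to $HADOOP_PREFIX/conf)
write_hadoop_conf() {
  local conf_dir="${1:-}"
  if [ -z "$conf_dir" ]; then
    if [ -z "${HADOOP_PREFIX:-}" ]; then
      echo "pass a conf directory or set HADOOP_PREFIX" >&2
      return 1
    fi
    conf_dir="$HADOOP_PREFIX/conf"
  fi
  mkdir -p "$conf_dir"

  # Every <property> must sit inside the <configuration> root element
  cat > "$conf_dir/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:8020</value>
  </property>
</configuration>
EOF

  cat > "$conf_dir/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
EOF

  cat > "$conf_dir/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:8021</value>
  </property>
</configuration>
EOF

  # Assumed JDK path; add the JAVA_HOME line to hadoop-env.sh only once
  local java_line='export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64'
  grep -qxF "$java_line" "$conf_dir/hadoop-env.sh" 2>/dev/null \
    || echo "$java_line" >> "$conf_dir/hadoop-env.sh"
}
```

If libxml2-utils is installed, `xmllint --noout "$HADOOP_PREFIX"/conf/*-site.xml` is a quick well-formedness check afterwards. The /etc/hosts entry is left as a manual step.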

Step11 – FINAL STEP

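Generate an SSH key pair with an empty passphrase and add the public key to authorized_keys, so that SSH logins to this machine work without a password.
    $ ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
    $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    $ ssh localhost    # should log you in without asking for a password
    #Note: this step is needed even on a single node. The NameNode and DataNode run on the same machine, but the Hadoop start scripts still launch the DataNode, SecondaryNameNode and TaskTracker over SSH (to localhost here), so the key must be in authorized_keys.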
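With its default StrictModes setting, sshd refuses to use an authorized_keys file that other users can write to, and you are then prompted for a password again. A slightly more defensive version of the two commands above also tightens permissions and avoids adding the same key twice. A minimal sketch; authorize_local_key is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: authorize a public key for logins to this machine.
# Usage: authorize_local_key <pubkey-file> [ssh-dir]   (ssh-dir defaults to ~/.ssh)
authorize_local_key() {
  local pubkey="$1" dir="${2:-$HOME/.ssh}"
  mkdir -p "$dir"
  chmod 700 "$dir"                 # only the owner may access ~/.ssh
  touch "$dir/authorized_keys"
  # Append the key only if that exact line is not already present
  if ! grep -qxF "$(cat "$pubkey")" "$dir/authorized_keys"; then
    cat "$pubkey" >> "$dir/authorized_keys"
  fi
  chmod 600 "$dir/authorized_keys" # owner read/write only
}
```

Usage: `authorize_local_key ~/.ssh/id_rsa.pub`, then `ssh localhost` should log in without a password prompt.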

Step12 – RUN HADOOP AND VERIFY

With Step11 the installation is complete. Let's run Hadoop:
    $ hadoop namenode -format
    #Note: this formats the HDFS filesystem. You should see a "successfully formatted" message.
    $ start-dfs.sh
    #This starts the NameNode, DataNode and SecondaryNameNode daemons.
    $ start-mapred.sh
    #This starts the JobTracker and TaskTracker daemons.
    $ jps
    #Verify that all the daemons are running. You should see five Hadoop processes: NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker (plus Jps itself).
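To make the jps check scriptable, its output can be piped into a small function that reports which of the five expected daemons are missing. A sketch; check_hadoop_daemons is a hypothetical name, and it assumes jps prints one "<pid> <ClassName>" line per JVM, which is its default output format.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read jps output on stdin and report missing Hadoop 1.x daemons.
# Returns 0 if all five are present, 1 otherwise.
check_hadoop_daemons() {
  local out d missing=0
  out="$(cat)"
  for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
    # Match "<pid> <ClassName>" exactly, so NameNode does not match SecondaryNameNode
    if ! printf '%s\n' "$out" | grep -qE "^[0-9]+ ${d}\$"; then
      echo "missing: $d"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `jps | check_hadoop_daemons && echo "all Hadoop daemons are up"`.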


    YOUR HADOOP IS RUNNING NOW. If you run into any problems, post them in the comments below.
