Running The Hadoop Examples Wordcount

Introduction

These are the follow-up notes from the earlier session on the Hadoop WordCount program.
In the earlier session, we covered why wordcount, the anatomy of the program, and writing it in Eclipse.
In this session we will:
* Run the wordcount program in Eclipse
* Run the wordcount program shipped in examples.jar in single-node and multi-node environments
* Go through the Hadoop URLs to see the stats of the program
* Create HDFS folders and browse HDFS from the URLs

Hadoop URLs

  • localhost:50070 – URL of the NameNode
  • localhost:50030 – URL of the JobTracker
  • localhost:50060 – URL of the TaskTracker
In a multi-node environment, use the node's IP instead of localhost:
  • <namenode-ip>:50070 – NameNode
  • <jobtracker-ip>:50030 – JobTracker
  • <tasktracker-ip>:50060 – TaskTracker
The JobTracker is the daemon that schedules jobs.
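As a quick sanity check, these pages can also be fetched from a shell instead of a browser. A minimal sketch, assuming a single-node setup where all daemons run on localhost (the .jsp page names are the Hadoop 1.x defaults):

    # NameNode web UI – HDFS health and filesystem browser
    curl -s http://localhost:50070/dfshealth.jsp | head
    # JobTracker web UI – running and completed jobs
    curl -s http://localhost:50030/jobtracker.jsp | head
    # TaskTracker web UI – tasks running on this node
    curl -s http://localhost:50060/tasktracker.jsp | head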

Hadoop Commands

  • Create a data directory on HDFS
      hadoop fs -mkdir hdfs://<namenodeip>:8020/user/mytest1
    
    (to verify, go to <namenodeip>:50070 and browse the filesystem)
  • Copy a data file from local to HDFS
      hadoop fs -copyFromLocal /home/dataset hdfs://localhost:8020/Data1/
    
  • Delete files from HDFS (recursive)
      hadoop fs -rmr hdfs://<namenodeip>:8020/user/mytest1
    
  • Running the Hadoop jar
    # hadoop jar takes the input and output paths as arguments; both are HDFS paths.
      $ hadoop jar /home/itell/hadoop/1.2.1/hadoop-examples-1.2.1.jar wordcount /Data1 /Output
       input - /Data1
       output - /Output (this is a dir and must not already exist)
    (see the sketch after this list for inspecting the output)
    
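Once the job finishes, the results land as part files inside the output directory. A minimal sketch for inspecting them, assuming the /Data1 and /Output paths used above (the part file is typically named part-r-00000 for the examples wordcount on Hadoop 1.x):

    # List the contents of the job output directory
    hadoop fs -ls /Output
    # Print the first few word counts (word<TAB>count per line)
    hadoop fs -cat /Output/part-r-00000 | head
    # Or pull the whole result back to the local filesystem
    hadoop fs -copyToLocal /Output /home/itell/wc-output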

Hadoop Admin Perspective

These are the admin properties you can see when you go to the NameNode and JobTracker URLs.
From a developer's perspective it is good to know these properties.
The job configuration page gives all the properties for the job's configuration.
Q: What happens if a TaskTracker fails?
Property: mapred.map.max.attempts = 4
If a TaskTracker fails while processing a particular task, the framework retries the task up to 4 times and then declares the job failed.
Such failures are usually caused not by hardware but by logic issues in the code.
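The examples driver in Hadoop 1.x parses the generic Hadoop options, so this property can usually be overridden per job from the command line instead of editing mapred-site.xml. A hedged sketch (the value 2 and the paths are just placeholders):

    # Retry each failed map task at most 2 times, for this run only
    hadoop jar hadoop-examples-1.2.1.jar wordcount \
        -D mapred.map.max.attempts=2 \
        /Data1 /Output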

Running the Hadoop examples.jar wordcount in Single-Node/Multi-Node

We will be running the wordcount program from the Hadoop examples.jar.
The program we created can, so far, only be run in Eclipse by giving the input and output folders as arguments.
To run our own program the same way, we will have to package it to Hadoop standards, which will be covered later.
$ hadoop namenode -format
(Note: this formats HDFS across the network; any existing data on the DataNodes is lost)
$ start-dfs.sh
$ start-mapred.sh
(Revision: on a DataNode, the running processes are DataNode and TaskTracker)
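To confirm the daemons actually came up, jps lists the running Java processes. A sketch for a single-node setup, where everything runs on one machine and all five daemons should appear:

    $ jps
    # Expected on a single-node setup (PIDs will differ):
    #   NameNode
    #   SecondaryNameNode
    #   DataNode
    #   JobTracker
    #   TaskTracker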

Note: copy the book1.txt to a location from where you want to copy it to HDFS.
We will use copyFromLocal, which creates the directory in HDFS and also copies the file.
e.g. SOURCE = /home/<user>/projects/dataset
DESTN = hdfs://192.168.158.132:8020/wctest/dataset
$ hadoop fs -copyFromLocal SOURCE DESTN
$ hadoop jar /home/<user>/hadoop-1.2.1/hadoop-examples-1.2.1.jar wordcount hdfs://<namenodeip>:8020/wctest/dataset hdfs://<namenodeip>:8020/wctest/output
All the IP addresses above are the NameNode's IP only.
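If you re-run the job, remember that MapReduce refuses to write into an existing output directory. A minimal sketch for a clean re-run, reusing the -rmr command from the Hadoop Commands section:

    # Remove the previous output before re-running (Hadoop 1.x syntax)
    hadoop fs -rmr hdfs://<namenodeip>:8020/wctest/output
    hadoop jar /home/<user>/hadoop-1.2.1/hadoop-examples-1.2.1.jar wordcount \
        hdfs://<namenodeip>:8020/wctest/dataset hdfs://<namenodeip>:8020/wctest/output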
GOTCHAS -
Check the IP addresses on your VMs first and make sure they all match the configuration. If not, update the IP address in the following files:
* /etc/hosts
* /hadoop-1.2.1/conf/masters
* /hadoop-1.2.1/conf/slaves
* /hadoop-1.2.1/conf/core-site.xml
* /hadoop-1.2.1/conf/mapred-site.xml
After doing this, DO NOT FORGET TO REBOOT THE MACHINE.
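Before rebooting, a quick way to check that all of these files agree on the IP address. This sketch assumes Hadoop is installed under ~/hadoop-1.2.1, and that fs.default.name / mapred.job.tracker are the standard Hadoop 1.x properties carrying the master's address:

    ip addr show                      # the VM's current IP (or: ifconfig)
    cat /etc/hosts
    cat ~/hadoop-1.2.1/conf/masters ~/hadoop-1.2.1/conf/slaves
    grep -A1 fs.default.name ~/hadoop-1.2.1/conf/core-site.xml
    grep -A1 mapred.job.tracker ~/hadoop-1.2.1/conf/mapred-site.xml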
