Running The Hadoop Examples Wordcount
Introduction
These are the follow-up notes from the earlier session on the Hadoop wordcount program.
In the earlier session we covered why wordcount, the anatomy of the program, and wrote the program in Eclipse.
In this session we will:
* Run the wordcount program in Eclipse
* Run the wordcount program from examples.jar in single-node and multi-node environments
* Go through the Hadoop URLs to see the stats of the program
* Create HDFS folders and browse through HDFS from the URLs
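As a quick refresher on what wordcount actually computes, the same map (tokenize) and reduce (count) idea can be sketched with plain Unix tools on a small local file; the file name and contents below are made up purely for illustration:

```shell
# Make a tiny sample file (illustrative contents only)
printf 'hello world\nhello hadoop\n' > sample.txt

# Tokenize into one word per line (map), then group and count (reduce)
tr -s ' ' '\n' < sample.txt | sort | uniq -c | sort -rn
# "hello" appears twice; "world" and "hadoop" once each
```

The Hadoop job does the same thing, except the tokenizing and counting run in parallel across the cluster.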
Hadoop Urls
- localhost:50070 – URL of the namenode
- localhost:50060 – URL of the task tracker
- localhost:50030 – URL of the job tracker
In a multinode env,
- <namenode-ip>:50070 – namenode
- <jobtracker-ip>:50030 – jobtracker
- <tasktracker-ip>:50060 – task tracker
The jobtracker schedules jobs.
Hadoop Commands
- Create a data directory on HDFS
$ hadoop fs -mkdir hdfs://<url>:8020/user/mytest1
(to verify, go to <url>:50070)
- Copy a data file from local to HDFS
$ hadoop fs -copyFromLocal /home/dataset hdfs://localhost:8020/Data1/
- Delete files from HDFS
$ hadoop fs -rmr hdfs://<url>:8020/user/mytest1
- Run the Hadoop jar. `hadoop jar` takes the input and output paths as arguments; both are HDFS paths.
$ hadoop jar /home/itell/hadoop-1.2.1/hadoop-examples-1.2.1.jar wordcount /Data1 /Output
(input is /Data1; output /Output is a directory, and it must not already exist before the run)
Hadoop Admin Perspective
These are the admin properties shown when you go to the namenode and jobtracker URLs.
From a developer's perspective it is good to know these properties.
The job configuration page gives all the properties for the job configuration.
Q: What happens if a tasktracker fails?
Property: mapred.map.max.attempts = 4
If a task fails on a tasktracker, it is retried up to 4 times, and only then is the job declared failed.
Tasktrackers usually fail not because of hardware but because of logic issues in the task.
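This retry limit is an ordinary job configuration property that can be overridden. A sketch of how it would look in conf/mapred-site.xml (4 is the Hadoop 1.x default, and there is a matching mapred.reduce.max.attempts for the reduce side):

```xml
<!-- conf/mapred-site.xml: max attempts per map task before the job fails -->
<property>
  <name>mapred.map.max.attempts</name>
  <value>4</value>
</property>
```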
Running Hadoop Examples.jar wordcount in SingleNode/Multinode
We will run the wordcount program from hadoop-examples.jar.
The program we created can so far only be run in Eclipse, by giving the input and output folder arguments.
To run it outside Eclipse we would have to package the program to Hadoop standards, which will be covered later.
$ hadoop namenode -format
(Note - this is formatting all the datanodes in the network for HDFS)
$ start-dfs.sh
$ start-mapred.sh
(Revision - on a datanode, the running processes are DataNode and TaskTracker)
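To confirm the daemons came up after the start scripts, jps (which ships with the JDK) lists the running Java processes; the expected set below assumes a single-node setup where every daemon runs on one machine:

```shell
# List running Java processes; the daemon set depends on the node's role
$ jps
# single node: NameNode, SecondaryNameNode, DataNode, JobTracker, TaskTracker
# pure datanode in a multinode setup: DataNode and TaskTracker only
```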
Note: copy book1.txt to the location from which you want to copy it to HDFS.
We will use copyFromLocal, which will create the directory in HDFS and also copy the file.
e.g. SOURCE = /home/<user>/projects/dataset
DESTN = hdfs://192.168.158.132:8020/wctest/dataset
$ hadoop fs -copyFromLocal SOURCE DESTN
$ hadoop jar /home/<user>/hadoop-1.2.1/hadoop-examples-1.2.1.jar wordcount hdfs://<namenodeip>:8020/wctest/dataset hdfs://<namenodeip>:8020/wctest/output
All the IP addresses above are the namenode's IP.
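Once the job finishes, the results can be read back from HDFS. These commands assume the running cluster from the steps above; part-r-00000 is the usual file name for the first reducer's output, but it can differ:

```shell
# List the job output directory on HDFS
$ hadoop fs -ls hdfs://<namenodeip>:8020/wctest/output

# Print the word counts written by the reducer
$ hadoop fs -cat hdfs://<namenodeip>:8020/wctest/output/part-r-00000
```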
GOTCHAS -
Check the IP addresses on your VMs first and ensure they match the configuration. Otherwise, modify the IP address in the following files:
* /etc/hosts
* /hadoop-1.2.1/conf/masters
* /hadoop-1.2.1/conf/slaves
* /hadoop-1.2.1/conf/core-site.xml
* /hadoop-1.2.1/conf/mapred-site.xml
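For example, the namenode address this gotcha refers to lives in conf/core-site.xml under fs.default.name (the Hadoop 1.x property name; the IP shown is the sample one used above):

```xml
<!-- conf/core-site.xml: the HDFS namenode URI; must match the master's real IP -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://192.168.158.132:8020</value>
</property>
```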
After doing this, DO NOT FORGET TO REBOOT THE MACHINE.