Monday, 9 February 2015

How to run a Hadoop MapReduce job?

Assuming a full Hadoop 2.6 cluster has been properly set up, here are the steps to run the demo WordCount job:

# prepare jar file

vi WordCount.java
export HADOOP_CLASSPATH=$JAVA_HOME/lib/tools.jar
hadoop com.sun.tools.javac.Main WordCount.java
jar cf wc.jar WordCount*.class
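The post doesn't show what goes into WordCount.java; the source compiled here is typically the canonical WordCount v1.0 example from the Apache Hadoop MapReduce tutorial. A self-contained way to create it (treat the source as a sketch following that tutorial, not necessarily this post's exact file):

```shell
# Write WordCount.java (source follows the Hadoop MapReduce tutorial example)
cat > WordCount.java <<'EOF'
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emit (word, 1) for every token in the input
  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer (also used as combiner): sum the counts for each word
  public static class IntSumReducer
       extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
EOF
```

The compile step works because `hadoop com.sun.tools.javac.Main` puts the Hadoop jars on the compiler classpath for you, alongside the JDK's tools.jar exported above.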

# prepare data files

vi file01
vi file02
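The post doesn't show what was typed into the two files; any text works. The Hadoop tutorial's sample inputs are a handy choice (assumed here, not taken from the post):

```shell
# Sample inputs from the Hadoop MapReduce tutorial
# (contents are an assumption; the post does not show them)
printf 'Hello World Bye World\n' > file01
printf 'Hello Hadoop Goodbye Hadoop\n' > file02
```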

# format HDFS namenode (first-time setup only; this erases any existing HDFS data)

hdfs namenode -format

# start HDFS & YARN daemons (YARN runs the MapReduce jobs in Hadoop 2)

start-dfs.sh
start-yarn.sh

# copy data into hdfs

hdfs dfs -mkdir -p /user/hdpuser/wordcount/input
hdfs dfs -copyFromLocal file* /user/hdpuser/wordcount/input

# run the MapReduce job

hadoop jar wc.jar WordCount /user/hdpuser/wordcount/input /user/hdpuser/wordcount/output

# copy result out of hdfs

hdfs dfs -copyToLocal /user/hdpuser/wordcount/output/part-r-00000 result
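The job writes one tab-separated word/count pair per line to part-r-00000. As a sanity check, the same aggregation the job performs (tokenize, group, sum) can be reproduced locally with standard tools; the tutorial's sample inputs are recreated below so the snippet stands on its own (an assumption, since the post doesn't show the file contents):

```shell
# Recreate the assumed sample inputs; your file01/file02 may differ
printf 'Hello World Bye World\n' > file01
printf 'Hello Hadoop Goodbye Hadoop\n' > file02

# Same aggregation the MapReduce job performs: tokenize, group, count
cat file01 file02 | tr -s ' ' '\n' | sort | uniq -c | awk '{print $2 "\t" $1}'
# With these inputs: Bye 1, Goodbye 1, Hadoop 2, Hello 2, World 2
```

If the counts in `result` match, the cluster ran the job correctly.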