INSTALLATION DOCUMENTS BY RAVI

Wednesday, June 28, 2017

Step-by-step installation of an Apache Hadoop single-node environment on Linux

Prerequisites:

1. Java must be installed

2. ssh must be installed and sshd must be running, because the Hadoop scripts that manage remote Hadoop daemons use ssh


Download and stage the software:
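The download links did not survive here; as a sketch, Hadoop 2.7.2 (the version used in the paths later in this post) can be fetched from the Apache archive and staged under /mnt, which is where the later steps move it from. The JDK tarball name below is a placeholder, since the exact Java version is not stated.

```shell
# Stage the downloads under /mnt (the later steps move them from there)
cd /mnt

# Hadoop 2.7.2, from the Apache release archive
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz

# JDK tarball downloaded separately from Oracle's site;
# jdk-8uXX-linux-x64.tar.gz is a placeholder name, not the actual file
```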






Installing Java:

Make a directory named java under /u01/

Move the downloaded Java software from /mnt to /u01/java

Extract the tar.gz file as below 
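The commands for the three steps above might look like the following; the JDK tarball name is a placeholder, since the exact Java version is not given in this post.

```shell
mkdir -p /u01/java

# Move the staged JDK tarball (placeholder file name) into place
mv /mnt/jdk-8uXX-linux-x64.tar.gz /u01/java/

# Extract the tarball under /u01/java
cd /u01/java
tar -xzf jdk-8uXX-linux-x64.tar.gz
```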
















Installing Hadoop:

Make a directory named hadoop under /u01/

Move the downloaded Hadoop software from /mnt to /u01/hadoop

Extract the tar.gz file as below 
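As with Java, the three steps above can be sketched as below; the tarball name hadoop-2.7.2.tar.gz matches the hadoop-2.7.2 directory used later in this post.

```shell
mkdir -p /u01/hadoop

# Move the staged Hadoop tarball into place
mv /mnt/hadoop-2.7.2.tar.gz /u01/hadoop/

# Extract the tarball under /u01/hadoop
cd /u01/hadoop
tar -xzf hadoop-2.7.2.tar.gz
```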















Edit the file /u01/hadoop/hadoop-2.7.2/etc/hadoop/hadoop-env.sh to define some parameters as below
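The parameter the Apache documentation requires in hadoop-env.sh is JAVA_HOME, pointing at the root of the Java installation; the exact JDK directory name below is an assumption based on the /u01/java layout used earlier.

```shell
# In /u01/hadoop/hadoop-2.7.2/etc/hadoop/hadoop-env.sh:
# set JAVA_HOME to the root of the extracted JDK
# (jdk1.8.0_XX is a placeholder for the actual directory name)
export JAVA_HOME=/u01/java/jdk1.8.0_XX
```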


















Save and exit the file

Test the hadoop command as below
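Running the hadoop script with no arguments prints its usage documentation, which confirms the installation is wired up:

```shell
cd /u01/hadoop/hadoop-2.7.2

# Prints the usage documentation if the install is working
bin/hadoop

# Prints the Hadoop version, e.g. "Hadoop 2.7.2"
bin/hadoop version
```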
















Configuring Hadoop:

Modifying etc/hadoop/core-site.xml as below:
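This is the stock pseudo-distributed setting from the Apache single-node documentation, placed inside the file's <configuration> element:

```xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
```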



















Save and exit the file

Modifying etc/hadoop/hdfs-site.xml as below:
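For a single-node setup the Apache documentation sets the block replication factor to 1, since there is only one DataNode:

```xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
```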



















Save and exit the file

Setup passphraseless ssh

Check that you can ssh to the localhost without a passphrase:
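```shell
# Should log you in without prompting for a passphrase
ssh localhost
```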






If you cannot ssh to localhost without a passphrase, execute the following commands:
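A key pair with an empty passphrase is generated and added to the authorized keys (the Apache docs show this with a DSA key; an RSA key as below works the same way):

```shell
# Generate a key pair with an empty passphrase
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

# Authorize the new public key for logins to this host
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
```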















Executing MapReduce job locally:

Format the file system:
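```shell
cd /u01/hadoop/hadoop-2.7.2

# One-time format of the HDFS filesystem
bin/hdfs namenode -format
```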


















Start NameNode daemon and DataNode daemon:
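```shell
# Starts the NameNode, DataNode and SecondaryNameNode daemons
sbin/start-dfs.sh

# Optionally verify the daemons are up
jps
```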























The Hadoop daemon log output is written to the $HADOOP_LOG_DIR directory (defaults to $HADOOP_HOME/logs).

Browse the web interface for the NameNode; by default it is available at:
  • NameNode - http://localhost:50070/






















Create the HDFS directories required to execute MapReduce jobs:
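```shell
# Create the per-user home directory in HDFS
# (<username> is the OS user running the jobs)
bin/hdfs dfs -mkdir /user
bin/hdfs dfs -mkdir /user/<username>
```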






















Copy the input files into the distributed file system:
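The Apache example uses the Hadoop configuration files themselves as input:

```shell
# Copy the local etc/hadoop config files into HDFS as "input"
bin/hdfs dfs -put etc/hadoop input
```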





Run some of the examples provided:
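The standard example from the Apache docs greps the input files for strings matching a pattern:

```shell
# Run the bundled grep example over the "input" directory,
# writing results to "output" in HDFS
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar \
    grep input output 'dfs[a-z.]+'
```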



















Examine the output files:

Copy the output files from the distributed file system to the local file system and examine them:
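```shell
# Fetch the job output from HDFS to the local filesystem and view it
bin/hdfs dfs -get output output
cat output/*
```

Alternatively, the output files can be viewed directly on HDFS:

```shell
bin/hdfs dfs -cat output/*
```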





























When you're done, stop the daemons with:
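```shell
# Stops the NameNode, DataNode and SecondaryNameNode daemons
sbin/stop-dfs.sh
```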
















YARN on Single Node

We can run a MapReduce job on YARN in pseudo-distributed mode by setting a few parameters and additionally running the ResourceManager and NodeManager daemons.

Configure parameters as below:

Modifying etc/hadoop/mapred-site.xml as below:
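Per the Apache single-node documentation, this tells MapReduce to run on YARN (the file is created from mapred-site.xml.template if it does not exist yet):

```xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>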



















Save and exit the file

Modifying etc/hadoop/yarn-site.xml as below:
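This enables the shuffle auxiliary service that MapReduce needs on each NodeManager:

```xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
```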





















Save and exit the file

Start Resource Manager daemon and Node Manager daemon:
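```shell
# Starts the ResourceManager and NodeManager daemons
sbin/start-yarn.sh
```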
















Browse the web interface for the Resource Manager; by default it is available at:

  • Resource Manager - http://localhost:8088/


















Start NameNode daemon and DataNode daemon:
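As in the earlier section, the HDFS daemons are started (if they are not already running) with:

```shell
sbin/start-dfs.sh
```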















Run a MapReduce job
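Any of the example jobs works here; with YARN configured, the same grep example as before now runs through the ResourceManager:

```shell
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar \
    grep input output 'dfs[a-z.]+'
```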






When you're done, stop the daemons with the following commands:
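```shell
# Stops the ResourceManager and NodeManager daemons
sbin/stop-yarn.sh

# Stops the HDFS daemons as well
sbin/stop-dfs.sh
```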

