INSTALLATION DOCUMENTS BY RAVI

Wednesday, June 28, 2017

Step-by-step installation of an Apache Hadoop single-node environment on Linux

Prerequisites:

1. Java must be installed

2. ssh must be installed and sshd must be running, because the Hadoop scripts that manage remote Hadoop daemons use ssh


Download and stage the software:
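The download links did not survive here; as a sketch, Hadoop 2.7.2 (the version used in the paths later in this post) can be fetched from the Apache archive and staged under /mnt, which is where the later steps move it from. The JDK tarball name below is a placeholder, since the exact Java version is not stated.

```shell
# Stage the downloads under /mnt (the later steps move them from there)
cd /mnt

# Hadoop 2.7.2, from the Apache release archive
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz

# JDK tarball downloaded separately from Oracle's site;
# jdk-8uXX-linux-x64.tar.gz is a placeholder name, not the actual file
```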






Installing Java:

Make a directory named java under /u01/

Move the downloaded Java software from /mnt to /u01/java

Extract the tar.gz file as below 
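The commands for the three steps above might look like the following; the JDK tarball name is a placeholder, since the exact Java version is not given in this post.

```shell
mkdir -p /u01/java

# Move the staged JDK tarball (placeholder file name) into place
mv /mnt/jdk-8uXX-linux-x64.tar.gz /u01/java/

# Extract the tarball under /u01/java
cd /u01/java
tar -xzf jdk-8uXX-linux-x64.tar.gz
```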
















Installing Hadoop:

Make a directory named hadoop under /u01/

Move the downloaded Hadoop software from /mnt to /u01/hadoop

Extract the tar.gz file as below 
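As with Java, the three steps above can be sketched as below; the tarball name hadoop-2.7.2.tar.gz matches the hadoop-2.7.2 directory used later in this post.

```shell
mkdir -p /u01/hadoop

# Move the staged Hadoop tarball into place
mv /mnt/hadoop-2.7.2.tar.gz /u01/hadoop/

# Extract the tarball under /u01/hadoop
cd /u01/hadoop
tar -xzf hadoop-2.7.2.tar.gz
```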















Edit the file /u01/hadoop/hadoop-2.7.2/etc/hadoop/hadoop-env.sh to define some parameters as below
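The parameter the Apache documentation requires in hadoop-env.sh is JAVA_HOME, pointing at the root of the Java installation; the exact JDK directory name below is an assumption based on the /u01/java layout used earlier.

```shell
# In /u01/hadoop/hadoop-2.7.2/etc/hadoop/hadoop-env.sh:
# set JAVA_HOME to the root of the extracted JDK
# (jdk1.8.0_XX is a placeholder for the actual directory name)
export JAVA_HOME=/u01/java/jdk1.8.0_XX
```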


















Save and exit the file

Test the hadoop command as below
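Running the hadoop script with no arguments prints its usage documentation, which confirms the installation is wired up:

```shell
cd /u01/hadoop/hadoop-2.7.2

# Prints the usage documentation if the install is working
bin/hadoop

# Prints the Hadoop version, e.g. "Hadoop 2.7.2"
bin/hadoop version
```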
















Configuring Hadoop:

Modifying etc/hadoop/core-site.xml as below:
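This is the stock pseudo-distributed setting from the Apache single-node documentation, placed inside the file's <configuration> element:

```xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
```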



















Save and exit the file

Modifying etc/hadoop/hdfs-site.xml as below:
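For a single-node setup the Apache documentation sets the block replication factor to 1, since there is only one DataNode:

```xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
```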



















Save and exit the file

Setup passphraseless ssh

Check that you can ssh to the localhost without a passphrase:
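```shell
# Should log you in without prompting for a passphrase
ssh localhost
```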






If you cannot ssh to localhost without a passphrase, execute the following commands:
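A key pair with an empty passphrase is generated and added to the authorized keys (the Apache docs show this with a DSA key; an RSA key as below works the same way):

```shell
# Generate a key pair with an empty passphrase
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

# Authorize the new public key for logins to this host
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
```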















Executing MapReduce job locally:

Format the file system:
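```shell
cd /u01/hadoop/hadoop-2.7.2

# One-time format of the HDFS filesystem
bin/hdfs namenode -format
```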


















Start NameNode daemon and DataNode daemon:
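```shell
# Starts the NameNode, DataNode and SecondaryNameNode daemons
sbin/start-dfs.sh

# Optionally verify the daemons are up
jps
```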























The Hadoop daemon log output is written to the $HADOOP_LOG_DIR directory (defaults to $HADOOP_HOME/logs).

Browse the web interface for the NameNode; by default it is available at:
  • NameNode - http://localhost:50070/






















Create the HDFS directories required to execute MapReduce jobs:
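```shell
# Create the per-user home directory in HDFS
# (<username> is the OS user running the jobs)
bin/hdfs dfs -mkdir /user
bin/hdfs dfs -mkdir /user/<username>
```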






















Copy the input files into the distributed file system:
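The Apache example uses the Hadoop configuration files themselves as input:

```shell
# Copy the local etc/hadoop config files into HDFS as "input"
bin/hdfs dfs -put etc/hadoop input
```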





Run some of the examples provided:
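The standard example from the Apache docs greps the input files for strings matching a pattern:

```shell
# Run the bundled grep example over the "input" directory,
# writing results to "output" in HDFS
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar \
    grep input output 'dfs[a-z.]+'
```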



















Examine the output files:

Copy the output files from the distributed file system to the local file system and examine them:
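```shell
# Fetch the job output from HDFS to the local filesystem and view it
bin/hdfs dfs -get output output
cat output/*
```

Alternatively, the output files can be viewed directly on HDFS:

```shell
bin/hdfs dfs -cat output/*
```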





























When you're done, stop the daemons with:
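```shell
# Stops the NameNode, DataNode and SecondaryNameNode daemons
sbin/stop-dfs.sh
```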
















YARN on Single Node

We can run a MapReduce job on YARN in pseudo-distributed mode by setting a few parameters and additionally running the ResourceManager and NodeManager daemons.

Configure parameters as below:

Modifying etc/hadoop/mapred-site.xml as below:
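Per the Apache single-node documentation, this tells MapReduce to run on YARN (the file is created from mapred-site.xml.template if it does not exist yet):

```xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>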



















Save and exit the file

Modifying etc/hadoop/yarn-site.xml as below:
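This enables the shuffle auxiliary service that MapReduce needs on each NodeManager:

```xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
```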





















Save and exit the file

Start Resource Manager daemon and Node Manager daemon:
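```shell
# Starts the ResourceManager and NodeManager daemons
sbin/start-yarn.sh
```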
















Browse the web interface for the Resource Manager; by default it is available at:

  • Resource Manager - http://localhost:8088/


















Start NameNode daemon and DataNode daemon:
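As in the earlier section, the HDFS daemons are started (if they are not already running) with:

```shell
sbin/start-dfs.sh
```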















Run a MapReduce job
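Any of the example jobs works here; with YARN configured, the same grep example as before now runs through the ResourceManager:

```shell
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar \
    grep input output 'dfs[a-z.]+'
```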






When you're done, stop the daemons with the following commands:
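```shell
# Stops the ResourceManager and NodeManager daemons
sbin/stop-yarn.sh

# Stops the HDFS daemons as well
sbin/stop-dfs.sh
```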

