Wednesday, June 28, 2017

Step-by-step installation of an Apache Hadoop single-node environment on Linux


1. Java must be installed.

2. ssh must be installed and sshd must be running, because the Hadoop scripts that manage remote Hadoop daemons depend on it.

Download and stage the software:

Installing Java:

Make a directory named java under /u01/

Move the downloaded java software from /mnt to /u01/java

Extract the tar.gz file as below 
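The three Java steps above can be sketched as one small helper. The archive name is not given in this post, so the helper takes it as an argument; `stage_and_extract` is a hypothetical name introduced here for illustration only.

```shell
# stage_and_extract ARCHIVE DEST_DIR  (hypothetical helper, not part of Java or Hadoop)
# Creates DEST_DIR, moves ARCHIVE into it, then unpacks the archive there.
stage_and_extract() {
  archive="$1"
  dest="$2"
  mkdir -p "$dest" || return 1
  mv "$archive" "$dest/" || return 1
  tar -xzf "$dest/$(basename "$archive")" -C "$dest"
}
```

For this step it would be called as `stage_and_extract /mnt/<java-archive>.tar.gz /u01/java`, with the real file name substituted.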

Installing Hadoop:

Make a directory named hadoop under /u01/

Move the downloaded Hadoop software from /mnt to /u01/hadoop

Extract the tar.gz file as below 
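A minimal sketch of the Hadoop staging steps. The archive name `hadoop-2.7.2.tar.gz` is an assumption inferred from the `hadoop-2.7.2` directory used later in this post; adjust `HADOOP_ARCHIVE` if your download is named differently.

```shell
# Assumed archive name and location; override by exporting these variables first.
HADOOP_ARCHIVE="${HADOOP_ARCHIVE:-/mnt/hadoop-2.7.2.tar.gz}"
HADOOP_BASE="${HADOOP_BASE:-/u01/hadoop}"

if [ -f "$HADOOP_ARCHIVE" ]; then
  # Create the target directory, move the archive there, and unpack it in place.
  mkdir -p "$HADOOP_BASE" &&
    mv "$HADOOP_ARCHIVE" "$HADOOP_BASE/" &&
    tar -xzf "$HADOOP_BASE/$(basename "$HADOOP_ARCHIVE")" -C "$HADOOP_BASE"
else
  echo "Archive not found: $HADOOP_ARCHIVE" >&2
fi
```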

Edit the file /u01/hadoop/hadoop-2.7.2/etc/hadoop/ to define some parameters as below
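The file name is cut off in the line above. Assuming it refers to `hadoop-env.sh` and that the parameter to define is `JAVA_HOME` (as in the Apache single-node setup guide), the edit could be sketched as follows. `set_java_home` is a hypothetical helper name.

```shell
# set_java_home ENV_FILE JAVA_HOME_DIR  (hypothetical helper)
# Appends an export line to ENV_FILE instead of rewriting it, so the other
# settings stay untouched; when the file is sourced, the last assignment wins.
set_java_home() {
  env_file="$1"
  java_home="$2"
  printf '\nexport JAVA_HOME=%s\n' "$java_home" >> "$env_file"
}
```

Under that assumption it would be called as `set_java_home /u01/hadoop/hadoop-2.7.2/etc/hadoop/hadoop-env.sh /u01/java/<jdk-directory>`.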

Save and exit the file

Test the hadoop command as below
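A quick check, assuming Hadoop was unpacked to /u01/hadoop/hadoop-2.7.2 as in the path used above:

```shell
# Assumed install location; override by exporting HADOOP_HOME first.
HADOOP_HOME="${HADOOP_HOME:-/u01/hadoop/hadoop-2.7.2}"

# Running bin/hadoop with no arguments prints its usage text;
# "version" is a short way to confirm that the launcher and Java both work.
cd "$HADOOP_HOME" && bin/hadoop version
```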

Configuring Hadoop:

Modify etc/hadoop/core-site.xml as below:
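The property values are not shown in this post. A sketch using the value from the Apache single-node guide (`fs.defaultFS` pointing at HDFS on localhost, port 9000) is given here as an assumption; `write_core_site` is a hypothetical helper name.

```shell
# write_core_site CONF_DIR  (hypothetical helper)
# Overwrites CONF_DIR/core-site.xml with a single-property configuration.
write_core_site() {
  cat > "$1/core-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- Default file system URI: HDFS on this machine. Port 9000 is an assumed value. -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
}
```

It would be called as `write_core_site /u01/hadoop/hadoop-2.7.2/etc/hadoop`.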

Save and exit the file

Modify etc/hadoop/hdfs-site.xml as below:
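On a single node there is only one DataNode, so the usual setting is a replication factor of 1. The sketch below assumes that value, which this post does not show; `write_hdfs_site` is a hypothetical helper name.

```shell
# write_hdfs_site CONF_DIR  (hypothetical helper)
# Overwrites CONF_DIR/hdfs-site.xml with a single-property configuration.
write_hdfs_site() {
  cat > "$1/hdfs-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- Keep one copy of each block; assumed value for a one-DataNode setup. -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF
}
```

It would be called as `write_hdfs_site /u01/hadoop/hadoop-2.7.2/etc/hadoop`.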

Save and exit the file

Setup passphraseless ssh

Check that you can ssh to the localhost without a passphrase:
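The plain check is `ssh localhost`. A non-interactive variant is sketched here; `check_local_ssh` is a hypothetical helper name, and the 5-second timeout is an arbitrary choice.

```shell
# check_local_ssh  (hypothetical helper)
# BatchMode makes ssh fail instead of prompting, so this succeeds only when
# login to localhost works without typing a passphrase or password.
check_local_ssh() {
  if ssh -o BatchMode=yes -o ConnectTimeout=5 localhost true; then
    echo "passphraseless ssh to localhost works"
  else
    echo "passphraseless ssh to localhost is NOT set up"
    return 1
  fi
}
```

On a first connection the host key may still have to be accepted once with an interactive `ssh localhost`, because batch mode does not ask.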

If you cannot ssh to localhost without a passphrase, execute the following commands:
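A minimal sketch of the key setup. The post does not show which key type it used, so RSA is an assumption here; `setup_local_key` is a hypothetical helper name.

```shell
# setup_local_key [SSH_DIR]  (hypothetical helper; SSH_DIR defaults to ~/.ssh)
setup_local_key() {
  ssh_dir="${1:-$HOME/.ssh}"
  mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1
  # Generate an RSA key pair with an empty passphrase, unless a key already exists.
  [ -f "$ssh_dir/id_rsa" ] || ssh-keygen -q -t rsa -P '' -f "$ssh_dir/id_rsa" || return 1
  # Authorize the public key for logins to this account and restrict the file mode.
  cat "$ssh_dir/id_rsa.pub" >> "$ssh_dir/authorized_keys"
  chmod 0600 "$ssh_dir/authorized_keys"
}
```

Run `setup_local_key` with no argument to act on `~/.ssh`, then retry `ssh localhost`.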

Executing a MapReduce job locally:

Format the file system:
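A sketch of this step, assuming the install location /u01/hadoop/hadoop-2.7.2 used earlier in this post:

```shell
# Assumed install location; override by exporting HADOOP_HOME first.
HADOOP_HOME="${HADOOP_HOME:-/u01/hadoop/hadoop-2.7.2}"

# Initialize the NameNode metadata. Do this once, before the first start;
# reformatting later discards the existing HDFS namespace.
cd "$HADOOP_HOME" && bin/hdfs namenode -format
```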

Start NameNode daemon and DataNode daemon:
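A sketch, again assuming the /u01/hadoop/hadoop-2.7.2 install location. The `jps` call is an optional extra check and assumes the JDK's bin directory is on the PATH.

```shell
# Assumed install location; override by exporting HADOOP_HOME first.
HADOOP_HOME="${HADOOP_HOME:-/u01/hadoop/hadoop-2.7.2}"

# Start the HDFS daemons, then list the running Java processes;
# NameNode and DataNode should appear in the jps output.
cd "$HADOOP_HOME" && sbin/start-dfs.sh && jps
```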

The Hadoop daemon log output is written to the $HADOOP_LOG_DIR directory (defaults to $HADOOP_HOME/logs).

Browse the web interface for the NameNode; by default it is available at:
  • NameNode - http://localhost:50070/

Create the HDFS directories required to execute MapReduce jobs:
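Relative HDFS paths resolve under /user/<username>, so that home directory has to exist. A sketch, assuming the jobs run as the current Linux user:

```shell
# Assumed install location; override by exporting HADOOP_HOME first.
HADOOP_HOME="${HADOOP_HOME:-/u01/hadoop/hadoop-2.7.2}"

# -p creates /user and /user/<current user> in one call.
cd "$HADOOP_HOME" && bin/hdfs dfs -mkdir -p "/user/$(whoami)"
```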

Copy the input files into the distributed file system:
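The post does not say which input files it used. The sketch below assumes Hadoop's own configuration directory as sample input, as in the Apache single-node guide.

```shell
# Assumed install location; override by exporting HADOOP_HOME first.
HADOOP_HOME="${HADOOP_HOME:-/u01/hadoop/hadoop-2.7.2}"

# Upload the local etc/hadoop directory to HDFS as "input"
# (a relative HDFS path, so it lands under /user/<username>/input).
cd "$HADOOP_HOME" && bin/hdfs dfs -put etc/hadoop input
```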

Run some of the examples provided:
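As one possible example, the `grep` job from the bundled examples jar counts matches of a regular expression in the input. The jar path assumes the stock 2.7.2 layout, and the pattern is only an illustration.

```shell
# Assumed install location; override by exporting HADOOP_HOME first.
HADOOP_HOME="${HADOOP_HOME:-/u01/hadoop/hadoop-2.7.2}"

# Read HDFS "input", write matches of the pattern and their counts to HDFS "output".
# The output directory must not exist yet, or the job will refuse to start.
cd "$HADOOP_HOME" &&
  bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar \
    grep input output 'dfs[a-z.]+'
```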

Examine the output files:

Copy the output files from the distributed file system to the local file system and examine them:
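A sketch, assuming the job wrote to an HDFS directory named `output` (a name not given in this post):

```shell
# Assumed install location; override by exporting HADOOP_HOME first.
HADOOP_HOME="${HADOOP_HOME:-/u01/hadoop/hadoop-2.7.2}"

# Copy the HDFS "output" directory to a local "output" directory and print its files.
cd "$HADOOP_HOME" &&
  bin/hdfs dfs -get output output &&
  cat output/*
```

To look at the files without copying them, `bin/hdfs dfs -cat 'output/*'` reads them directly from HDFS; the quotes keep the local shell from expanding the wildcard.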

When you're done, stop the daemons with:
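Assuming the same install location as before:

```shell
# Assumed install location; override by exporting HADOOP_HOME first.
HADOOP_HOME="${HADOOP_HOME:-/u01/hadoop/hadoop-2.7.2}"

# Stop the HDFS daemons that start-dfs.sh launched.
cd "$HADOOP_HOME" && sbin/stop-dfs.sh
```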

YARN on Single Node

A MapReduce job can also run on YARN in pseudo-distributed mode. This requires setting a few parameters and running the Resource Manager and Node Manager daemons in addition to the HDFS daemons.

Configure parameters as below:

Modify etc/hadoop/mapred-site.xml as below:
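The setting that sends MapReduce jobs to YARN is `mapreduce.framework.name`. The sketch below assumes that this is the parameter meant here; `write_mapred_site` is a hypothetical helper name. Hadoop 2.x distributions typically ship only `mapred-site.xml.template`, so the file is created rather than edited.

```shell
# write_mapred_site CONF_DIR  (hypothetical helper)
# Creates or overwrites CONF_DIR/mapred-site.xml.
write_mapred_site() {
  cat > "$1/mapred-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- Run MapReduce jobs on YARN instead of the local job runner. -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF
}
```

It would be called as `write_mapred_site /u01/hadoop/hadoop-2.7.2/etc/hadoop`.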

Save and exit the file

Modify etc/hadoop/yarn-site.xml as below:
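MapReduce on YARN needs the shuffle auxiliary service on the Node Manager. The sketch below assumes that this is the parameter meant here; `write_yarn_site` is a hypothetical helper name.

```shell
# write_yarn_site CONF_DIR  (hypothetical helper)
# Overwrites CONF_DIR/yarn-site.xml with a single-property configuration.
write_yarn_site() {
  cat > "$1/yarn-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- Enable the shuffle service that MapReduce reducers fetch map output from. -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
EOF
}
```

It would be called as `write_yarn_site /u01/hadoop/hadoop-2.7.2/etc/hadoop`.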

Save and exit the file

Start Resource Manager daemon and Node Manager daemon:
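Assuming the /u01/hadoop/hadoop-2.7.2 install location:

```shell
# Assumed install location; override by exporting HADOOP_HOME first.
HADOOP_HOME="${HADOOP_HOME:-/u01/hadoop/hadoop-2.7.2}"

# Start the YARN daemons (Resource Manager and Node Manager).
cd "$HADOOP_HOME" && sbin/start-yarn.sh
```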

Browse the web interface for the Resource Manager; by default it is available at:

  • Resource Manager - http://localhost:8088/

Start the NameNode and DataNode daemons, if they are not already running:

Run a MapReduce job
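Any job works here. The sketch below reuses the bundled `grep` example as an assumption, and it expects an `input` directory to exist in HDFS. It first removes any earlier `output` directory, because a job fails if its output directory already exists.

```shell
# Assumed install location; override by exporting HADOOP_HOME first.
HADOOP_HOME="${HADOOP_HOME:-/u01/hadoop/hadoop-2.7.2}"

# -f keeps the removal quiet when there is no previous output directory.
cd "$HADOOP_HOME" &&
  bin/hdfs dfs -rm -r -f output &&
  bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar \
    grep input output 'dfs[a-z.]+'
```

While the job runs it should show up in the Resource Manager web interface.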

When you're done, stop the daemons with the command below:
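Assuming the same install location, the YARN daemons are stopped with:

```shell
# Assumed install location; override by exporting HADOOP_HOME first.
HADOOP_HOME="${HADOOP_HOME:-/u01/hadoop/hadoop-2.7.2}"

# Stop the Resource Manager and Node Manager started by start-yarn.sh.
cd "$HADOOP_HOME" && sbin/stop-yarn.sh
```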
