INSTALLATION DOCUMENTS BY RAVI

Wednesday, June 28, 2017

Step-by-step installation of an Apache Hadoop single-node environment on Linux

Prerequisites:

1. Java must be installed

2. ssh must be installed and sshd must be running to use the Hadoop scripts that manage remote Hadoop daemons.
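
A quick way to confirm both prerequisites before going further is sketched below in Python. It only checks that `java` and `ssh` are on the PATH and reads the first line of `java -version`; it does not check that sshd is running, which depends on the distribution's service manager.

```python
import shutil
import subprocess

# Assumption: both tools are expected on the current user's PATH.
REQUIRED_TOOLS = ("java", "ssh")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def java_version():
    """Return the first line of `java -version`, or None if java is absent.

    `java -version` writes its banner to stderr, not stdout.
    """
    if shutil.which("java") is None:
        return None
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    lines = result.stderr.strip().splitlines()
    return lines[0] if lines else None
```

An empty list from `missing_tools()` means both commands were found.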


Download and stage the software:
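
The download step can be sketched as follows, assuming the software is staged in /mnt as described in the next sections. The Hadoop URL below is an assumption (the Apache release archive for 2.7.2); any mirror serving the same file should work, and the Java download is not covered because its URL is not given here.

```python
import shutil
import urllib.request
from pathlib import Path

# Assumed location of the Hadoop 2.7.2 release on the Apache archive.
HADOOP_URL = "https://archive.apache.org/dist/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz"


def download(url, dest_dir="/mnt"):
    """Stream `url` into dest_dir, keeping the file name from the URL."""
    dest = Path(dest_dir) / url.rsplit("/", 1)[-1]
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)  # avoids loading the whole file in memory
    return dest
```

For example, `download(HADOOP_URL)` would save /mnt/hadoop-2.7.2.tar.gz.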






Installing Java:

Make a directory named java under /u01/

Move the downloaded Java software from /mnt to /u01/java

Extract the tar.gz file:
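
The extraction step is roughly equivalent to `tar -xzf <archive> -C /u01/java`. Below is a minimal Python sketch using the standard `tarfile` module; it also refuses archive entries that would escape the target directory.

```python
import tarfile
from pathlib import Path


def extract_tarball(archive, dest):
    """Extract a .tar.gz archive into dest (like `tar -xzf archive -C dest`).

    Returns the sorted top-level entry names, e.g. the JDK directory name.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    root = dest.resolve()
    with tarfile.open(archive, "r:gz") as tar:
        members = tar.getmembers()
        # Refuse entries that would land outside dest (absolute paths, "..").
        for member in members:
            target = (root / member.name).resolve()
            if target != root and root not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(root)
    return sorted({member.name.split("/")[0] for member in members})
```

Usage, with a hypothetical archive name: `extract_tarball("/u01/java/jdk-8u131-linux-x64.tar.gz", "/u01/java")`. The returned directory name is what JAVA_HOME should point at later.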
















Installing Hadoop:

Make a directory named hadoop under /u01/

Move the downloaded Hadoop software from /mnt to /u01/hadoop

Extract the tar.gz file:
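
After extraction, a quick sanity check that the distribution landed where the later steps expect it (/u01/hadoop/hadoop-2.7.2) can be sketched as below. The file list is an assumption based on the usual Hadoop 2.x tarball layout.

```python
from pathlib import Path

# Assumed subset of files shipped in a Hadoop 2.x tarball, relative to
# the extracted directory (e.g. /u01/hadoop/hadoop-2.7.2).
EXPECTED_FILES = (
    "bin/hadoop",
    "bin/hdfs",
    "sbin/start-dfs.sh",
    "sbin/start-yarn.sh",
    "etc/hadoop/hadoop-env.sh",
    "etc/hadoop/core-site.xml",
    "etc/hadoop/hdfs-site.xml",
)


def missing_hadoop_files(hadoop_home):
    """Return the expected files that are not present under hadoop_home."""
    home = Path(hadoop_home)
    return [rel for rel in EXPECTED_FILES if not (home / rel).is_file()]
```

`missing_hadoop_files("/u01/hadoop/hadoop-2.7.2")` returning an empty list suggests the archive was extracted into the right place.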















Edit the file /u01/hadoop/hadoop-2.7.2/etc/hadoop/hadoop-env.sh to define some parameters as below


















Save and exit the file
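
The most common parameter to define in hadoop-env.sh is JAVA_HOME. A minimal sketch that rewrites the existing `export JAVA_HOME=...` line (or appends one) is shown below; the JDK path in the usage note is hypothetical and should match the directory extracted under /u01/java.

```python
import re
from pathlib import Path

# Matches an active (not commented-out) JAVA_HOME export line.
_JAVA_HOME_LINE = re.compile(r"^\s*export\s+JAVA_HOME=.*$", re.MULTILINE)


def set_java_home(env_sh_text, java_home):
    """Return hadoop-env.sh text with JAVA_HOME exported as java_home."""
    line = f"export JAVA_HOME={java_home}"
    if _JAVA_HOME_LINE.search(env_sh_text):
        # A function replacement avoids backslash escapes in the path.
        return _JAVA_HOME_LINE.sub(lambda _m: line, env_sh_text, count=1)
    return env_sh_text.rstrip("\n") + "\n" + line + "\n"


def update_hadoop_env(path, java_home):
    """Rewrite the hadoop-env.sh file at `path` in place."""
    p = Path(path)
    p.write_text(set_java_home(p.read_text(), java_home))
```

Usage with a hypothetical JDK directory: `update_hadoop_env("/u01/hadoop/hadoop-2.7.2/etc/hadoop/hadoop-env.sh", "/u01/java/jdk1.8.0_131")`.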

Test the hadoop command as below
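
Running `bin/hadoop version` is a simple test; its first line reads "Hadoop <version>". A small sketch that runs it and extracts the version string:

```python
import subprocess
from pathlib import Path


def parse_hadoop_version(output):
    """Extract the version from `hadoop version` output ("Hadoop 2.7.2" line)."""
    for line in output.splitlines():
        if line.startswith("Hadoop "):
            return line.split()[1]
    return None


def hadoop_version(hadoop_home="/u01/hadoop/hadoop-2.7.2"):
    """Run bin/hadoop version; raises CalledProcessError if it fails."""
    result = subprocess.run(
        [str(Path(hadoop_home, "bin", "hadoop")), "version"],
        capture_output=True, text=True, check=True,
    )
    return parse_hadoop_version(result.stdout)
```

If JAVA_HOME is wrong, the command fails before printing a version, so a `CalledProcessError` here usually points back at hadoop-env.sh.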
















Configuring Hadoop:

Modifying etc/hadoop/core-site.xml as below:



















Save and exit the file
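
Hadoop's *-site.xml files are a `<configuration>` element containing `<property>` entries with `<name>` and `<value>`. The sketch below renders such a file; the value used is the one from the Apache single-node guide (`fs.defaultFS` = `hdfs://localhost:9000`) and is an assumption, since the value in this post's screenshot is not recoverable. `ET.indent` needs Python 3.9 or later.

```python
import xml.etree.ElementTree as ET

# Assumed value, as in the Apache pseudo-distributed setup guide.
CORE_SITE = {"fs.defaultFS": "hdfs://localhost:9000"}


def site_xml(properties):
    """Render a Hadoop *-site.xml document from a {name: value} mapping."""
    configuration = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(configuration, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(configuration)  # pretty-print with two-space indentation
    body = ET.tostring(configuration, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body + "\n"
```

`print(site_xml(CORE_SITE))` shows the text to place in etc/hadoop/core-site.xml.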

Modifying etc/hadoop/hdfs-site.xml as below:



















Save and exit the file
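
On a single node there is only one DataNode, so HDFS can keep just one replica of each block; the Apache single-node guide therefore sets `dfs.replication` to 1 (the default is 3). A sketch that reads hdfs-site.xml back and flags a different value:

```python
import xml.etree.ElementTree as ET


def read_site_xml(xml_bytes):
    """Parse a Hadoop *-site.xml document into a {name: value} dict."""
    root = ET.fromstring(xml_bytes)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def check_single_node_hdfs(props):
    """Return a list of problems for a single-node hdfs-site.xml."""
    problems = []
    if props.get("dfs.replication") != "1":
        problems.append("dfs.replication should be 1 on a single node")
    return problems
```

Usage: `check_single_node_hdfs(read_site_xml(open(path, "rb").read()))`, where `path` points at etc/hadoop/hdfs-site.xml.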

Set up passphraseless ssh:

Check that you can ssh to the localhost without a passphrase:
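
The check can also be made non-interactive: with `BatchMode=yes`, ssh fails instead of prompting, so a zero exit status means key-based login works. A sketch (note that the very first connection may still need the host key accepted once with a normal `ssh localhost`):

```python
import subprocess


def passwordless_ssh_command(host="localhost"):
    # BatchMode=yes: never prompt for a password or passphrase.
    # `true` is a no-op remote command, so only the login is tested.
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", host, "true"]


def can_ssh_without_passphrase(host="localhost"):
    """Return True if `ssh host true` succeeds without any prompt."""
    try:
        result = subprocess.run(
            passwordless_ssh_command(host), capture_output=True, timeout=15
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
```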






If you cannot ssh to localhost without a passphrase, execute the following commands:
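
The usual fix is to generate a key with an empty passphrase, append its public half to ~/.ssh/authorized_keys, and restrict that file to mode 0600. The sketch below assumes an RSA key (recent OpenSSH releases reject DSA keys); the `ssh-keygen` call is only built, not run, and `authorize_key` appends the key once.

```python
import os
from pathlib import Path


def keygen_command(key_path):
    # -N "" sets an empty passphrase so Hadoop scripts can log in unattended.
    return ["ssh-keygen", "-t", "rsa", "-N", "", "-f", str(key_path)]


def authorize_key(pub_key_path, authorized_keys_path):
    """Append a public key to authorized_keys (once) and set mode 0600."""
    key = Path(pub_key_path).read_text().strip()
    auth = Path(authorized_keys_path)
    auth.parent.mkdir(mode=0o700, parents=True, exist_ok=True)
    text = auth.read_text() if auth.exists() else ""
    if key not in (line.strip() for line in text.splitlines()):
        # Keep the existing last line intact if it lacks a trailing newline.
        prefix = "" if text == "" or text.endswith("\n") else "\n"
        with auth.open("a") as fh:
            fh.write(prefix + key + "\n")
    os.chmod(auth, 0o600)
```

Usage: run `keygen_command(Path.home() / ".ssh" / "id_rsa")` with `subprocess.run`, then `authorize_key(Path.home() / ".ssh" / "id_rsa.pub", Path.home() / ".ssh" / "authorized_keys")`.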















Executing a MapReduce job locally:

Format the file system:
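
Formatting is done with `bin/hdfs namenode -format` from the Hadoop directory. A minimal wrapper, assuming the /u01/hadoop/hadoop-2.7.2 location used in this post; formatting wipes existing HDFS metadata, so it is meant to run once before the first start.

```python
import subprocess
from pathlib import Path

HADOOP_HOME = Path("/u01/hadoop/hadoop-2.7.2")


def format_namenode_command(hadoop_home=HADOOP_HOME):
    """Build the `hdfs namenode -format` command line."""
    return [str(Path(hadoop_home, "bin", "hdfs")), "namenode", "-format"]


def format_namenode(hadoop_home=HADOOP_HOME):
    # Erases HDFS metadata: only for a fresh installation.
    subprocess.run(format_namenode_command(hadoop_home), check=True)
```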


















Start NameNode daemon and DataNode daemon:
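
The HDFS daemons are started with `sbin/start-dfs.sh`; `jps` (shipped with the JDK, assumed on PATH) then lists the running Java processes as "<pid> <name>". The sketch below starts HDFS and reports which expected daemons are absent. Daemons can take a few seconds to appear, so an immediate check may need a retry.

```python
import subprocess
from pathlib import Path

HADOOP_HOME = Path("/u01/hadoop/hadoop-2.7.2")
HDFS_DAEMONS = {"NameNode", "DataNode", "SecondaryNameNode"}


def running_daemons(jps_output):
    """Return the process names listed by `jps` (lines like "1234 NameNode")."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            names.add(parts[1])
    return names


def start_dfs(hadoop_home=HADOOP_HOME):
    """Start HDFS and return the set of expected daemons not yet running."""
    subprocess.run([str(Path(hadoop_home, "sbin", "start-dfs.sh"))], check=True)
    jps = subprocess.run(["jps"], capture_output=True, text=True, check=True)
    return HDFS_DAEMONS - running_daemons(jps.stdout)
```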























The hadoop daemon log output is written to the $HADOOP_LOG_DIR directory (defaults to $HADOOP_HOME/logs)
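
The log-directory rule can be expressed as a small helper, useful when a daemon fails to start and the newest log is the first place to look. Daemon logs are typically named like hadoop-<user>-namenode-<hostname>.log.

```python
import os
from pathlib import Path


def hadoop_log_dir(env=None):
    """$HADOOP_LOG_DIR if set, else $HADOOP_HOME/logs."""
    env = os.environ if env is None else env
    if env.get("HADOOP_LOG_DIR"):
        return Path(env["HADOOP_LOG_DIR"])
    if env.get("HADOOP_HOME"):
        return Path(env["HADOOP_HOME"], "logs")
    raise KeyError("neither HADOOP_LOG_DIR nor HADOOP_HOME is set")


def latest_logs(log_dir, pattern="*.log"):
    """Return the matching files in log_dir, newest first."""
    files = Path(log_dir).glob(pattern)
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)
```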

Browse the web interface for the NameNode; by default it is available at:
  • NameNode - http://localhost:50070/
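
Instead of a browser, a script can confirm the web UI answers. A sketch using only the standard library; it deliberately bypasses any configured HTTP proxy, since the UI is on localhost.

```python
import urllib.request

# An opener with an empty ProxyHandler ignores http_proxy/https_proxy.
_NO_PROXY = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def web_ui_is_up(url="http://localhost:50070/", timeout=5):
    """Return True if `url` answers with an HTTP 2xx/3xx status."""
    try:
        with _NO_PROXY.open(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:  # URLError, HTTPError, refused connections, timeouts
        return False
```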






















Create the HDFS directories required to execute MapReduce jobs:
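
MapReduce jobs resolve relative HDFS paths such as "input" against /user/<username>, so that directory must exist. A sketch that builds the command; `-p` creates /user as well, like `mkdir -p`.

```python
import getpass
import subprocess
from pathlib import Path

HADOOP_HOME = Path("/u01/hadoop/hadoop-2.7.2")


def mkdir_user_home_command(user=None, hadoop_home=HADOOP_HOME):
    """Build `hdfs dfs -mkdir -p /user/<user>` for the given (or current) user."""
    user = user or getpass.getuser()
    return [str(Path(hadoop_home, "bin", "hdfs")), "dfs", "-mkdir", "-p", f"/user/{user}"]


def mkdir_user_home(user=None, hadoop_home=HADOOP_HOME):
    subprocess.run(mkdir_user_home_command(user, hadoop_home), check=True)
```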






















Copy the input files into the distributed file system:
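
A sketch of the copy, assuming the same input as the Apache single-node guide: the local etc/hadoop configuration directory, uploaded to HDFS as "input" (relative to /user/<username>).

```python
import subprocess
from pathlib import Path

HADOOP_HOME = Path("/u01/hadoop/hadoop-2.7.2")


def put_input_command(hadoop_home=HADOOP_HOME):
    """Build `hdfs dfs -put <HADOOP_HOME>/etc/hadoop input`."""
    home = Path(hadoop_home)
    return [str(home / "bin" / "hdfs"), "dfs", "-put", str(home / "etc" / "hadoop"), "input"]


def put_input(hadoop_home=HADOOP_HOME):
    subprocess.run(put_input_command(hadoop_home), check=True)
```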





Run some of the examples provided:
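
The bundled examples live in share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar. The sketch below builds the `grep` example used in the Apache guide, counting matches of the regex `dfs[a-z.]+` in "input" and writing to "output". The job fails if "output" already exists in HDFS.

```python
import subprocess
from pathlib import Path

HADOOP_HOME = Path("/u01/hadoop/hadoop-2.7.2")


def grep_example_command(hadoop_home=HADOOP_HOME, version="2.7.2"):
    """Build the `hadoop jar ... grep input output <regex>` command."""
    home = Path(hadoop_home)
    jar = home / "share" / "hadoop" / "mapreduce" / f"hadoop-mapreduce-examples-{version}.jar"
    # No shell is involved, so the regex needs no quoting here.
    return [str(home / "bin" / "hadoop"), "jar", str(jar), "grep", "input", "output", "dfs[a-z.]+"]


def run_grep_example(hadoop_home=HADOOP_HOME):
    subprocess.run(grep_example_command(hadoop_home), check=True)
```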



















Examine the output files:

Copy the output files from the distributed file system to the local file system and examine them:
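
A sketch of the copy-back and a parser for the result, assuming the `grep` example's output format of one "<count><TAB><match>" line per match in part-* files (the _SUCCESS marker file is skipped by the glob).

```python
from pathlib import Path

HADOOP_HOME = Path("/u01/hadoop/hadoop-2.7.2")


def get_output_command(hadoop_home=HADOOP_HOME, hdfs_dir="output", local_dir="output"):
    """Build `hdfs dfs -get output output`."""
    return [str(Path(hadoop_home, "bin", "hdfs")), "dfs", "-get", hdfs_dir, local_dir]


def read_grep_output(local_dir):
    """Parse part-* files into (match, count) pairs, in file order."""
    results = []
    for part in sorted(Path(local_dir).glob("part-*")):
        for line in part.read_text().splitlines():
            if line.strip():
                count, match = line.split("\t", 1)
                results.append((match, int(count)))
    return results
```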





























When you're done, stop the daemons with:
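
HDFS is stopped with `sbin/stop-dfs.sh`. The sketch below stops it and then uses `jps` (assumed on PATH) to report any HDFS daemon still running.

```python
import subprocess
from pathlib import Path

HADOOP_HOME = Path("/u01/hadoop/hadoop-2.7.2")
HDFS_DAEMONS = ("NameNode", "DataNode", "SecondaryNameNode")


def still_running(jps_output, daemons=HDFS_DAEMONS):
    """Return the daemons from `daemons` that appear in `jps` output."""
    running = {parts[1] for parts in (l.split() for l in jps_output.splitlines()) if len(parts) >= 2}
    return [name for name in daemons if name in running]


def stop_dfs(hadoop_home=HADOOP_HOME):
    subprocess.run([str(Path(hadoop_home, "sbin", "stop-dfs.sh"))], check=True)
    jps = subprocess.run(["jps"], capture_output=True, text=True, check=True)
    return still_running(jps.stdout)
```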
















YARN on a Single Node

We can run a MapReduce job on YARN in pseudo-distributed mode by setting a few parameters and additionally running the ResourceManager and NodeManager daemons.

Configure parameters as below:

Modifying etc/hadoop/mapred-site.xml as below:



















Save and exit the file
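
The key setting in mapred-site.xml is `mapreduce.framework.name` = `yarn`. Hadoop 2.x tarballs usually ship only mapred-site.xml.template, so the sketch below copies the template first if needed (an assumption about the layout), then sets the property in place. ElementTree drops the license comment outside `<configuration>` when rewriting.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path

CONF_DIR = Path("/u01/hadoop/hadoop-2.7.2/etc/hadoop")


def set_property(xml_path, name, value):
    """Set (or add) one <property> in a Hadoop *-site.xml file in place."""
    tree = ET.parse(xml_path)
    root = tree.getroot()
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:  # no existing property with that name
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    tree.write(xml_path, encoding="UTF-8", xml_declaration=True)


def configure_mapred(conf_dir=CONF_DIR):
    conf_dir = Path(conf_dir)
    site = conf_dir / "mapred-site.xml"
    template = conf_dir / "mapred-site.xml.template"
    if not site.exists() and template.exists():
        shutil.copyfile(template, site)
    set_property(site, "mapreduce.framework.name", "yarn")
```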

Modifying etc/hadoop/yarn-site.xml as below:





















Save and exit the file
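
For MapReduce on YARN, the NodeManager needs the `mapreduce_shuffle` auxiliary service, configured through `yarn.nodemanager.aux-services` (a comma-separated list). A sketch that checks a yarn-site.xml for it:

```python
import xml.etree.ElementTree as ET


def aux_services(yarn_site_bytes):
    """Return the configured NodeManager auxiliary services as a list."""
    root = ET.fromstring(yarn_site_bytes)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == "yarn.nodemanager.aux-services":
            value = prop.findtext("value") or ""
            return [s.strip() for s in value.split(",") if s.strip()]
    return []


def shuffle_enabled(yarn_site_bytes):
    return "mapreduce_shuffle" in aux_services(yarn_site_bytes)
```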

Start the ResourceManager and NodeManager daemons:
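
The YARN daemons are started with `sbin/start-yarn.sh`. As with HDFS, `jps` (assumed on PATH) can confirm that ResourceManager and NodeManager are up; a sketch:

```python
import subprocess
from pathlib import Path

HADOOP_HOME = Path("/u01/hadoop/hadoop-2.7.2")
YARN_DAEMONS = ("ResourceManager", "NodeManager")


def missing_daemons(jps_output, expected=YARN_DAEMONS):
    """Return the expected daemons that do not appear in `jps` output."""
    running = {parts[1] for parts in (l.split() for l in jps_output.splitlines()) if len(parts) >= 2}
    return [name for name in expected if name not in running]


def start_yarn(hadoop_home=HADOOP_HOME):
    subprocess.run([str(Path(hadoop_home, "sbin", "start-yarn.sh"))], check=True)
    jps = subprocess.run(["jps"], capture_output=True, text=True, check=True)
    return missing_daemons(jps.stdout)
```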
















Browse the web interface for the ResourceManager; by default it is available at:

  • Resource Manager - http://localhost:8088/
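
Besides the web page, the ResourceManager exposes a REST API on the same port. The sketch below assumes the cluster-info endpoint `/ws/v1/cluster/info`, whose JSON has a `clusterInfo.state` field ("STARTED" once the RM is running).

```python
import json
import urllib.request

CLUSTER_INFO_URL = "http://localhost:8088/ws/v1/cluster/info"
# Bypass any configured HTTP proxy for this local endpoint.
_NO_PROXY = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def cluster_state(info_json):
    """Extract clusterInfo.state from the RM cluster-info JSON."""
    return json.loads(info_json)["clusterInfo"]["state"]


def fetch_cluster_state(url=CLUSTER_INFO_URL, timeout=5):
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with _NO_PROXY.open(request, timeout=timeout) as response:
        return cluster_state(response.read().decode("utf-8"))
```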


















Start NameNode daemon and DataNode daemon:















Run a MapReduce job:
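
The job is submitted with the same `hadoop jar` command as the local run; with `mapreduce.framework.name` set to `yarn`, it goes to the ResourceManager instead. To confirm that, a sketch that lists applications through the RM REST API, assuming the `/ws/v1/cluster/apps` endpoint, which returns `{"apps": null}` when there are none.

```python
import json
import urllib.request

APPS_URL = "http://localhost:8088/ws/v1/cluster/apps"
_NO_PROXY = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def summarize_apps(apps_json):
    """Return (id, name, state, finalStatus) for each application."""
    data = json.loads(apps_json)
    apps = (data.get("apps") or {}).get("app") or []
    return [(a.get("id"), a.get("name"), a.get("state"), a.get("finalStatus")) for a in apps]


def fetch_apps(url=APPS_URL, timeout=5):
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with _NO_PROXY.open(request, timeout=timeout) as response:
        return summarize_apps(response.read().decode("utf-8"))
```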






When you're done, stop the daemons with the following command:
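
With both HDFS and YARN running, a reasonable shutdown order is the reverse of startup: `sbin/stop-yarn.sh` first, so no containers are still using HDFS, then `sbin/stop-dfs.sh`. A sketch:

```python
import subprocess
from pathlib import Path

HADOOP_HOME = Path("/u01/hadoop/hadoop-2.7.2")


def shutdown_commands(hadoop_home=HADOOP_HOME):
    """Stop commands in order: YARN first, then HDFS."""
    sbin = Path(hadoop_home, "sbin")
    return [[str(sbin / "stop-yarn.sh")], [str(sbin / "stop-dfs.sh")]]


def shutdown(hadoop_home=HADOOP_HOME):
    for cmd in shutdown_commands(hadoop_home):
        subprocess.run(cmd, check=True)
```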

