Sunday, October 15, 2017
Step by step Installing Apache Oozie in Hadoop single node
Prerequisites:
1. Java jdk 1.6+
2. Maven 3.3.9
3. Hadoop 2.x
Installing Oozie 4.3.0
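The download step itself was not preserved in this capture; something like the following fetches and unpacks the 4.3.0 source (the archive.apache.org URL is the standard Apache archive location, and /u01/hadoop matches the paths used later in this post):

```
cd /u01/hadoop
wget https://archive.apache.org/dist/oozie/4.3.0/oozie-4.3.0.tar.gz
tar -xzf oozie-4.3.0.tar.gz
```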
Setting the environment variables:
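The exact variables are not shown in this capture; for the build, exports along these lines (typically added to ~/.bashrc) are what is needed. The JAVA_HOME and M2_HOME paths are assumptions; point them at your own JDK and Maven 3.3.9 installs:

```shell
# Assumed install paths; adjust for your machine.
export JAVA_HOME=/usr/java/default
export M2_HOME=/opt/maven
export PATH=$PATH:$JAVA_HOME/bin:$M2_HOME/bin
```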
Save and close the file
Running Oozie Build
Modify the pom.xml under /u01/hadoop/oozie-4.3.0 as below
Now run the below command
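A sketch of the build command, assuming the pom.xml edit above aligns the build with the Hadoop 2.7.2 install used elsewhere in this post:

```
cd /u01/hadoop/oozie-4.3.0
bin/mkdistro.sh -DskipTests -Dhadoop.version=2.7.2
```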
Oozie server setup
Add or modify hadoop core-site.xml under /u01/hadoop/hadoop-2.7.2/etc/hadoop
as below
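The snippet itself is missing from the capture; the standard change for Oozie is to add the proxyuser properties inside the <configuration> element. The sketch below writes the fragment to /tmp for illustration, assuming "hadoop" is the OS user the Oozie server runs as:

```shell
# Fragment to merge inside <configuration> of core-site.xml;
# "hadoop" is the assumed user running the Oozie server.
cat <<'EOF' > /tmp/core-site-proxyuser.xml
<property>
  <name>hadoop.proxyuser.hadoop.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hadoop.groups</name>
  <value>*</value>
</property>
EOF
```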
Create a libext folder under the oozie directory
Move downloaded ext-2.2.zip to libext folder
Copy Hadoop libraries into libext folder of oozie
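The three steps above map to commands like these (the location of the downloaded ext-2.2.zip is an assumption):

```
cd /u01/hadoop/oozie-4.3.0
mkdir libext
mv ~/Downloads/ext-2.2.zip libext/        # ext-2.2.zip enables the Oozie web console
cp /u01/hadoop/hadoop-2.7.2/share/hadoop/*/*.jar libext/
cp /u01/hadoop/hadoop-2.7.2/share/hadoop/*/lib/*.jar libext/
```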
Preparing war file
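Presumably the documented setup command, run from the Oozie directory:

```
cd /u01/hadoop/oozie-4.3.0
bin/oozie-setup.sh prepare-war
```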
Creating Oozie Sharelib
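Likely via oozie-setup.sh; the NameNode URI hdfs://localhost:9000 is an assumption for this single-node setup:

```
bin/oozie-setup.sh sharelib create -fs hdfs://localhost:9000
```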
Copy the hadoop core-site.xml file properties to core-site.xml
file in /u01/hadoop/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0/conf/hadoop-conf
Copy mapred-site.xml and yarn-site.xml files to /u01/hadoop/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0/conf/hadoop-conf
Add or modify oozie-site.xml file under /u01/hadoop/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0/conf
as below
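The properties were lost in the capture; at minimum Oozie needs to know where the hadoop-conf directory is and which user may be proxied. A sketch, written to /tmp for illustration (the "hadoop" user name is an assumption):

```shell
cat <<'EOF' > /tmp/oozie-site-fragment.xml
<property>
  <name>oozie.service.HadoopAccessorService.hadoop.configurations</name>
  <value>*=/u01/hadoop/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0/conf/hadoop-conf</value>
</property>
<property>
  <name>oozie.service.ProxyUserService.proxyuser.hadoop.hosts</name>
  <value>*</value>
</property>
<property>
  <name>oozie.service.ProxyUserService.proxyuser.hadoop.groups</name>
  <value>*</value>
</property>
EOF
```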
Now run the below command
Creating Oozie database
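Presumably the stock command, which creates the default embedded Derby database:

```
bin/ooziedb.sh create -sqlfile oozie.sql -run
```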
Setting environment variables for Oozie
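The runtime home is the built distro directory referenced earlier in this post; something like:

```shell
# The distro path comes from the build output quoted earlier in this post.
export OOZIE_HOME=/u01/hadoop/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0
export PATH=$PATH:$OOZIE_HOME/bin
```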
Starting Oozie daemon
Run the below command to start Oozie daemon
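The command itself was lost in the capture; the stock daemon start is:

```
bin/oozied.sh start
```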
Run the below command to run as a process
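And to run it in the foreground as a regular process (logs go to the console):

```
bin/oozied.sh run
```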
Setting up client node for Oozie
Starting Oozie client node
Oozie web console url
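The URL was stripped from the capture; with the default port the console is at http://localhost:11000/oozie (localhost assumed for a single node). Exporting OOZIE_URL also saves typing -oozie on every client command:

```shell
# Default Oozie HTTP port is 11000; localhost assumed on a single node.
export OOZIE_URL=http://localhost:11000/oozie
```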
Checking the status of the Oozie process
Checking which sharelib is being used by Oozie while the daemon is running
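The two checks above correspond to the stock admin commands:

```
oozie admin -oozie http://localhost:11000/oozie -status      # a healthy server reports: System mode: NORMAL
oozie admin -oozie http://localhost:11000/oozie -shareliblist
```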
Running the example Oozie jobs and testing the installation
We see different example folders such as pig, hive, mapreduce, etc. We will use the mapreduce example to walk through the steps below.
In the job.properties file of each folder, change the NameNode port and JobTracker port as below
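A sketch of the edited file, written to /tmp for illustration; the 9000 (NameNode) and 8032 (ResourceManager) ports are assumptions matching common Hadoop 2.x defaults and must agree with your core-site.xml and yarn-site.xml:

```shell
cat <<'EOF' > /tmp/job.properties
nameNode=hdfs://localhost:9000
jobTracker=localhost:8032
queueName=default
examplesRoot=examples
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/map-reduce
EOF
```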
Save and close the file
Copying the examples folder to hdfs
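Assuming the bundled oozie-examples.tar.gz still needs extracting in $OOZIE_HOME, the copy and job submission look like:

```
cd $OOZIE_HOME
tar -xzf oozie-examples.tar.gz
hdfs dfs -put examples examples
oozie job -oozie http://localhost:11000/oozie \
    -config examples/apps/map-reduce/job.properties -run
```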
Run the below command to check the status of the job
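Using the job id printed by the -run command (left as a placeholder here):

```
oozie job -oozie http://localhost:11000/oozie -info <job-id>
```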
To check the log, run the below command
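Again with the job id as a placeholder:

```
oozie job -oozie http://localhost:11000/oozie -log <job-id>
```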
Monday, October 9, 2017
Step by step Apache Hive installation
Apache Hive
1. Hive is a data warehouse infrastructure tool to process
structured data in Hadoop
2. Hive resides on top of Hadoop to summarize Big Data, and
makes querying and analyzing easy.
3. It is a platform used to develop SQL type scripts to do
MapReduce operations.
4. Initially Hive was developed by Facebook, later the
Apache Software Foundation took it up and developed it further as an open
source under the name Apache Hive.
5. It is used by different companies. For example, Amazon
uses it in Amazon Elastic MapReduce.
Hive is not
- A relational database
- A design for OnLine Transaction Processing (OLTP)
- A language for real-time queries and row-level updates
Features of Hive
- It stores the schema in a database and the processed data in HDFS.
- It is designed for OLAP.
- It provides SQL type language for querying called HiveQL or HQL.
- It is familiar, fast, scalable, and extensible.
Installing Hive
Prerequisites
1. Java
2. Hadoop
3. Hive
Download and extract hive software as below
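Something like the following; the Hive 2.1.0 version is an assumption, inferred from the upgrade-2.0.0-to-2.1.0 schema script used later in this post:

```
cd /u01/hadoop
wget https://archive.apache.org/dist/hive/hive-2.1.0/apache-hive-2.1.0-bin.tar.gz
tar -xzf apache-hive-2.1.0-bin.tar.gz
```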
Setting environment variables
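Typically added to ~/.bashrc; the extract path is an assumption matching the download step above:

```shell
# Assumed extract path from the download step above.
export HIVE_HOME=/u01/hadoop/apache-hive-2.1.0-bin
export PATH=$PATH:$HIVE_HOME/bin
```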
Save and close the file
Configuring Hive
To configure Hive with Hadoop, you need to edit the hive-env.sh file,
which is placed in the $HIVE_HOME/conf directory. The
following commands redirect to Hive config folder and copy the
template file:
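The commands referred to are presumably the standard template copy, plus pointing hive-env.sh at the Hadoop install used in this post:

```
cd $HIVE_HOME/conf
cp hive-env.sh.template hive-env.sh
echo "export HADOOP_HOME=/u01/hadoop/hadoop-2.7.2" >> hive-env.sh
```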
Save and close the file
Creating Hive directories in HDFS
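The standard warehouse and scratch directories, with group write enabled:

```
hdfs dfs -mkdir -p /tmp
hdfs dfs -mkdir -p /user/hive/warehouse
hdfs dfs -chmod g+w /tmp
hdfs dfs -chmod g+w /user/hive/warehouse
```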
Configuring the Metastore database
Creating the initial database schema using the hive-schema-0.14.0.mysql.sql and upgrade-2.0.0-to-2.1.0.mysql.sql files
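A sketch of the MySQL session; the "metastore" database name is an assumption, and the script paths follow the standard $HIVE_HOME/scripts/metastore/upgrade/mysql layout:

```
mysql -u root -p
mysql> CREATE DATABASE metastore;
mysql> USE metastore;
mysql> SOURCE /u01/hadoop/apache-hive-2.1.0-bin/scripts/metastore/upgrade/mysql/hive-schema-0.14.0.mysql.sql;
mysql> SOURCE /u01/hadoop/apache-hive-2.1.0-bin/scripts/metastore/upgrade/mysql/upgrade-2.0.0-to-2.1.0.mysql.sql;
```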
We also need a MySQL user account for Hive to use to access the Metastore. It is very important to prevent this user account from creating or altering tables in the Metastore database schema.
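A sketch with DML-only grants, matching the warning above; the "hive" user name and password are assumptions:

```
mysql> CREATE USER 'hive'@'localhost' IDENTIFIED BY 'hivepassword';
mysql> GRANT SELECT,INSERT,UPDATE,DELETE ON metastore.* TO 'hive'@'localhost';
mysql> FLUSH PRIVILEGES;
```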
Create hive-site.xml ( If not already present) in $HIVE_HOME/conf
folder with the configuration below
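A minimal sketch, written to /tmp for illustration; the JDBC URL, user name, and password must match the Metastore database and account created above (all assumed names):

```shell
cat <<'EOF' > /tmp/hive-site.xml
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost/metastore</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hivepassword</value>
  </property>
</configuration>
EOF
```

Note that the MySQL JDBC driver jar (mysql-connector-java) must also be placed in $HIVE_HOME/lib for this to work.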
Save and close the file
Starting hive
Starting hive Metastore
We are done with the configuration and setup,
now we can access Hive with Metastore in MySQL, but before that we have to
start Metastore process using following command.
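Presumably the stock service command, run in the background:

```
hive --service metastore &
```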
We can confirm the running metastore process with jps
command. If you find RunJar in the list that means the Metastore process is
running.
Starting HiveServer2
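Presumably the stock service command, with beeline used to verify the connection (10000 is the default HiveServer2 port):

```
hive --service hiveserver2 &
beeline -u jdbc:hive2://localhost:10000
```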