INSTALLATION DOCUMENTS BY RAVI

Sunday, October 15, 2017

Spark Streaming and Kafka

Prerequisites:

1. Java
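The original screenshot is missing; a standard way to confirm a JDK is installed (version output depends on your system):

```shell
# Both Spark and Kafka require a JDK on the PATH
java -version
```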






2. Start the ZooKeeper server
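Kafka ships with a bundled ZooKeeper. A typical way to start it, assuming you run this from the Kafka installation directory:

```shell
# Start the bundled ZooKeeper with its default config (listens on port 2181)
bin/zookeeper-server-start.sh config/zookeeper.properties
```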








3. Start Kafka and create a new topic
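From the Kafka installation directory, the broker is normally started against its default config (broker on port 9092):

```shell
# Start the Kafka broker in its own screen/terminal
bin/kafka-server-start.sh config/server.properties
```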





Creating a new topic
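A typical topic-creation command for Kafka releases of this era (the topic name `test` and the single partition/replica are assumptions for illustration):

```shell
# Create a single-partition topic named "test" via ZooKeeper on localhost
bin/kafka-topics.sh --create --zookeeper localhost:2181 \
  --replication-factor 1 --partitions 1 --topic test

# Verify the topic exists
bin/kafka-topics.sh --list --zookeeper localhost:2181
```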












Test communication between Kafka and Spark

Run Kafka producer in a new screen
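A console producer can be started as below (broker address and the topic name `test` assumed from the earlier steps):

```shell
# Each line typed on stdin is published as a message to the topic
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
```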














Run Spark’s KafkaWordCount in a new screen
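Spark's bundled KafkaWordCount example takes the ZooKeeper quorum, a consumer group, a comma-separated topic list, and a thread count. Run it from the Spark installation directory (the group name `my-group` is an assumption):

```shell
# Counts words arriving on the "test" topic and prints counts each batch
bin/run-example streaming.KafkaWordCount localhost:2181 my-group test 1
```

Messages typed into the producer screen should appear as word counts in this screen, confirming Kafka-to-Spark communication.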

Step by step: Installing Apache Oozie on a Hadoop single node

Prerequisites:


1. Java jdk 1.6+





2. Maven 3.3.9








3. Hadoop 2.x
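The three prerequisites above can be confirmed quickly (the versions your system reports will differ):

```shell
java -version     # JDK 1.6 or later
mvn -version      # Maven 3.3.9
hadoop version    # Hadoop 2.x
```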







Installing Oozie 4.3.0
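A typical download-and-extract sequence; the /u01/hadoop working directory is assumed from the paths used later in this document:

```shell
cd /u01/hadoop
# Oozie is distributed as source and must be built before use
wget https://archive.apache.org/dist/oozie/4.3.0/oozie-4.3.0.tar.gz
tar -xzvf oozie-4.3.0.tar.gz
```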

Setting the environment variables:
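The original listing is missing; a sketch of the variables commonly added to ~/.bashrc at this stage (the Maven path is hypothetical; the others follow the locations used elsewhere in this document):

```shell
# Append to ~/.bashrc, then reload with: source ~/.bashrc
export JAVA_HOME=/usr/java/default               # adjust to your JDK path
export M2_HOME=/u01/hadoop/apache-maven-3.3.9    # hypothetical Maven path
export HADOOP_HOME=/u01/hadoop/hadoop-2.7.2
export PATH=$PATH:$JAVA_HOME/bin:$M2_HOME/bin:$HADOOP_HOME/bin
```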

Save and close the file

Running Oozie Build
Modify the pom.xml under /u01/hadoop/oozie-4.3.0 as below
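The original edit is not shown. A common change when building Oozie 4.3.0 against Hadoop 2.7.x is to align the Hadoop version property with the installed release, e.g.:

```xml
<!-- In the <properties> section of /u01/hadoop/oozie-4.3.0/pom.xml -->
<!-- Assumed change: build Oozie against the installed Hadoop 2.7.2 -->
<hadoop.version>2.7.2</hadoop.version>
```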

Now run the below command
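Oozie's distribution is built with the bundled script; a typical invocation skips the test suite to shorten the build:

```shell
cd /u01/hadoop/oozie-4.3.0
bin/mkdistro.sh -DskipTests
```

On success, the built distribution appears under distro/target/oozie-4.3.0-distro/oozie-4.3.0, the path used in the rest of this document.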

Oozie server setup
Add or modify hadoop core-site.xml under /u01/hadoop/hadoop-2.7.2/etc/hadoop as below
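The usual addition here is the proxy-user configuration that lets the Oozie server impersonate the user submitting jobs (the account name `hadoop` is an assumption; substitute the user that runs Oozie):

```xml
<property>
  <name>hadoop.proxyuser.hadoop.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hadoop.groups</name>
  <value>*</value>
</property>
```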












Create a libext folder under the oozie directory
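Assuming the built distribution path used throughout this document:

```shell
cd /u01/hadoop/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0
mkdir libext
```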





Move downloaded ext-2.2.zip to libext folder
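For example, if the archive was downloaded to the home directory:

```shell
# ext-2.2.zip provides the ExtJS library used by the Oozie web console
mv ~/ext-2.2.zip \
   /u01/hadoop/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0/libext/
```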





Copy Hadoop libraries into libext folder of oozie
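A sketch of the copy, pulling the Hadoop jars into libext (paths follow the Hadoop location used earlier):

```shell
cd /u01/hadoop/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0
cp /u01/hadoop/hadoop-2.7.2/share/hadoop/*/*.jar libext/
cp /u01/hadoop/hadoop-2.7.2/share/hadoop/*/lib/*.jar libext/
```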

Preparing war file
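The war is prepared with the bundled setup script, which packs the libext jars and ext-2.2.zip into the Oozie web application:

```shell
cd /u01/hadoop/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0
bin/oozie-setup.sh prepare-war
```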

Creating Oozie Sharelib

Copy the hadoop core-site.xml file properties to core-site.xml file in /u01/hadoop/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0/conf/hadoop-conf 
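For a single-node setup this is typically a straight copy of the file:

```shell
cp /u01/hadoop/hadoop-2.7.2/etc/hadoop/core-site.xml \
   /u01/hadoop/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0/conf/hadoop-conf/
```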

Copy mapred-site.xml and yarn-site.xml files to /u01/hadoop/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0/conf/hadoop-conf
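Again, a straight copy from the Hadoop configuration directory:

```shell
cp /u01/hadoop/hadoop-2.7.2/etc/hadoop/mapred-site.xml \
   /u01/hadoop/hadoop-2.7.2/etc/hadoop/yarn-site.xml \
   /u01/hadoop/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0/conf/hadoop-conf/
```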








Add or modify oozie-site.xml file under /u01/hadoop/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0/conf as below
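The exact properties are not shown; the setting that is almost always required is pointing Oozie at the Hadoop configuration directory (path assumed from earlier in this document):

```xml
<property>
  <name>oozie.service.HadoopAccessorService.hadoop.configurations</name>
  <value>*=/u01/hadoop/hadoop-2.7.2/etc/hadoop</value>
</property>
```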

Now run the below command
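Assuming HDFS runs on the default single-node address:

```shell
# Uploads the sharelib (pig, hive, mapreduce-streaming, etc.) to HDFS
bin/oozie-setup.sh sharelib create -fs hdfs://localhost:9000
```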

Creating Oozie database
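The database (embedded Derby by default) is created with the bundled script:

```shell
# Generates the schema DDL and runs it against the Oozie database
bin/ooziedb.sh create -sqlfile oozie.sql -run
```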

Setting environment variables for Oozie
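A sketch for ~/.bashrc; the distro path matches the one used throughout this document, and the URL assumes Oozie's default port:

```shell
export OOZIE_HOME=/u01/hadoop/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0
export PATH=$PATH:$OOZIE_HOME/bin
# Saves passing -oozie on every client command
export OOZIE_URL=http://localhost:11000/oozie
```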

Starting Oozie daemon
Run the below command to start Oozie daemon
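Starting the daemon in the background:

```shell
# Starts Oozie (embedded web server) as a background daemon
$OOZIE_HOME/bin/oozied.sh start
```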

Alternatively, run the below command to run Oozie as a foreground process
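This keeps Oozie attached to the current terminal instead of daemonizing it:

```shell
$OOZIE_HOME/bin/oozied.sh run
```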



Setting up client node for Oozie







Starting Oozie client node




Oozie web console url
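For a default local install, the console is served on Oozie's standard port 11000; a quick reachability check from the shell:

```shell
# Open http://localhost:11000/oozie in a browser, or verify it responds:
curl -s http://localhost:11000/oozie/ | head -n 5
```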

Checking the status of the Oozie process
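The admin sub-command reports the server's system mode; a healthy server reports NORMAL:

```shell
oozie admin -oozie http://localhost:11000/oozie -status
```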














Checking which sharelib is being used by Oozie while the daemon is running
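The same admin sub-command can list the sharelibs the running server has loaded:

```shell
oozie admin -oozie http://localhost:11000/oozie -shareliblist
```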









Running the example Oozie jobs and testing the installation
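The examples ship inside the distribution as a tarball; extract it in the Oozie home directory:

```shell
cd $OOZIE_HOME
tar -xzvf oozie-examples.tar.gz
```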













We see different example folders, such as pig, hive, mapreduce, etc.
The mapreduce example is used to explain the steps below.
In the job.properties file of each folder, change the NameNode port and JobTracker port as below
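A sketch of the relevant lines, assuming the default single-node ports (NameNode on 9000, YARN ResourceManager on 8032):

```properties
nameNode=hdfs://localhost:9000
jobTracker=localhost:8032
queueName=default
examplesRoot=examples
```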

Save and close the file
Copying the examples folder to hdfs
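The workflow definitions must be readable from HDFS before a job can be submitted; the submission command below assumes the map-reduce example:

```shell
hdfs dfs -put examples examples

# Submit and start the map-reduce example; prints a job ID on success
oozie job -oozie http://localhost:11000/oozie \
  -config $OOZIE_HOME/examples/apps/map-reduce/job.properties -run
```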







Run the below command to check the status of the job
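The job ID printed at submission time is required; `<job-id>` below is a placeholder:

```shell
oozie job -oozie http://localhost:11000/oozie -info <job-id>
```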

To check the log, run the below command
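Again using the job ID from submission (`<job-id>` is a placeholder):

```shell
oozie job -oozie http://localhost:11000/oozie -log <job-id>
```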

Informatica client tool certification matrix

Please find the Informatica client tool certification matrix below:






Informatica supported browsers list

Please find the list of browsers supported by Informatica below:


Monday, October 9, 2017

Step by step Apache Hive installation

Apache Hive
1. Hive is a data warehouse infrastructure tool to process structured data in Hadoop.
2. Hive resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.
3. It is a platform used to develop SQL-type scripts to do MapReduce operations.
4. Initially, Hive was developed by Facebook; later the Apache Software Foundation took it up and developed it further as open source under the name Apache Hive.
5. It is used by different companies. For example, Amazon uses it in Amazon Elastic MapReduce.
Hive is not
  • A relational database
  • A design for OnLine Transaction Processing (OLTP)
  • A language for real-time queries and row-level updates
Features of Hive
  • It stores the schema in a database and the processed data in HDFS.
  • It is designed for OLAP.
  • It provides an SQL-type language for querying called HiveQL or HQL.
  • It is familiar, fast, scalable, and extensible.
Installing Hive
Prerequisites


1. Java





2. Hadoop
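Both prerequisites above can be verified from the shell:

```shell
java -version
hadoop version
```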









3. Hive
Download and extract hive software as below
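A typical download-and-extract sequence. The Hive release number is not stated here, so 2.1.0 below is an assumption based on the upgrade-2.0.0-to-2.1.0 schema script used later in this section:

```shell
cd /u01/hadoop
wget https://archive.apache.org/dist/hive/hive-2.1.0/apache-hive-2.1.0-bin.tar.gz
tar -xzvf apache-hive-2.1.0-bin.tar.gz
```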












Setting environment variables
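A sketch for ~/.bashrc (the install path is an assumption following the /u01/hadoop convention used in this document):

```shell
# Append to ~/.bashrc, then reload with: source ~/.bashrc
export HIVE_HOME=/u01/hadoop/apache-hive-2.1.0-bin
export PATH=$PATH:$HIVE_HOME/bin
```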

Save and close the file




Configuring Hive
To configure Hive with Hadoop, you need to edit the hive-env.sh file, located in the $HIVE_HOME/conf directory. The following commands change to the Hive config folder and copy the template file:
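A sketch of those commands, with the Hadoop path assumed from earlier sections:

```shell
cd $HIVE_HOME/conf
cp hive-env.sh.template hive-env.sh
# Then add this line inside hive-env.sh:
#   export HADOOP_HOME=/u01/hadoop/hadoop-2.7.2
```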

Save and close the file

Creating Hive directories in hdfs
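Hive needs a warehouse directory and a scratch directory in HDFS, both group-writable:

```shell
hdfs dfs -mkdir -p /user/hive/warehouse
hdfs dfs -mkdir -p /tmp
hdfs dfs -chmod g+w /user/hive/warehouse
hdfs dfs -chmod g+w /tmp
```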








Configuring the Metastore database
Creating the initial database schema using the hive-schema-0.14.0.mysql.sql and upgrade-2.0.0-to-2.1.0.mysql.sql files
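A sketch of the schema creation, run inside the mysql client as root; the database name `metastore` and the Hive install path are assumptions:

```sql
CREATE DATABASE metastore;
USE metastore;
-- Schema scripts ship with Hive under scripts/metastore/upgrade/mysql/
SOURCE /u01/hadoop/apache-hive-2.1.0-bin/scripts/metastore/upgrade/mysql/hive-schema-0.14.0.mysql.sql;
SOURCE /u01/hadoop/apache-hive-2.1.0-bin/scripts/metastore/upgrade/mysql/upgrade-2.0.0-to-2.1.0.mysql.sql;
```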

We also need a MySQL user account for Hive to use to access the Metastore. It is very important to prevent this user account from creating or altering tables in the Metastore database schema.
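A sketch of such an account, run in the mysql client as root; the user name and password are hypothetical placeholders. Note the grants are DML-only, so the account cannot create or alter metastore tables:

```sql
CREATE USER 'hiveuser'@'localhost' IDENTIFIED BY 'hivepassword';
GRANT SELECT, INSERT, UPDATE, DELETE ON metastore.* TO 'hiveuser'@'localhost';
FLUSH PRIVILEGES;
```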








Create hive-site.xml ( If not already present) in $HIVE_HOME/conf folder with the configuration below
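The original listing is missing; a minimal sketch of the JDBC metastore settings, with hypothetical credentials matching a DML-only MySQL account. The MySQL JDBC driver jar must also be present in $HIVE_HOME/lib:

```xml
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost/metastore</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hivepassword</value>
  </property>
  <property>
    <name>datanucleus.autoCreateSchema</name>
    <value>false</value>
  </property>
</configuration>
```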

Save and close the file

Starting Hive
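The Hive CLI is started directly from the bin directory; on success it drops into the hive> prompt:

```shell
$HIVE_HOME/bin/hive
```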

Starting the Hive Metastore
We are done with the configuration and setup; now we can access Hive with the Metastore in MySQL. Before that, we have to start the Metastore process using the following command.
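A typical way to start the service in the background:

```shell
# Start the metastore service detached from the terminal
nohup hive --service metastore &
```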

We can confirm the running Metastore process with the jps command. If you find RunJar in the list, the Metastore process is running.
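For example:

```shell
# The Metastore shows up as a RunJar entry in the JVM process list
jps | grep RunJar
```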













Starting HiveServer2
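HiveServer2 accepts JDBC/ODBC clients (default port 10000) and is started the same way as the Metastore service:

```shell
nohup hive --service hiveserver2 &
```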
