INSTALLATION DOCUMENTS BY RAVI

Sunday, October 15, 2017

Spark Streaming and Kafka

Prerequisites:

1. Java
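The original screenshot is missing; a standard way to confirm a JDK is installed (version output depends on your system):

```shell
# Both Spark and Kafka require a JDK on the PATH
java -version
```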






2. Start the ZooKeeper server
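Kafka ships with a bundled ZooKeeper. A typical way to start it, assuming you run this from the Kafka installation directory:

```shell
# Start the bundled ZooKeeper with its default config (listens on port 2181)
bin/zookeeper-server-start.sh config/zookeeper.properties
```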








3. Start Kafka and create a new topic
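From the Kafka installation directory, the broker is normally started against its default config (broker on port 9092):

```shell
# Start the Kafka broker in its own screen/terminal
bin/kafka-server-start.sh config/server.properties
```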





Creating a new topic
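A typical topic-creation command for Kafka releases of this era (the topic name `test` and the single partition/replica are assumptions for illustration):

```shell
# Create a single-partition topic named "test" via ZooKeeper on localhost
bin/kafka-topics.sh --create --zookeeper localhost:2181 \
  --replication-factor 1 --partitions 1 --topic test

# Verify the topic exists
bin/kafka-topics.sh --list --zookeeper localhost:2181
```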












Test communication between Kafka and Spark

Run Kafka producer in a new screen
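A console producer can be started as below (broker address and the topic name `test` assumed from the earlier steps):

```shell
# Each line typed on stdin is published as a message to the topic
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
```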














Run Spark’s KafkaWordCount in a new screen
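Spark's bundled KafkaWordCount example takes the ZooKeeper quorum, a consumer group, a comma-separated topic list, and a thread count. Run it from the Spark installation directory (the group name `my-group` is an assumption):

```shell
# Counts words arriving on the "test" topic and prints counts each batch
bin/run-example streaming.KafkaWordCount localhost:2181 my-group test 1
```

Messages typed into the producer screen should appear as word counts in this screen, confirming Kafka-to-Spark communication.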

Step by step: Installing Apache Oozie on a Hadoop single node

Prerequisites:


1. Java jdk 1.6+





2. Maven 3.3.9








3. Hadoop 2.x
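The three prerequisites above can be confirmed quickly (the versions your system reports will differ):

```shell
java -version     # JDK 1.6 or later
mvn -version      # Maven 3.3.9
hadoop version    # Hadoop 2.x
```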







Installing Oozie 4.3.0
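A typical download-and-extract sequence; the /u01/hadoop working directory is assumed from the paths used later in this document:

```shell
cd /u01/hadoop
# Oozie is distributed as source and must be built before use
wget https://archive.apache.org/dist/oozie/4.3.0/oozie-4.3.0.tar.gz
tar -xzvf oozie-4.3.0.tar.gz
```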

Setting the environment variables:
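The original listing is missing; a sketch of the variables commonly added to ~/.bashrc at this stage (the Maven path is hypothetical; the others follow the locations used elsewhere in this document):

```shell
# Append to ~/.bashrc, then reload with: source ~/.bashrc
export JAVA_HOME=/usr/java/default               # adjust to your JDK path
export M2_HOME=/u01/hadoop/apache-maven-3.3.9    # hypothetical Maven path
export HADOOP_HOME=/u01/hadoop/hadoop-2.7.2
export PATH=$PATH:$JAVA_HOME/bin:$M2_HOME/bin:$HADOOP_HOME/bin
```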

Save and close the file

Running Oozie Build
Modify the pom.xml under /u01/hadoop/oozie-4.3.0 as below
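The original edit is not shown. A common change when building Oozie 4.3.0 against Hadoop 2.7.x is to align the Hadoop version property with the installed release, e.g.:

```xml
<!-- In the <properties> section of /u01/hadoop/oozie-4.3.0/pom.xml -->
<!-- Assumed change: build Oozie against the installed Hadoop 2.7.2 -->
<hadoop.version>2.7.2</hadoop.version>
```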

Now run the below command
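Oozie's distribution is built with the bundled script; a typical invocation skips the test suite to shorten the build:

```shell
cd /u01/hadoop/oozie-4.3.0
bin/mkdistro.sh -DskipTests
```

On success, the built distribution appears under distro/target/oozie-4.3.0-distro/oozie-4.3.0, the path used in the rest of this document.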

Oozie server setup
Add or modify hadoop core-site.xml under /u01/hadoop/hadoop-2.7.2/etc/hadoop as below
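The usual addition here is the proxy-user configuration that lets the Oozie server impersonate the user submitting jobs (the account name `hadoop` is an assumption; substitute the user that runs Oozie):

```xml
<property>
  <name>hadoop.proxyuser.hadoop.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hadoop.groups</name>
  <value>*</value>
</property>
```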












Create a libext folder under the oozie directory
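Assuming the built distribution path used throughout this document:

```shell
cd /u01/hadoop/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0
mkdir libext
```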





Move downloaded ext-2.2.zip to libext folder
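For example, if the archive was downloaded to the home directory:

```shell
# ext-2.2.zip provides the ExtJS library used by the Oozie web console
mv ~/ext-2.2.zip \
   /u01/hadoop/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0/libext/
```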





Copy Hadoop libraries into libext folder of oozie
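A sketch of the copy, pulling the Hadoop jars into libext (paths follow the Hadoop location used earlier):

```shell
cd /u01/hadoop/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0
cp /u01/hadoop/hadoop-2.7.2/share/hadoop/*/*.jar libext/
cp /u01/hadoop/hadoop-2.7.2/share/hadoop/*/lib/*.jar libext/
```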

Preparing war file
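The war is prepared with the bundled setup script, which packs the libext jars and ext-2.2.zip into the Oozie web application:

```shell
cd /u01/hadoop/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0
bin/oozie-setup.sh prepare-war
```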

Creating Oozie Sharelib

Copy the hadoop core-site.xml file properties to core-site.xml file in /u01/hadoop/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0/conf/hadoop-conf 
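For a single-node setup this is typically a straight copy of the file:

```shell
cp /u01/hadoop/hadoop-2.7.2/etc/hadoop/core-site.xml \
   /u01/hadoop/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0/conf/hadoop-conf/
```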

Copy mapred-site.xml and yarn-site.xml files to /u01/hadoop/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0/conf/hadoop-conf
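Again, a straight copy from the Hadoop configuration directory:

```shell
cp /u01/hadoop/hadoop-2.7.2/etc/hadoop/mapred-site.xml \
   /u01/hadoop/hadoop-2.7.2/etc/hadoop/yarn-site.xml \
   /u01/hadoop/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0/conf/hadoop-conf/
```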








Add or modify oozie-site.xml file under /u01/hadoop/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0/conf as below
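The exact properties are not shown; the setting that is almost always required is pointing Oozie at the Hadoop configuration directory (path assumed from earlier in this document):

```xml
<property>
  <name>oozie.service.HadoopAccessorService.hadoop.configurations</name>
  <value>*=/u01/hadoop/hadoop-2.7.2/etc/hadoop</value>
</property>
```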

Now run the below command
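Assuming HDFS runs on the default single-node address:

```shell
# Uploads the sharelib (pig, hive, mapreduce-streaming, etc.) to HDFS
bin/oozie-setup.sh sharelib create -fs hdfs://localhost:9000
```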

Creating Oozie database
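The database (embedded Derby by default) is created with the bundled script:

```shell
# Generates the schema DDL and runs it against the Oozie database
bin/ooziedb.sh create -sqlfile oozie.sql -run
```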

Setting environment variables for Oozie
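A sketch for ~/.bashrc; the distro path matches the one used throughout this document, and the URL assumes Oozie's default port:

```shell
export OOZIE_HOME=/u01/hadoop/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0
export PATH=$PATH:$OOZIE_HOME/bin
# Saves passing -oozie on every client command
export OOZIE_URL=http://localhost:11000/oozie
```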

Starting Oozie daemon
Run the below command to start Oozie daemon
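Starting the daemon in the background:

```shell
# Starts Oozie (embedded web server) as a background daemon
$OOZIE_HOME/bin/oozied.sh start
```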

Alternatively, run the below command to run Oozie as a foreground process
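This keeps Oozie attached to the current terminal instead of daemonizing it:

```shell
$OOZIE_HOME/bin/oozied.sh run
```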



Setting up client node for Oozie







Starting Oozie client node




Oozie web console url
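For a default local install, the console is served on Oozie's standard port 11000; a quick reachability check from the shell:

```shell
# Open http://localhost:11000/oozie in a browser, or verify it responds:
curl -s http://localhost:11000/oozie/ | head -n 5
```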

Checking the status of the Oozie process
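The admin sub-command reports the server's system mode; a healthy server reports NORMAL:

```shell
oozie admin -oozie http://localhost:11000/oozie -status
```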














Checking which sharelib is being used by Oozie while the daemon is running
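The same admin sub-command can list the sharelibs the running server has loaded:

```shell
oozie admin -oozie http://localhost:11000/oozie -shareliblist
```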









Running the example Oozie jobs and testing the installation
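The examples ship inside the distribution as a tarball; extract it in the Oozie home directory:

```shell
cd $OOZIE_HOME
tar -xzvf oozie-examples.tar.gz
```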













We see different example folders, such as pig, hive, mapreduce, etc.
The mapreduce example is used to explain the steps below.
In the job.properties file of each folder, change the NameNode port and JobTracker port as below
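A sketch of the relevant lines, assuming the default single-node ports (NameNode on 9000, YARN ResourceManager on 8032):

```properties
nameNode=hdfs://localhost:9000
jobTracker=localhost:8032
queueName=default
examplesRoot=examples
```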

Save and close the file
Copying the examples folder to hdfs
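The workflow definitions must be readable from HDFS before a job can be submitted; the submission command below assumes the map-reduce example:

```shell
hdfs dfs -put examples examples

# Submit and start the map-reduce example; prints a job ID on success
oozie job -oozie http://localhost:11000/oozie \
  -config $OOZIE_HOME/examples/apps/map-reduce/job.properties -run
```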







Run the below command to check the status of the job
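The job ID printed at submission time is required; `<job-id>` below is a placeholder:

```shell
oozie job -oozie http://localhost:11000/oozie -info <job-id>
```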

To check the log, run the below command
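Again using the job ID from submission (`<job-id>` is a placeholder):

```shell
oozie job -oozie http://localhost:11000/oozie -log <job-id>
```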

Informatica client tool certification matrix

Please find the Informatica client tool certification matrix below:






Informatica supported browsers list

Please find the list of browsers supported by Informatica below:


Monday, October 9, 2017

Step by step Apache Hive installation

Apache Hive
1. Hive is a data warehouse infrastructure tool to process structured data in Hadoop.
2. Hive resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.
3. It is a platform used to develop SQL-type scripts to do MapReduce operations.
4. Initially, Hive was developed by Facebook; later the Apache Software Foundation took it up and developed it further as open source under the name Apache Hive.
5. It is used by different companies. For example, Amazon uses it in Amazon Elastic MapReduce.
Hive is not
  • A relational database
  • A design for OnLine Transaction Processing (OLTP)
  • A language for real-time queries and row-level updates
Features of Hive
  • It stores the schema in a database and the processed data in HDFS.
  • It is designed for OLAP.
  • It provides an SQL-type language for querying called HiveQL or HQL.
  • It is familiar, fast, scalable, and extensible.
Installing Hive
Prerequisites


1. Java





2. Hadoop
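Both prerequisites above can be verified from the shell:

```shell
java -version
hadoop version
```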









3. Hive
Download and extract hive software as below
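A typical download-and-extract sequence. The Hive release number is not stated here, so 2.1.0 below is an assumption based on the upgrade-2.0.0-to-2.1.0 schema script used later in this section:

```shell
cd /u01/hadoop
wget https://archive.apache.org/dist/hive/hive-2.1.0/apache-hive-2.1.0-bin.tar.gz
tar -xzvf apache-hive-2.1.0-bin.tar.gz
```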












Setting environment variables
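A sketch for ~/.bashrc (the install path is an assumption following the /u01/hadoop convention used in this document):

```shell
# Append to ~/.bashrc, then reload with: source ~/.bashrc
export HIVE_HOME=/u01/hadoop/apache-hive-2.1.0-bin
export PATH=$PATH:$HIVE_HOME/bin
```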

Save and close the file




Configuring Hive
To configure Hive with Hadoop, you need to edit the hive-env.sh file, located in the $HIVE_HOME/conf directory. The following commands change to the Hive config folder and copy the template file:
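A sketch of those commands, with the Hadoop path assumed from earlier sections:

```shell
cd $HIVE_HOME/conf
cp hive-env.sh.template hive-env.sh
# Then add this line inside hive-env.sh:
#   export HADOOP_HOME=/u01/hadoop/hadoop-2.7.2
```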

Save and close the file

Creating Hive directories in hdfs
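Hive needs a warehouse directory and a scratch directory in HDFS, both group-writable:

```shell
hdfs dfs -mkdir -p /user/hive/warehouse
hdfs dfs -mkdir -p /tmp
hdfs dfs -chmod g+w /user/hive/warehouse
hdfs dfs -chmod g+w /tmp
```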








Configuring the Metastore database
Creating the initial database schema using the hive-schema-0.14.0.mysql.sql and upgrade-2.0.0-to-2.1.0.mysql.sql files
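A sketch of the schema creation, run inside the mysql client as root; the database name `metastore` and the Hive install path are assumptions:

```sql
CREATE DATABASE metastore;
USE metastore;
-- Schema scripts ship with Hive under scripts/metastore/upgrade/mysql/
SOURCE /u01/hadoop/apache-hive-2.1.0-bin/scripts/metastore/upgrade/mysql/hive-schema-0.14.0.mysql.sql;
SOURCE /u01/hadoop/apache-hive-2.1.0-bin/scripts/metastore/upgrade/mysql/upgrade-2.0.0-to-2.1.0.mysql.sql;
```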

We also need a MySQL user account for Hive to use to access the Metastore. It is very important to prevent this user account from creating or altering tables in the Metastore database schema.
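A sketch of such an account, run in the mysql client as root; the user name and password are hypothetical placeholders. Note the grants are DML-only, so the account cannot create or alter metastore tables:

```sql
CREATE USER 'hiveuser'@'localhost' IDENTIFIED BY 'hivepassword';
GRANT SELECT, INSERT, UPDATE, DELETE ON metastore.* TO 'hiveuser'@'localhost';
FLUSH PRIVILEGES;
```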








Create hive-site.xml ( If not already present) in $HIVE_HOME/conf folder with the configuration below
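The original listing is missing; a minimal sketch of the JDBC metastore settings, with hypothetical credentials matching a DML-only MySQL account. The MySQL JDBC driver jar must also be present in $HIVE_HOME/lib:

```xml
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost/metastore</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hivepassword</value>
  </property>
  <property>
    <name>datanucleus.autoCreateSchema</name>
    <value>false</value>
  </property>
</configuration>
```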

Save and close the file

Starting Hive
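The Hive CLI is started directly from the bin directory; on success it drops into the hive> prompt:

```shell
$HIVE_HOME/bin/hive
```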

Starting the Hive Metastore
We are done with the configuration and setup; now we can access Hive with the Metastore in MySQL. Before that, we have to start the Metastore process using the following command.
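A typical way to start the service in the background:

```shell
# Start the metastore service detached from the terminal
nohup hive --service metastore &
```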

We can confirm the running Metastore process with the jps command. If you find RunJar in the list, the Metastore process is running.
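For example:

```shell
# The Metastore shows up as a RunJar entry in the JVM process list
jps | grep RunJar
```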













Starting HiveServer2
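HiveServer2 accepts JDBC/ODBC clients (default port 10000) and is started the same way as the Metastore service:

```shell
nohup hive --service hiveserver2 &
```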
