Step-by-step Apache Hive installation

Apache Hive
1. Hive is a data warehouse infrastructure tool for processing structured data in Hadoop.
2. Hive resides on top of Hadoop to summarize Big Data, and makes querying and analysis easy.
3. It is a platform for developing SQL-like scripts that run as MapReduce operations.
4. Hive was initially developed by Facebook; the Apache Software Foundation later took it up and developed it further as open source under the name Apache Hive.
5. It is used by many companies. For example, Amazon uses it in Amazon Elastic MapReduce.
Hive is not
  • A relational database
  • A design for Online Transaction Processing (OLTP)
  • A language for real-time queries and row-level updates
Features of Hive
  • It stores the schema in a database and the processed data in HDFS.
  • It is designed for OLAP.
  • It provides a SQL-like query language called HiveQL or HQL.
  • It is familiar, fast, scalable, and extensible.
Installing Hive

1. Java

2. Hadoop
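Before installing Hive, it can help to confirm that both prerequisites are available on the PATH. The following is a minimal sketch; the helper name check_tool is hypothetical and is not part of Java, Hadoop, or Hive.

```shell
# check_tool is a hypothetical helper: it reports whether a command is on the PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
    fi
}

check_tool java     # Hive needs a Java runtime
check_tool hadoop   # Hive runs on top of Hadoop
```

If both are found, running java -version and hadoop version prints the installed versions.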

3. Hive
Download and extract the Hive release archive.
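A minimal sketch of the download and extraction step. The version (2.1.0), the archive.apache.org download location, the /usr/local install prefix, and the function name are all assumptions; adjust them to your environment. The block only defines the variables and the function, so nothing is downloaded until the function is called.

```shell
# Assumed values: change the version and install prefix as needed.
HIVE_VERSION="2.1.0"
HIVE_TARBALL="apache-hive-${HIVE_VERSION}-bin.tar.gz"
HIVE_URL="https://archive.apache.org/dist/hive/hive-${HIVE_VERSION}/${HIVE_TARBALL}"
INSTALL_DIR="/usr/local"

# Hypothetical helper: download the tarball, unpack it, and rename the
# unpacked directory to ${INSTALL_DIR}/hive.
download_and_extract_hive() {
    wget "$HIVE_URL" -O "/tmp/${HIVE_TARBALL}" &&
    sudo tar -xzf "/tmp/${HIVE_TARBALL}" -C "$INSTALL_DIR" &&
    sudo mv "${INSTALL_DIR}/apache-hive-${HIVE_VERSION}-bin" "${INSTALL_DIR}/hive"
}

# Usage (requires network access and sudo):
# download_and_extract_hive
```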

Setting environment variables
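A minimal sketch of the variables to set, for example by appending these lines to ~/.bashrc. The path /usr/local/hive is an assumption; use the directory where Hive was actually extracted.

```shell
# Assumed install location of Hive.
export HIVE_HOME=/usr/local/hive
# Make the hive launcher scripts available on the PATH.
export PATH="$PATH:$HIVE_HOME/bin"
```

After editing ~/.bashrc, run source ~/.bashrc (or open a new shell) so the variables take effect.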

Save and close the file

Configuring Hive
To configure Hive to work with Hadoop, you need to edit a configuration file in the $HIVE_HOME/conf directory. That file is created by changing to the Hive configuration directory and copying the template file shipped there.
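A minimal sketch of this step, assuming the file in question is hive-env.sh (Hive distributions ship a hive-env.sh.template in the conf directory) and that the edit consists of pointing HADOOP_HOME at the Hadoop installation. The function name and the /usr/local/hadoop path are assumptions.

```shell
# Hypothetical helper: copy the template and point Hive at Hadoop.
# $1: Hive conf directory, $2: Hadoop installation directory.
configure_hive_env() {
    conf_dir="$1"
    hadoop_home="$2"
    cp "$conf_dir/hive-env.sh.template" "$conf_dir/hive-env.sh" &&
    echo "export HADOOP_HOME=$hadoop_home" >> "$conf_dir/hive-env.sh"
}

# Usage on a machine where Hive is installed:
# configure_hive_env "$HIVE_HOME/conf" /usr/local/hadoop
```

The same result can be reached by copying the template by hand and adding the export line in an editor.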

Save and close the file

Creating Hive directories in HDFS
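A minimal sketch of the directory setup, assuming Hive's default locations (/tmp for scratch data and /user/hive/warehouse for the warehouse) and a running HDFS. The function name is hypothetical.

```shell
# Hypothetical helper: create the HDFS directories Hive uses by default
# and make them group-writable.
create_hive_dirs() {
    hdfs dfs -mkdir -p /tmp &&
    hdfs dfs -mkdir -p /user/hive/warehouse &&
    hdfs dfs -chmod g+w /tmp &&
    hdfs dfs -chmod g+w /user/hive/warehouse
}

# Usage (HDFS must be running):
# create_hive_dirs
```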

Configuring the Metastore database
Create the initial database schema using the hive-schema-0.14.0.mysql.sql and upgrade-2.0.0-to-2.1.0.mysql.sql files.
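A minimal sketch of the schema creation. The database name metastore and the function name are assumptions; the two script names are taken from the text above, and which of them applies depends on the Hive version you installed. The function only prints the SQL, so it can be reviewed before being piped into the mysql client.

```shell
METASTORE_DB="metastore"   # assumed database name

# Hypothetical helper: print the SQL that creates the database and loads
# the schema scripts named in the text.
metastore_schema_sql() {
    cat <<SQL
CREATE DATABASE IF NOT EXISTS ${METASTORE_DB};
USE ${METASTORE_DB};
SOURCE hive-schema-0.14.0.mysql.sql;
SOURCE upgrade-2.0.0-to-2.1.0.mysql.sql;
SQL
}

# Usage: run from the directory that holds the scripts, because SOURCE
# resolves relative paths against the current directory.
# cd "$HIVE_HOME/scripts/metastore/upgrade/mysql" && metastore_schema_sql | mysql -u root -p
```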

We also need a MySQL user account that Hive uses to access the Metastore. It is very important to prevent this account from creating or altering tables in the Metastore database schema.
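A minimal sketch of such a restricted account. The account name, host, password placeholder, database name, and function name are all assumptions. The grant covers reading and writing rows but leaves out CREATE and ALTER, which is one way to keep the account from changing the schema.

```shell
METASTORE_DB="metastore"       # assumed database name
HIVE_DB_USER="hive"            # assumed account name
HIVE_DB_HOST="localhost"       # assumed host the account connects from
HIVE_DB_PASSWORD="change-me"   # placeholder, replace before use

# Hypothetical helper: print the SQL that creates the account and grants
# it data-level privileges only (no CREATE or ALTER).
metastore_user_sql() {
    cat <<SQL
CREATE USER '${HIVE_DB_USER}'@'${HIVE_DB_HOST}' IDENTIFIED BY '${HIVE_DB_PASSWORD}';
GRANT SELECT, INSERT, UPDATE, DELETE, LOCK TABLES, EXECUTE ON ${METASTORE_DB}.* TO '${HIVE_DB_USER}'@'${HIVE_DB_HOST}';
FLUSH PRIVILEGES;
SQL
}

# Usage:
# metastore_user_sql | mysql -u root -p
```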

Create hive-site.xml (if not already present) in the $HIVE_HOME/conf folder with the Metastore configuration.
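A minimal sketch of a hive-site.xml for a MySQL-backed Metastore, written from the shell. The property names are standard Hive settings; the values (database name, account, password placeholder, localhost, Metastore port 9083) and the function name are assumptions. The function refuses to overwrite an existing file.

```shell
# Hypothetical helper: write a minimal hive-site.xml into the directory
# given as $1, unless one already exists there.
write_hive_site() {
    conf_dir="$1"
    if [ -e "$conf_dir/hive-site.xml" ]; then
        echo "hive-site.xml already exists in $conf_dir" >&2
        return 1
    fi
    cat > "$conf_dir/hive-site.xml" <<'XML'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost/metastore</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>change-me</value>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://localhost:9083</value>
  </property>
</configuration>
XML
}

# Usage:
# write_hive_site "$HIVE_HOME/conf"
```

For this configuration to work, the MySQL JDBC driver jar also has to be on Hive's classpath, for example in $HIVE_HOME/lib.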

Save and close the file

Starting Hive

Starting the Hive Metastore
The configuration and setup are now complete, and Hive can be accessed with its Metastore in MySQL. Before doing so, we have to start the Metastore process.
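A minimal sketch of starting the Metastore service in the background. hive --service metastore is the standard launcher invocation; the function name and the default log path are assumptions.

```shell
# Hypothetical helper: start the Metastore in the background and send its
# output to a log file ($1, defaulting to an assumed path under /tmp).
start_metastore() {
    log_file="${1:-/tmp/hive-metastore.log}"
    hive --service metastore > "$log_file" 2>&1 &
    echo "Metastore starting in background (PID $!), log: $log_file"
}

# Usage:
# start_metastore
```

The process started this way is tied to the current shell session; a service manager or nohup may be preferable for a long-running setup.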

We can confirm that the Metastore process is running with the jps command. If RunJar appears in the list, the Metastore process is running.
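The check described above can be sketched as a small function that succeeds when jps lists a RunJar process. The function name is hypothetical.

```shell
# Hypothetical helper: succeed (exit status 0) if jps reports a RunJar process.
metastore_running() {
    jps | grep -q 'RunJar'
}

# Usage:
# metastore_running && echo "Metastore is running" || echo "Metastore is not running"
```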

Starting HiveServer2
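A minimal sketch of starting HiveServer2 in the background and connecting to it. hive --service hiveserver2 is the standard launcher invocation and 10000 is the default HiveServer2 port; the function name, the log path, and the localhost address are assumptions.

```shell
# Hypothetical helper: start HiveServer2 in the background and send its
# output to a log file ($1, defaulting to an assumed path under /tmp).
start_hiveserver2() {
    log_file="${1:-/tmp/hiveserver2.log}"
    hive --service hiveserver2 > "$log_file" 2>&1 &
    echo "HiveServer2 starting in background (PID $!), log: $log_file"
}

# Usage:
# start_hiveserver2
# Once the server is up, connect with Beeline:
# beeline -u jdbc:hive2://localhost:10000
```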

