Apache Hive
1. Hive is a data warehouse infrastructure tool for processing
structured data in Hadoop.
2. Hive resides on top of Hadoop to summarize Big Data, and
makes querying and analyzing easy.
3. It is a platform for writing SQL-like scripts that run as
MapReduce operations.
4. Hive was initially developed by Facebook. The Apache Software
Foundation later took it up and developed it further as open source
under the name Apache Hive.
5. It is used by different companies. For example, Amazon
uses it in Amazon Elastic MapReduce.
Hive is not
- A relational database
- A design for OnLine Transaction Processing (OLTP)
- A language for real-time queries and row-level updates
Features of Hive
- It stores the schema in a database and the processed data in HDFS.
- It is designed for OLAP.
- It provides an SQL-like query language called HiveQL or HQL.
- It is familiar, fast, scalable, and extensible.
Installing Hive
Prerequisites
1. Java
2. Hadoop
3. Hive
Download and extract the Hive software
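A minimal sketch of this step, not the original commands. It assumes Hive 2.1.0 from the Apache archive and /usr/local/hive as the install path; `HIVE_VERSION`, `HIVE_INSTALL_DIR`, and both function names are hypothetical, so substitute the release and path you actually use.

```shell
# Assumed values; adjust to the release and install path you actually use.
HIVE_VERSION="${HIVE_VERSION:-2.1.0}"
HIVE_INSTALL_DIR="${HIVE_INSTALL_DIR:-/usr/local/hive}"

# Build the Apache archive URL for a given Hive release.
hive_tarball_url() {
  echo "https://archive.apache.org/dist/hive/hive-$1/apache-hive-$1-bin.tar.gz"
}

# Download the tarball, unpack it, and move it to the install directory.
download_and_extract_hive() {
  wget "$(hive_tarball_url "$HIVE_VERSION")" &&
  tar -xzf "apache-hive-$HIVE_VERSION-bin.tar.gz" &&
  sudo mv "apache-hive-$HIVE_VERSION-bin" "$HIVE_INSTALL_DIR"
}
```

Running `download_and_extract_hive` performs the three steps in order and stops at the first one that fails.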
Setting environment variables
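The lines below are a sketch of what is typically added to ~/.bashrc for this step. The install path /usr/local/hive and the choice of ~/.bashrc are assumptions. Once the file is saved, `source ~/.bashrc` reloads it in the current shell.

```shell
# Assumed install location; change it if Hive was extracted elsewhere.
export HIVE_HOME=/usr/local/hive
# Make the hive, beeline and schematool launchers available on the PATH.
export PATH="$PATH:$HIVE_HOME/bin"
```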
Save and close the file
Configuring Hive
To configure Hive with Hadoop, edit the hive-env.sh file, which is
located in the $HIVE_HOME/conf directory. The following commands
change to the Hive configuration directory and copy the
template file:
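A hedged sketch of those commands, wrapped in a hypothetical helper named `configure_hive_env`. The Hadoop path passed as the second argument is an assumption. The helper also appends a `HADOOP_HOME` line, which is the setting hive-env.sh usually needs.

```shell
# $1: Hive conf directory, $2: Hadoop install path (both supplied by you).
configure_hive_env() {
  (
    cd "$1" || exit 1                                 # go to the conf folder
    cp hive-env.sh.template hive-env.sh || exit 1     # copy the template
    echo "export HADOOP_HOME=$2" >> hive-env.sh       # point Hive at Hadoop
  )
}
```

For example: `configure_hive_env "$HIVE_HOME/conf" /usr/local/hadoop`.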
Save and close the file
Creating Hive directories in HDFS
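A minimal sketch of this step. It assumes the default layout, with /tmp as the scratch directory and /user/hive/warehouse as the warehouse, and that the `hdfs` command is on the PATH. `create_hive_dirs` is a hypothetical helper name.

```shell
# Create the scratch and warehouse directories and make them group-writable.
create_hive_dirs() {
  hdfs dfs -mkdir -p /tmp &&
  hdfs dfs -mkdir -p /user/hive/warehouse &&
  hdfs dfs -chmod g+w /tmp &&
  hdfs dfs -chmod g+w /user/hive/warehouse
}
```

Call `create_hive_dirs` once HDFS is running.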
Configuring the Metastore database
Create the initial database schema using the hive-schema-0.14.0.mysql.sql and
upgrade-2.0.0-to-2.1.0.mysql.sql files
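One way this step might look, as a sketch only. It assumes the scripts sit in $HIVE_HOME/scripts/metastore/upgrade/mysql (where Hive ships them), that the database is called metastore, and that MySQL is administered as root. `load_metastore_schema` is a hypothetical helper. It changes into the script directory first, because the schema scripts can pull in other files with relative SOURCE statements.

```shell
# $1: database name; remaining arguments: SQL script file names to load.
load_metastore_schema() {
  db="$1"; shift
  (
    cd "$HIVE_HOME/scripts/metastore/upgrade/mysql" || exit 1
    # Create the database, then feed each script to the mysql client.
    mysql -u root -p -e "CREATE DATABASE IF NOT EXISTS $db" || exit 1
    for script in "$@"; do
      mysql -u root -p "$db" < "$script" || exit 1
    done
  )
}
```

For example: `load_metastore_schema metastore hive-schema-0.14.0.mysql.sql upgrade-2.0.0-to-2.1.0.mysql.sql`. Use the script versions that match your Hive release.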
We also need a MySQL user account that Hive uses to access
the Metastore. It is very important to prevent this user account from creating
or altering tables in the Metastore database schema.
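A sketch of how such a restricted account could be created. The user name, password and database name are placeholders passed as arguments, and `create_hive_mysql_user` is a hypothetical helper. The account gets only row-level privileges, so it cannot create or alter tables.

```shell
# $1: user name, $2: password, $3: metastore database name.
create_hive_mysql_user() {
  mysql -u root -p <<SQL
CREATE USER '$1'@'localhost' IDENTIFIED BY '$2';
REVOKE ALL PRIVILEGES, GRANT OPTION FROM '$1'@'localhost';
GRANT SELECT, INSERT, UPDATE, DELETE ON $3.* TO '$1'@'localhost';
FLUSH PRIVILEGES;
SQL
}
```

For example: `create_hive_mysql_user hiveuser 'choose-a-password' metastore`.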
Create hive-site.xml (if not already present) in the $HIVE_HOME/conf
folder with the configuration below
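The original configuration is not reproduced here. The sketch below writes a plausible minimal file for a local MySQL-backed Metastore. The database name metastore, the host localhost, the port 9083 and the helper name `write_hive_site` are assumptions; the user name and password are whatever you chose for the Hive account.

```shell
# $1: target file, $2: MySQL user name, $3: MySQL password.
write_hive_site() {
  cat > "$1" <<XML
<?xml version="1.0"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost/metastore</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>$2</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>$3</value>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://localhost:9083</value>
  </property>
</configuration>
XML
}
```

For example: `write_hive_site "$HIVE_HOME/conf/hive-site.xml" hiveuser 'choose-a-password'`. The MySQL JDBC driver jar also has to be present in $HIVE_HOME/lib for this connection to work.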
Save and close the file
Starting Hive
Starting the Hive Metastore
The configuration and setup are now complete, and Hive can use the
Metastore in MySQL. Before accessing Hive, start the Metastore process
with the following command.
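A minimal sketch: `hive --service metastore` is the standard launcher for the Metastore service. Running it in the background and redirecting its output to a log file are choices made here, and `start_metastore` is a hypothetical helper name.

```shell
# $1: log file that receives the Metastore's output.
start_metastore() {
  hive --service metastore > "$1" 2>&1 &
}
```

For example: `start_metastore /tmp/hive-metastore.log`.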
We can confirm that the Metastore process is running with the jps
command. If RunJar appears in the list, the Metastore process is
running.
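The check can be scripted as follows. This is a sketch that assumes `jps` from the JDK is on the PATH; `metastore_running` is a hypothetical helper. Any Hadoop or Hive job launched through RunJar matches, so treat a hit as a hint and not as proof.

```shell
# Succeeds when jps lists at least one RunJar process.
metastore_running() {
  jps | grep -q RunJar
}
```

For example: `metastore_running && echo "Metastore appears to be running"`.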
Starting HiveServer2
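A sketch of starting HiveServer2 and connecting to it with Beeline. It assumes the default HiveServer2 port 10000 on localhost; both helper names are hypothetical.

```shell
# $1: log file that receives HiveServer2's output.
start_hiveserver2() {
  hive --service hiveserver2 > "$1" 2>&1 &
}

# Open a Beeline session against the local HiveServer2 instance.
connect_beeline() {
  beeline -u "jdbc:hive2://localhost:10000"
}
```

For example: `start_hiveserver2 /tmp/hiveserver2.log`, then `connect_beeline` once the server has finished starting.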