Post 28 | HDPCD | Write and Execute a Hive Query

Hello everyone, welcome to the first tutorial of the DATA ANALYSIS section of the HDPCD certification. This section will contain 24 posts, after which we will finally be done with the HDPCD certification tutorials. In the last tutorial of the DATA TRANSFORMATION section, we saw how to invoke a UDF packaged in a jar file.

From this tutorial onward, we are going to work with Apache Hive. If you have jumped directly to this tutorial, you can install the Hortonworks Sandbox by following this tutorial and then continue with this one and the rest of the tutorials in this certification series.

Let us get started with this tutorial, then.

The following infographic shows the ways in which we can execute a Hive query.

Ways of Executing a Hive Query

As you can see from the picture above, there are three ways to execute a Hive query.

  • Logging into a Hive session and executing the query
  • Using the -e option to execute the query
  • Creating a .sql file and then running it with the -f option

Before exploring these three ways to run a Hive query, let us find a table to test our query against.

First, log into a Hive session with the “hive” command, and then run the following command to list the Hive tables.

hive

hive> show tables;

The output of the above commands looks something like this.

Step 1: List of Hive Tables

As you can see from the screenshot above, we are going to use the Hive table called “categories” to run a simple select query. The query is shown below for your reference.

select * from categories;

Let us explore each of these options in detail.

  • LOGGING INTO HIVE SESSION AND EXECUTING HIVE QUERY

You can log into a Hive session by running the “hive” command in a terminal window and then run the select query shown above.

The following screenshot shows the output of this operation.

Step 2: Executing Hive command in Hive Terminal Window

As you can see from the picture above, the query executed successfully and returned the records stored in the categories table. We will see in future tutorials how to store records; this tutorial is only meant to familiarize you with the Hive query execution process.

Now, let us see how to run the same query with the -e flag, without logging into a Hive session.

  • USING -e OPTION TO EXECUTE THE HIVE QUERY

With the -e option, we pass the query to Hive as a string enclosed in single or double quotes; Hive starts a session, executes that query, and exits. The result is the same as before; the only difference is that we do not have to log into the Hive shell manually.

Run the following command from the terminal window to perform this activity.

hive -e 'select * from categories;'

You can check out the following screenshot for the command execution.

Step 3: Executing Hive command using -e flag

As you can see from the above screenshot, the select command executed successfully and we got the expected output.

Now, let us look at the last option to execute the hive query.

  • CREATING .sql FILE AND RUNNING IT WITH -f OPTION

We use this option when we want to run more than one Hive query in one go. You can write as many Hive queries as you want in a .sql file, each terminated by a semicolon (;), and then run the file. For demonstration purposes, we are going to run a single Hive query with this file approach.
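As a sketch of what a multi-query file can look like, the block below writes the two statements already used in this tutorial (show tables and the select on categories) into one file. The file name multi_query.sql is hypothetical, chosen only for illustration, and the hive call is guarded so that nothing is executed on a machine where the Hive CLI is not on the PATH.

```shell
#!/usr/bin/env bash
# Sketch: a .sql file holding more than one Hive query.
# multi_query.sql is a hypothetical file name used only for illustration.
cat > multi_query.sql <<'EOF'
show tables;
select * from categories;
EOF

# Show what was written; each statement is terminated by a semicolon.
cat multi_query.sql

# hive -f runs the statements in the order they appear in the file.
# Only attempt this when the hive CLI is available on this machine.
if command -v hive >/dev/null 2>&1; then
  hive -f multi_query.sql
fi
```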

I have uploaded the file for this tutorial to the HDPCD repository on my GitHub with the name “39_hive_query.sql”. You can download it by clicking here and copy its contents.

You can use the following commands to create a local file named post28.sql with those contents.

vi post28.sql

###############################

PASTE THE COPIED CONTENTS HERE

###############################

cat post28.sql
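If you prefer not to edit the file interactively in vi, the same file can be created from the shell in one step. This is a minimal sketch that assumes the downloaded file holds only the single select query used in the previous two sections; if your copy of 39_hive_query.sql contains something else, paste its contents instead.

```shell
#!/usr/bin/env bash
# Sketch: create post28.sql without opening an editor.
# Assumption: the file contains only the select query used earlier in this tutorial.
printf '%s\n' 'select * from categories;' > post28.sql

# Verify the contents before handing the file to hive -f.
cat post28.sql
```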

Once the file is created, use the following command to run the Hive query stored in post28.sql.

hive -f post28.sql

The output of the above commands looks something like this.

Step 4: Executing Hive Command using .sql file

From the screenshot above, you can see that the command executed successfully and produced the same output as in the previous two cases.

This completes the tutorial, in which we saw three ways to run a Hive query.

I hope the commands and screenshots help.

Please visit my website www.milindjagre.com.

You can check out my LinkedIn profile here. Please like my Facebook page here. Follow me on Twitter here and subscribe to my YouTube channel here for the video tutorials.

Stay tuned.

Cheers!
