Hello, everyone! Welcome to the third tutorial in the Data Analysis section of the HDPCD certification series. In the last tutorial, we saw how to create a Hive-managed (internal) table. In this tutorial, we are going to create a Hive external table. So, let us start with the process.

The following infographic shows the process of creating a Hive external table.

Hive External Table Creation

It is a four-step process, consisting of the following steps.

  • CHECKING EXISTENCE OF HIVE EXTERNAL TABLE
  • CHECKING HDFS LOCATION FOR THE EXISTENCE OF THE DATA
  • CREATING HIVE EXTERNAL TABLE
  • VERIFYING AND CHECKING THE SCHEMA OF THE HIVE EXTERNAL TABLE

So, let us go through each step one at a time.

  • CHECKING THE EXISTENCE OF THE HIVE EXTERNAL TABLE

As shown in the previous tutorial, we use the "show tables;" command to check whether the hive external table already exists. The command looks as follows.

show tables;

The output of the above command looks as follows in my case.

Step 1: Checking for existence of hive table

As you can see from the above screenshot, the table post30 does not exist in Hive's "default" database, so we can go ahead and create the post30 hive external table.
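If the table list is long, Hive also accepts a pattern filter, which narrows the check to just the table we care about (an optional alternative, not required for the exercise):

```sql
-- Returns an empty result if post30 does not exist yet
SHOW TABLES LIKE 'post30';
```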

Let us continue with the next step now.

  • CHECKING HDFS LOCATION FOR THE EXISTENCE OF THE DATA

Since I did not have any existing data in HDFS, I created a sample of 10 records to load into the post30 hive external table.

I have uploaded this input file to my GitHub profile under the HDPCD repository with the name "41_input_hive_external_table.csv". You can download this file by clicking here, and it looks something like this.

We are going to use the following commands to create the input file post30.csv in the local file system.

vi post30.csv

################################

PASTE THE COPIED CONTENTS HERE

################################

cat post30.csv
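If you prefer not to paste the contents by hand, a small shell loop can generate a placeholder file of the same shape. Note that the IDs, names, and genders below are invented for illustration; use the actual CSV from the repository when following the exercise.

```shell
# Generate a 10-record sample file with three comma-separated columns
# (id, name, gender). Placeholder data, not the repository file.
rm -f post30.csv
for i in $(seq 1 10); do
  if [ $((i % 2)) -eq 0 ]; then gender="F"; else gender="M"; fi
  echo "${i},user${i},${gender}" >> post30.csv
done
cat post30.csv
```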

Once the file post30.csv gets created in the local file system, we need to push this file into HDFS. We are going to use the following commands to achieve this.

hadoop fs -mkdir /hdpcd/input/post30
hadoop fs -put post30.csv /hdpcd/input/post30
hadoop fs -cat /hdpcd/input/post30/post30.csv

The output of the above commands looks like this.

Step 2: Creating input data in HDFS

As you can see from the above screenshot, the file was successfully created in HDFS at location /hdpcd/input/post30.

Now, let us start creating the hive external table.

  • CREATING HIVE EXTERNAL TABLE

As you can see from the above screenshot, the input file contains three columns. The first contains an ID in INT format, the second a name in STRING format, and the last a gender indicator in STRING format.

We are going to use the following CREATE command to create the post30 hive external table.

create external table post30 (
id int,
name string,
gender string
)
row format delimited
fields terminated by ','
location '/hdpcd/input/post30';

The explanation of the above CREATE command is similar to the last post, with one difference: while creating an external table, we pass the LOCATION clause, which expects an HDFS path, in this case /hdpcd/input/post30.
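To make the difference concrete, here is a side-by-side sketch. The managed-table DDL below is a hypothetical equivalent written for comparison, not the exact statement from the previous post.

```sql
-- Managed (internal) table: Hive owns the data under its warehouse directory
CREATE TABLE post30_managed (
  id INT,
  name STRING,
  gender STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';

-- External table: Hive tracks only metadata; the data stays at the given HDFS path
CREATE EXTERNAL TABLE post30 (
  id INT,
  name STRING,
  gender STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION '/hdpcd/input/post30';
```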

The following screenshot shows the execution of the above CREATE command.

Step 3: creating hive external table for input data

Once the command runs successfully, it is time to check whether the hive external table was created.

  • CHECKING THE SCHEMA OF HIVE EXTERNAL TABLE – POST30

We can use the "show tables;" command to check the existence of the hive external table.

show tables;

Once confirmed, you can run the following command to check the schema of this hive external table.

desc post30;

The output of the above command looks as follows.

Step 4: checking the schema of the hive external table

We can confirm from the above screenshot that the post30 hive external table was created with the defined schema.
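Beyond desc, Hive's DESCRIBE FORMATTED command prints extended metadata, which lets you confirm the table really is external. The field names shown in the comment reflect typical Hive output and may vary slightly by version.

```sql
DESCRIBE FORMATTED post30;
-- In the output, look for:  Table Type:  EXTERNAL_TABLE
```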

This completes the process of creating the Hive external table.
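One practical consequence of using an external table is worth noting: dropping it removes only the metadata in Hive, while the data in HDFS survives. A quick way to verify this, sketched below; run it only if you are done with the table.

```sql
-- Dropping an external table deletes only the Hive metadata
DROP TABLE post30;
-- The input file should still be listed afterwards (check from the shell):
-- hadoop fs -ls /hdpcd/input/post30
```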

I hope the tutorials are helping you to understand the concepts related to the HDPCD certification.

In the next tutorial, we are going to see how to create the PARTITIONED HIVE TABLE.

Please check out my website at www.milindjagre.com

You can check out my LinkedIn profile here. Please like my Facebook page here. Follow me on Twitter here and subscribe to my YouTube channel here for the video tutorials.

 
