Post 40 | HDPCD | Load data into hive table as a result of a query

1

August 7, 2017 by milindjagre

Hi, everyone. Thanks for coming back for one more tutorial in this HDPCD certification series.

In the last tutorial, we saw how to load data into a Hive table from an HDFS directory. In this tutorial, we are going to see how to load the data into a Hive table as a result of a Hive query.

Let us begin then.

Apache Hive: Loading data from a query

Apache Hive: Loading data from a query

The above infographics show the step by step process to perform this activity.

We will perform these tasks as shown below.

  • CHECKING THE PRE-EXISTENCE OF THE HIVE TABLE POST40

We use the “show tables;” command to get the list of tables present in the “default” database of the hive.

show tables;

The output of the above command is as follows.

Step 1: Checking the pre-existence of the hive table

Step 1: Checking the pre-existence of the Hive table

As you can see from the above screenshot, the list of tables does not contain a table with name post40. This indicates that we can go ahead and create the hive table with name post40.

  • SELECT QUERY TO POPULATE THE HIVE TABLE

We need to carefully choose the “SELECT” query which we are going to use for loading the data into Hive table post40.

As we have seen from the above screenshot, the Hive table post39 seems to be existing already, and as seen in the last tutorial, it contains a sample of 9 records with 3 columns. To refresh your memory, we will use the following command to load the data into Hive table post40.

select * from post39;

The output of the above command looks as follows.

Step 2: Checking the SELECT query to populate the new hive table

Step 2: Checking the SELECT query to populate the new hive table

Now that we are good with the SELECT query, let us create the Hive table post40 which follows the schema of the SELECT query.

  • CREATING HIVE TABLE WITH NAME POST40

We use the “CREATE” command to create the Hive table to load the data from the local file system.

I have uploaded this input CSV file to my GitHub profile under HDPCD repository with name “53_create_hive_table_for_SELECT_load.sql“. You can download this file by clicking here and it looks as follows.

The execution of the above command looks as follows.

Step 3: Creating the new hive table post40 with same schema of post39

Step 3: Creating the new Hive table post40 with the same schema of post39

The above screenshot shows that we get “OK” message after executing this CREATE statement. This indicates that the table post40 was created successfully.

Let us confirm the existence of the Hive table post40.

  • CONFIRMING THE EXISTENCE OF THE HIVE TABLE POST40

We use the same command “show tables;” to check the existence of the Hive table.

show tables;

The output of the above command is as follows.

Step 4: Confirming the existence of the newly created hive table

Step 4: Confirming the existence of the newly created hive table

As you can see, the hive table post40 was created successfully. This allows us to continue with the further process.

Let us check the records in the Hive table post40 before doing the actual data load.

  • CHECKING THE SCHEMA OF THE HIVE TABLE POST40

We use the “SELECT” command to check the records of the Hive table post40 before loading the data.

select * from post40;

The output of the above command is as follows.

Step 5: Checking the records in hive table post40 before data load

Step 5: Checking the records in hive table post40 before data load

As you can see, we got zero records after executing the above command. This indicates that, by default, there are no records in the Hive table post40.

From this, we can conclude that after loading the data from the above-mentioned SELECT query, the Hive table post40 should have 9 records with 3 columns.

Let us load the data into the hive table post40 from the SELECT query.

  • LOADING DATA INTO HIVE TABLE FROM SELECT QUERY

We use the “INSERT” command to load the data from the SELECT query into the Hive table. The Hive command is as follows.

insert into post40 select * from post39;

The output of the above command is as follows.

Step 6: Loding the data using LOAD command

Step 6: Loading the data using LOAD command

The above screenshot shows that the output of the SELECT query was successfully loaded into the Hive table post40. The screenshot shows the stats like the number of files and the size of the input file(s).

This enables us to look for the records stored in the Hive table post40.

  • CHECKING THE RECORDS IN HIVE TABLE POST40

We use the same command “SELECT” command to check the records that got stored in the Hive table.

select * from post40;

The output of the above command is as follows.

Step 7: Confirming the records stored in hive table post40

Step 7: Confirming the records stored in hive table post40

The above screenshot shows the that there is a total of 9 records in the hive table post40 with 3 columns. This completes our expectations and matches exactly with the SELECT query output.

We can conclude this tutorial here.

I hope you guys like the contents. Keep sharing it with others.

In the next tutorial, we are going to see how to load compressed data into a Hive table.

You can check out my LinkedIn profile here. Please like my Facebook page here. Follow me on Twitter here and subscribe to my YouTube channel here for the video tutorials.

 

Advertisements

One thought on “Post 40 | HDPCD | Load data into hive table as a result of a query

  1. […] everyone. Thanks for returning for the next tutorial in the HDPCD certification series. In the last tutorial, we saw how to load data into a Hive table from a SELECT query. In this tutorial, we are […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: