Post 38 | HDPCD | Load data into Hive table from a Local Directory

Hello, everyone. Thanks for returning for the next tutorial in the HDPCD certification series. In the last tutorial, we saw how to specify the delimiter of the Hive table. In this tutorial, we are going to see how to load the data from the local Directory into the Hive table.

Let us begin then.

Apache Hive: Loading data from Local file
Apache Hive: Loading data from Local file

The above infographics show the step by step process to perform this activity.

We will perform these tasks as shown below.

  • CREATING INPUT CSV FILE IN THE LOCAL FILE SYSTEM

We use the vi editor for creating the input CSV file in the local file system.

I have uploaded this input CSV file to my GitHub profile under HDPCD repository with name “49_input_to_load_from_local.csv“. You can download this file by clicking here and it looks as follows.

The following commands are used for creating this input CSV file.

vi post38.csv

################################

PASTE THE COPIED CONTENTS HERE

################################

cat post38.csv

The output of the above commands looks as follows.

Step 1: Creating input CSV file in local file system
Step 1: Creating input CSV file in local file system

The above screenshot shows that the file post38.csv was created successfully and it is present in the local file system.

Let us check whether table post38 exists or not.

  • CHECKING THE PRE-EXISTENCE OF THE HIVE TABLE POST38

We use the “show tables;” command to get the list of tables present in the “default” databse of hive.

show tables;

The output of the above command is as follows.

Step 2: Checking the pre-existence of the Hive table
Step 2: Checking the pre-existence of the Hive table

As you can see from the above screenshot, the list of tables does not contain a table with name post38. This indicates that we can go ahead and create the hive table with name post38.

  • CREATING HIVE TABLE WITH NAME POST38

We use the “CREATE” command to create the hive table to load the data from the local file system.

I have uploaded this input CSV file to my GitHub profile under HDPCD repository with name “50_create_hive_table_for_local_load.sql“. You can download this file by clicking here and it looks as follows.

The execution of the above command looks as follows.

Step 3: Creating Hive table for loading local data
Step 3: Creating Hive table for loading local data

The above screenshot shows that we get “OK” message after executing this CREATE statement. This indicates that the table post38 was created successfully.

Let us confirm the existence of the hive table post38.

  • CONFIRMING THE EXISTENCE OF THE HIVE TABLE POST38

We use the same command “show tables;” to check the existence of the hive table.

show tables;

The output of the above command is as follows.

Step 4: Confirming the existence of the Hive table
Step 4: Confirming the existence of the Hive table

As you can see, the hive table post38 was created successfully. This allows us to continue with the further checks.

Let us check the schema of the hive table post38.

  • CHECKING THE SCHEMA OF THE HIVE TABLE POST38

We use the “desc” command to check the detailed schema of the hive table post38.

desc formatted post38;

The output of the above command is as follows.

Step 5: Checking the schema of the Hive table
Step 5: Checking the schema of the Hive table

As you can see from the above screenshot, the table post38 contains three columns with defined name and data types. The input format of the hive table post38 is also as expected and that is the TextInputFormat.

This confirms that the hive table post38 got created successfully with correct schema.

It is time to check how many records exists in the table post38 before actually loading the data.

  • RECORDS BEFORE DATA LOADING FROM LOCAL FILE SYSTEM

We use the “SELECT” command to get the records stored in the hive table.

select * from post38;

The output of the above command is as follows.

Step 6: Checking the records before the local data load
Step 6: Checking the records before the local data load

As you can see, we got zero records after executing the above command. This indicates that, by default, there are no records in the hive table post38.

From this, we can conclude that after loading the data from post38.csv file, the hive table post38 should have 9 records with 3 columns.

Let us load the data into the hive table post38 from the local file system.

  • LOADING DATA INTO HIVE TABLE FROM LOCAL FILE SYSTEM

We use the “LOAD” command to load the data from the local file system into the hive table. This command is as follows.

load data local inpath ‘post38.csv’ into table post38;

The output of the above command is as follows.

Step 7: Loading the local data into the Hive table
Step 7: Loading the local data into the Hive table

The above screenshot shows that the input local file post38.csv was successfully loaded into the hive table post38. The screenshot shows the stats like the number of files and the size of the input file(s).

This enables us to look for the records stored in the hive table post38.

  • CHECKING THE RECORDS IN HIVE TABLE POST38

We use the same command “SELECT” command to check the records that got stored in the hive table.

select * from post38;

The output of the above command is as follows.

Step 8: Checking the records after the local data load
Step 8: Checking the records after the local data load

The above screenshot shows the that there are a total of 9 records in the hive table post38 with 3 columns. This completes our expectations and matches exactly with the input file post38.csv.

We can conclude this tutorial here.

I hope you guys like the contents. Keep sharing it with others.

In the next tutorial, we are going to see how to load data into a Hive table from an HDFS Directory.

You can check out my LinkedIn profile here. Please like my Facebook page here. Follow me on Twitter here and subscribe to my YouTube channel here for the video tutorials.

 

 

 

Advertisements

One comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s