Post 37 | HDPCD | Specifying delimiter of a Hive table

Hello, everyone. Thanks for coming back for one more tutorial in this HDPCD certification series.

In the last tutorial, we saw how to specify the storage format of a Hive table. In this tutorial, we are going to see how to specify the delimiter of a Hive table.

We are going to follow the process mentioned in the following infographics.

Apache Hive: Specifying delimiter
Apache Hive: Specifying delimiter

This process is similar to the most of the Hive tutorials we have seen in this certification series. Let us start with this process.

  • CREATING INPUT FILE WITH TAB DELIMITER IN LOCAL FILE SYSTEM AND PUSHING IT INTO HDFS

We can create the input file in the local file system with the help of vi editor. We are going to use the following commands to create the input file in local file system.

I have uploaded this input TSV file to my GitHub profile under HDPCD repository with name “47_input_delimiter_hive.tsv“. You can download this file by clicking here. This file looks as follows.

We are going to use the following commands to create the input file in local file system.

vi post37.tsv

################################

PASTE THE COPIED CONTENTS HERE

################################

cat post37.tsv

The above commands create the input file post37.tsv in the local file system.

Once the file is created in the local file system, you can push this file to HDFS with the help of put command.

The commands look as follows.

hadoop fs -mkdir /hdpcd/input/post37
hadoop fs -put post37.tsv /hdpcd/input/post37
hadoop fs -cat /hdpcd/input/post37/post37.tsv

The output of the above commands is shown in the following screenshot.

Step 1: creating input TAB-delimited file
Step 1: creating input TAB-delimited file

As you can see from the above screenshot, the file post37.tsv was successfully created in HDFS. This file will be inserted into post37 Hive table as an input.

Now, let us check the existence of the Hive table post37.

  • CHECKING THE PRE-EXISTENCE OF THE HIVE TABLE

You can use the “show tables;” command to check the existence of the Hive table post37.

show tables;

The output of the above command looks like this.

Step 2: Checking pre-existence of hive table
Step 2: Checking pre-existence of hive table

As you can see from the above screenshot, the table post37 does not exist. This indicates that we can go ahead and create the Hive table post37.

Let us create the Hive table post37, now.

  • CREATING THE HIVE TABLE WITH TAB DELIMITER

We use the CREATE command for creating a Hive table with TAB delimiter for the records in post37.tsv.

I have uploaded the SQL file to my GitHub profile under HDPCD repository with name “48_hive_table_tab_delimiter.sql“. You can download this file by clicking here. This file looks as follows.

The output of the above command is shown in the following screenshot.

Step 3: Creating hive table with TAB-delimited records
Step 3: Creating hive table with TAB-delimited records

As it is shown in the above screenshot, the “OK” message shows that the Hive table post37 was created successfully. Having said that, let us confirm the existence of the Hive table post37.

You can use the following command to check the existence of the Hive table post37.

show tables;

The output of the above command is as follows.

Step 4: confirming the existence of newly created hive table
Step 4: confirming the existence of newly created hive table

The above screenshot confirms the existence of the Hive table post37. This enables us to go for the schema check of the Hive table post37.

  • CHECK THE SCHEMA OF THE NEWLY CREATED HIVE TABLE

You can use the “desc” command to check the schema of the Hive table post37.

desc formatted post37;

The output of the above command is shown in the following screenshot.

Step 5: confirming the schema and record delimiter of hive table
Step 5: confirming the schema and record delimiter of hive table

The above screenshot confirms the schema of the Hive table post37. At the bottom of the screenshot, as you can see, the FIELD DELIMITER is the TAB character (\t).

Now, the only thing remaining is the records check.

  • CHECK THE RECORDS STORED IN THE HIVE TABLE

You can use the “SELECT” command to check the records stored in the Hive table post37.

The command is as follows.

select * from post37;

The output of the above command is as follows.

Step 6: checking the records in the hive table
Step 6: checking the records in the Hive table

The above screenshot confirms that we were successful in loading the TAB delimited records into the Hive table post37.

This completes the tutorial here.

I hope you guys like the content.

In the next tutorial, we are going to see how to load data into a Hive table from a Local Directory.

You can check out my LinkedIn profile here. Please like my Facebook page here. Follow me on Twitter here and subscribe to my YouTube channel here for the video tutorials.

Advertisements

One comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s