Post 36 | HDPCD | Specifying storage format of a hive table

Hello, everyone. Thanks for coming back to one more tutorial in this certification series.

In the last tutorial, we saw how to insert records into an ORC table from a non-ORC table. In this tutorial, we are going to see how to specify the storage format of a Hive table.

Let us begin, then.

As usual, we are going to start off by checking whether the table POST36 already exists.

  • CHECKING THE PRE-EXISTENCE OF THE HIVE TABLE POST36

We use the “show tables;” command to see the list of Hive tables in Hive’s “default” database.

show tables;

The output of the above command looks like this.

Step 1: Checking the pre-existence of hive table with defined storage format

As you can see from the above screenshot, the table POST36 does not exist in Hive. This indicates that we can go ahead and create the Hive table POST36.
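If the database holds many tables, scanning the full list can be tedious. As a small sketch (not part of the original steps), the same check can be narrowed to one name with a pattern. It assumes the table would live in the “default” database, as stated above.

```sql
-- Switch to the database used in this tutorial.
USE default;

-- List only tables whose name matches the pattern.
-- Hive stores table names in lowercase, so 'post36' also matches POST36.
-- An empty result means the table does not exist yet.
SHOW TABLES LIKE 'post36';
```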

Now, let us create the Hive table with the correct schema.

  • CREATING HIVE TABLE WITH SEQUENCEFILE STORAGE FORMAT 

We are using the SequenceFileInputFormat as the input format for the Hive table POST36. You can use different storage formats as well.

Hive supports the following storage formats.

  1. textfile
  2. sequencefile
  3. orc
  4. parquet
  5. avro
  6. rcfile

You can use any of the above six storage formats. I am using sequencefile so that the table uses the SequenceFileInputFormat.
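To make the role of the storage format concrete, here is a minimal sketch of where it goes in a table definition. The table name post36_sketch and its columns are hypothetical and chosen only for illustration; they are not the schema from the SQL file used in this post.

```sql
-- Hypothetical table name and columns, for illustration only.
CREATE TABLE IF NOT EXISTS post36_sketch (
  id   INT,
  name STRING,
  city STRING
)
-- The STORED AS clause selects the storage format.
STORED AS SEQUENCEFILE;

-- To try another format from the list above, replace SEQUENCEFILE with
-- TEXTFILE, ORC, PARQUET, AVRO, or RCFILE.
```

Keep in mind that the short keywords depend on the Hive version: ORC needs Hive 0.11 or later, PARQUET needs 0.13 or later, and AVRO needs 0.14 or later. If no STORED AS clause is given, Hive normally falls back to textfile.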

For creating this table, we are using a “CREATE” command saved in an SQL file.

I have uploaded this SQL file to my GitHub profile under the HDPCD repository with the name “46_sequence_file_hive.sql”. You can download this file by clicking here.

The following screenshot shows the execution of the above command.

Step 2: Creating hive table with defined storage type

The “OK” message shown in the above screenshot indicates that the table POST36 was created successfully.

Now, we will confirm the existence of the Hive table POST36.

  • CONFIRMING THE EXISTENCE OF HIVE TABLE POST36

We use the same “show tables;” command to check whether POST36 was created successfully or not.

show tables;

The execution of the above command is as follows.

Step 3: Confirming the existence of the newly created hive table

As can be seen from the above screenshot, the table POST36 was created successfully.

Now, the last thing we want to check is the schema of this table.

  • CONFIRMING THE SCHEMA OF HIVE TABLE POST36

We use the “desc formatted” command to check the schema and the storage information of the Hive table POST36.

desc formatted post36;

The output of the above command is as follows.

Step 4: Checking the schema of newly created hive table

As you can see from the above screenshot, the column names and datatypes are as expected. The storage information shows that the InputFormat of the Hive table is SequenceFileInputFormat, as expected.
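The storage information above ties back to the storage format keyword: the short “sequencefile” form stands for a pair of input and output format classes, which can also be written out explicitly. The sketch below shows that long form. The table name post36_explicit and its columns are hypothetical, and the OutputFormat class name is the one Hive typically reports for sequencefile tables, so please compare it with your own “desc formatted” output rather than treating it as definitive.

```sql
-- Hypothetical table name and columns, for illustration only.
CREATE TABLE IF NOT EXISTS post36_explicit (
  id   INT,
  name STRING
)
-- Long form: name the input and output format classes directly.
STORED AS
  INPUTFORMAT  'org.apache.hadoop.mapred.SequenceFileInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat';

-- Print the DDL that Hive has recorded for the table created in this post,
-- including its INPUTFORMAT and OUTPUTFORMAT classes.
SHOW CREATE TABLE post36;
```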

This completes our tutorial on creating a Hive table with a defined storage format.

I hope you guys like the content.

In the next tutorial, we are going to see how to define the delimiter of a Hive table.

You can check out my LinkedIn profile here. Please like my Facebook page here. Follow me on Twitter here and subscribe to my YouTube channel here for the video tutorials.

 
