Post 31 | HDPCD | Defining a Partitioned Hive table

Hello, everyone. Welcome to one more tutorial in the HDPCD certification series.

In the last tutorial, we saw how to create the hive external table. In this tutorial, we are going to see how to create a partitioned Hive table.

For doing this, we are going to follow the following process.

Apache Hive: Creating Partitioned Table
Apache Hive: Creating Partitioned Table

As you can see from the above infographics, it follows the same process as we have seen already in the previous tutorials.

Let us start off with the partitioned table creation in Apache Hive.

  • CHECK WHETHER PARTITIONED HIVE TABLE ALREADY EXISTS

We do check the pre-existence of the partitioned Hive table with the help of “show tables;” command.

show tables;

The following screenshot shows the output of the above command.

Step 1: checking for existence of hive table
Step 1: checking for existence of hive table

As you can see from the above screenshot, the table post31 does not exist, therefore we are free to go ahead and create the partitioned Hive table post31.

  • TABLE DEFINITION FOR THE PARTITIONED HIVE TABLE

Now, it is very important to decide on the table definition, especially in the case of Partitioned Hive Table. In this case, we need to think about the columns on which the table is going to be partitioned and accordingly place those column names in the PARTITIONED BY clause of the Hive CREATE command syntax.

As you have seen in the previous tutorial, we are using the student information to create the Hive tables. In this case, as well, we are going to create Hive tables by considering that we are going to store the student information.

Let me give you an idea about the input data file.

I have uploaded this input file to my GitHub Profile under HDPCD repository with name “42_input_partition_hive_table.csv“. You can download this file by clicking here and it looks something like this.

As you can see from the above file snippet, the input file contains a total of 5 columns. These columns are as follows.

Column 1: ID (Integer)

Column 2: Name (String)

Column 3: Gender (String)

Column 4: Year (Integer)

Column 5: Month (Integer)

Now, our aim is to create a Partitioned Hive Table in such a way that the first three columns i.e. Column 1, 2, and 3 should be the main columns of the table whereas the rest i.e. Columns 4 and 5 should be the partitioned columns of the Hive table.

You might ask, what is the need of the Partitioned Columns or Tables. In the simplest words, we can answer this question by saying that if a column is Partitioned, then it takes less time to execute a WHERE CLAUSE as compared to the non-Partitioned columns.

If this is clear, then let us design the CREATE TABLE command for the Partitioned Hive Table post31.

I have uploaded this CREATE COMMAND to my GitHub Profile under HDPCD repository with name “43_hive_partitioned_table.sql“. You can download this file by clicking here and it looks something like this.

If you have a close look at the above CREATE command, you will see that the NON-PARTITIONED COLUMNS i.e. ID, Name, and Gender are in the main body of the CREATE command, whereas the PARTITIONED COLUMNS i.e. YEAR and MONTH are in the PARTITIONED BY clause of the CREATE command.

This tells Hive which columns to treat PARTITIONED COLUMNS and which to not.

Rest of the commands are self-explanatory similar to the last tutorial.

Now, let us create this post31 table in Hive’s “default” database.

  • CREATING HIVE PARTITIONED TABLE – POST31

We are going to use the same command as mentioned above to create the Hive Partitioned Table.

The following screenshot shows the process of the Partitioned Hive Table creation.

Step 2: creating hive partitioned table
Step 2: creating hive partitioned table

As you can see from the above screenshot, the Partitioned Hive Table post31 was successfully created and we got the “OK” message at the end of the command execution.

Once, the table is created, it is time confirm for the same.

  • CONFIRMING THE EXISTENCE OF THE PARTITIONED HIVE TABLE – POST31

We are going to use the same “show tables;” command to confirm the existence of the Partitioned Hive Table post31.

The command looks like as follows.

show tables;

The execution of the above command gives us the following output.

Step 3: checking for existence of hive table
Step 3: checking for existence of hive table

As you can see from the above screenshot, the Partitioned Hive Table post31 was successfully created in Hive’s “default” database.

This confirms that the table post31 was created.

Now, it is time to check whether the schema matches correctly or not.

  • CHECKING THE SCHEMA OF THE PARTITIONED HIVE TABLE – POST31

As we have seen in the last tutorial, we are going to use “desc” command to check the schema of the Partitioned Hive Table post31.

The command looks like as follows.

desc post31;

And the output of the above command looks as follows.

Step 4: checking the schema of the hive partitioned table
Step 4: checking the schema of the hive partitioned table

As you can see from the above screenshot, the schema of the created Hive Partitioned Table post31 matches with the defined schema.
Please note that the information regarding the PARTITIONED COLUMNS is given out separately in the output of the desc command. This confirms that Hive was able to identify the difference between the PARTITIONED COLUMNS and NON-PARTITIONED COLUMNS.

This completes the process of creating the Partitioned Hive Table.

Hope you people like the content. Please share it with your network.

This is going to be the LAST FORTNIGHT of my STUDENT LIFE as I am graduating in a couple of weeks.  I wish to spread my learnings to a larger audience and more frequently. Therefore, I will take few decisions once I get a full-time job offer from a good employer.
Hope that day is not much away when I will be able to associate myself with a good company. Fingers crossed. 🙂

You can check out my website at www.milindjagre.com

You can check out my LinkedIn profile here. Please like my Facebook page here. Follow me on Twitter here and subscribe to my YouTube channel here for the video tutorials.

 

Advertisements

One comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s