Hello, everyone. Welcome to the second post in the Data Analysis section of the HDPCD certification series.

In the last tutorial, we saw the three ways in which we run the hive commands. In this tutorial, we are going to create the hive-managed table i.e. hive internal table.

For creating a hive-managed or internal table, we are going to follow the process mentioned in the following infographics.

Creating hive-managed Table
Creating hive-managed Table

As you can see from the above picture, we are going to follow the four-step process for creating a hive-managed table. These four processes are as follows.

  • Check whether the table already exists
  • Create the table with defined schema
  • Check whether table got created or not
  • Check the schema of the table for verification

Let us start with each step one-by-one.

  • CHECKING WHETHER THE TABLE ALREADY EXISTS

We should first check whether the table that we are going to create already exists or not. If it exists, then we will get an error saying so. If it doesn’t, then its good to move forward.

After logging into hive session with “hive” command, you can run “show tables;” command to check the list of the hive tables that already exists. Following is the command.

show tables;

The execution of the above command looks something like this for my Hortonworks Sandbox.

Step 1: Check Hive table already exists or not
Step 1: Check Hive table already exists or not

As you can see from the above picture, there is a total of 30 tables already existing in hive’s “default” database. Since we are trying to create a table called post29, you can see that the table with that name does not exist, therefore we can move ahead to the next step of creating the hive-managed table post29.

  • CREATING HIVE-MANAGED/INTERNAL TABLE

You can use the create table command in Hive to create the hive-managed table. The funny thing about this create command is that it resembles with SQL’s create table statement.

We are going to use the following schema for creating this hive internal table.

I have uploaded this schema on my GitHub Profile under HDPCD repository with name “40_hive_managed_table.sql“. You can click here to download it and it looks as follows.

Now, let me explain this command briefly.

create table post29(
id int,
name string,
gender string
)

The above statement defines the name of the table and the column names along with their respective data types. Here, the statement “create table” indicates that it is a hive-managed or internal table. There is a list of data types supported by hive which you can check at the hive documentation page.

The next statement is

row format delimited

This statement defined the formatting of each record. Here, we are telling Hive that each record corresponds to a ROW and therefore, we have mentioned it to be ROW format delimited.

fields terminated by ‘,’

Through this command, we are mentioning that the records that we are going to store in this table have columns separated by a comma (,).

stored as textfile;

This command tells Hive that the loaded data should be stored in the warehouse directory in the TEXTFILE format. I am going to explain this bit a little more in the future posts when I will talk about the process of loading the data in the hive table.

If the explanation looks fine to you, it is time now to execute this “create table” command. The screenshot depicting the command execution is as follows.

Step 2: Creating Hive Managed Table
Step 2: Creating Hive Managed Table

As you can see from the above screenshot, the hive internal table was created successfully and we got an “OK” message after the command execution.

Let us move on to the next step.

  • CHECKING WHETHER HIVE TABLE SUCCESSFULLY CREATED OR NOT

You can run the same “show tables;” command to check whether the hive internal table was successfully created or not. The execution screenshot of this command looks as follows.

Step 3: Check Hive Table successfully created or not
Step 3: Check Hive Table successfully created or not

As can be seen from the screenshot, the post29 table was successfully created. Now, it is time to check the schema of the table post29.

  • CHECKING SCHEMA OF THE NEWLY CREATED HIVE INTERNAL TABLE

The schema of the hive internal table can be checked with the help of either “desc” or “describe” command. The command for this case looks as follows.

desc post29;

describe post29;

You can use any of these commands to check the schema of the hive table post29.

The execution of the above commands looks as follows.

Step 4: Checking newly created hive table's schema
Step 4: Checking newly created hive table’s schema

You can see from the above screenshot that the created schema matches with the defined schema while creating the hive internal table.

This completes the hive-managed/internal table creation.

Hope the screenshots and the commands help you in understanding the concepts.

Please visit my website www.milindjagre.com for further information.

You can check out my LinkedIn profile here. Please like my Facebook page here. Follow me on Twitter here and subscribe to my YouTube channel here for the video tutorials.

 

Advertisements

One thought on “Post 29 | HDPCD | Define a Hive-managed Table

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s