Post 33 | HDPCD | Define a Table from a SELECT Query

Hello everyone and welcome to one more tutorial in the HDPCD certification series.

In the last tutorial, we saw how to define a BUCKETED hive table. In this tutorial, we are going to see how to create a Hive table from a SELECT query.

Let us begin then.

We are going to follow the below process for creating a brand new table from a SELECT query. It is also called as CTAS which stands for Create Table As Select.

Apache Hive: CTAS
Apache Hive: CTAS

As you can see from the above infographics, it is a 5-step process. We will see each step one by one. So, here it goes.

  • CHECKING IF THE NEW TABLE NAME i.e. POST33 ALREADY EXISTS

You might have gotten this step as of now since we are following it in each tutorial. We use “show tables;” command to get the list of tables and check whether POST33 table already exists.

show tables;

The output of the above command looks something like this.

Step 1: Checking the pre-existence of the hive table
Step 1: Checking the pre-existence of the hive table

As you can see from the above screenshot, the table POST33 does not exist already in the “default” database in hive.

This means, we can go ahead and construct the hive table with name “POST33“.

  • CONSTRUCTING THE SELECT QUERY FOR CREATING TABLE POST33

This is the second step in this process. In this step, we have to construct the SELECT query which we are going to use for creating a table POST33. Now, this SELECT query can be as complicated as it can be, but for demonstration purpose, I am using the simplest SELECT query possible. This SELECT query extract the entire data from the already existing table POST30 which we created in this tutorial. This SELECT query looks like this.

select * from post30;

The output of the above command is as follows.

Step 2: Checking the select query used for population
Step 2: Checking the select query used for population

As you can see from the above screenshot, the SELECT query returned 9 records having 3 columns. Therefore, after creation, our output table POST33 should also contain 9 records with 3 columns and the column names of table POST33 should match with that of POST30.

Now that we have constructed the SELECT query, it is time to build the new table POST33 with the help of CTAS query.

  • USING CTAS TO BUILD NEW HIVE TABLE POST33

We are going to use the following query to create the hive table POST33.

create table post33 as select * from post30;

The output of this command looks like this.

Step 3: Creating table from a SELECT query
Step 3: Creating table from a SELECT query

As it can be seen from the screenshot, Hive server launches the MapReduce job in the back-end to create table POST33 and load the resultant data from the SELECT query into table POST33.

The command returns “OK” message at the end and along with STATUS shown as “SUCCEEDED“. These two things indicate that our command worked fine and the expected output was created.

But, we should better check it before commenting anything on it.

  • CONFIRMING THE EXISTENCE OF THE NEW TABLE POST33

We can use the same “show tables;” command to check if the hive table POST33 was created successfully or not.

show tables;

The output of the above command looks as follows.

Step 4: Confirming the existence of newly created hive table
Step 4: Confirming the existence of newly created hive table

As you can see from the above screenshot, the hive table POST33 was successfully created in the Hive “default” database.

Now, let us check the schema of this newly created hive table POST33.

  • CHECKING THE SCHEMA OF NEWLY CREATED HIVE TABLE POST33

We can use the formatted description of the hive table POST33 to see the column names and their respective data types. This formatted description of the hive table POST33 can be printed with the help of the following hive command.

desc formatted post33;

The output of the above command looks something like this.

Step 5: Checking schema and other properties of newly created hive table
Step 5: Checking schema and other properties of newly created hive table

The above screenshot suggests the following things

  • Table POST33 contains 3 columns
  • The name of these columns match exactly with input hive table POST30
  • The type of the POST33 table is the internal/hive-managed table as it is visible from the Table Type parameter
  • The input format of this POST33 table is TextInputFormat (Default Input format)
  • The table POST33 is NOT PARTITIONED
  • The table POST33 is NOT BUCKETED

This information is very important when it comes to get the sense of the table and column data types.

If this information is clear, it is time to check the records that got stored in the table POST33.

  • CHECKING THE OUTPUT RECORDS IN THE TABLE POST33

We are going to use the simple SELECT command to get the total data residing in the output table POST33.

The command looks as follows.

select * from post33;

The above hive command generates the following output.

Step 6: Checking the records in newly created hive table
Step 6: Checking the records in newly created hive table

As can be seen from the above screenshot, the table POST33 contains a total of 9 records with 3 columns, as expected and already mentioned at the start of this tutorial.

This completes the objective of this tutorial.

In the next tutorial, we are going to see how to define a hive table that stores the data in the ORC File Format, instead of the TextInputFormat.

Hope you guys like the contents.

Please check out my website at www.milindjagre.com.

You can check out my LinkedIn profile here. Please like my Facebook page here. Follow me on Twitter here and subscribe to my YouTube channel here for the video tutorials.

Advertisements

One comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s