Hello, everyone. Thanks for going through the tutorials. The increasing views and visitors act as a motivation for me.

In the last tutorial, we saw how to define a hive ORC table. In this tutorial, we are going to load data in that ORC table from an NON-ORC table.

For doing this, we are going to follow the below process.

Loading records into ORC table from NON-ORC table
Loading records into ORC table from NON-ORC table

As you can see from the above infographics, this process follows the below steps.

  • CHECK NON-EXISTENCE OF ORC TABLE and EXISTENCE OF NON-ORC TABLE

We use “show tables;” command to check the list of tables present in the hive database.

show tables;

The output of the above command is as follows.

Step 1: Checking the pre-existenece of hive table with ORC format
Step 1: Checking the pre-existence of hive table with ORC format

As you can see from the above screenshot, the table post33 exists which is an NON-ORC table. Also, the target ORC table post35 does not exist, which indicates we can go ahead and create table post35.

But, before doing that, let us check the records present in the table post33.

  • CHECKING THE RECORDS STORED IN NON-ORC TABLE POST33

We use “SELECT” command to see the records stored in the post33 table.

select * from post33;

The output of the above command looks as follows.

Step 2: Checking the input records from NON-ORC Hive table
Step 2: Checking the input records from NON-ORC Hive table

As you can see from the above screenshot, the table post33 contains a total of 9 records with 3 columns.

Therefore, our expectation is that at the end of this tutorial, we should get 9 records with 3 columns in ORC table POST35.

Let us continue with the further steps, then.

  • CREATING HIVE ORC TABLE POST35

We use the “CREATE” command shown in this tutorial to create the hive table with ORC as the storage format. For further explanation, you can refer this tutorial. The “CREATE” command looks like this.

create table post35 (
id int,
name string,
gender string
)
row format delimited
fields terminated by ‘,’
stored as orc;

The output of the above command looks as follows.

Step 3: Creating hive table with ORC format
Step 3: Creating hive table with ORC format

As you can see from the above screenshot, the “OK” message tells us that the table “POST35” was created successfully.

Now, it is time to do the comparison between schema of both tables – POST33 and POST35.

  • CHECKING THE SCHEMA OF TABLES POST33 AND POST35

We use the “desc” and “desc formatted” command to check the schema of tables POST33 and POST35.

desc post33;

desc formatted post35;

The output of the above commands looks like this.

Step 4: Checking schema of newly created hive table
Step 4: Checking schema of newly created hive table

As you can see, the number of columns and the column datatypes match exactly with each other.

The only difference between these two tables is their storage type. The post33 table follows the default storage format i.e. TextInputFormat, the newly created table POST35 follows the ORCInputFormat.

Once we confirm this, it is time to load some records in this ORC table.

  • INSERTING DATA INTO ORC TABLE FROM NON-ORC TABLE

We use the “INSERT” command to load records in the ORC table. The command is as follows.

insert into post35 select * from post33;

The output of the above command looks as follows.

Step 5: Inserting data into ORC table from NON-ORC table
Step 5: Inserting data into ORC table from NON-ORC table

As you can see from the above screenshot, hive triggers a MapReduce job at the backend to perform this activity. The “SUCCEEDED” message shows that the command worked fine and the intended operation was a successful operation.

Apart from the message, it gives various stats such as the number of files, number of records, the total size of the records, and the data size in number of bytes. These stats help to make an informed decision whether the operation executed successfully or not.

Now, the last step is to check the records in the hive table POST35.

  • CHECKING THE RECORDS STORED IN THE TABLE POST35

We use the “SELECT” commands to get the sense of data stored in the table POST35.

select * from post35;

The output of the above command is as follows.

Step 6: Checking the records in the ORC table
Step 6: Checking the records in the ORC table

As you can see from the above screenshot, the records inserted into table POST35 are similar to the table POST33. This confirms that the actual records got loaded into the ORC table from an NON-ORC table.

This completes the tutorial here.

In the next tutorial, we are going to see how to define the storage format of a hive table.

You can check out my LinkedIn profile here. Please like my Facebook page here. Follow me on Twitter here and subscribe to my YouTube channel here for the video tutorials.

 

Advertisements

One thought on “Post 35 | HDPCD | Insert records from NON-ORC table into ORC table

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s