Hello everyone, in this tutorial, we are going to see the 3rd objective in data ingestion category.

The objective is listed on Hortonworks website under data ingestion and looks like this.

Objective 3
Objective 3

In the previous post, we imported data into HDFS, here, we are going to import the data directly into Hive table.

So, let us begin.

The above command to import data into Hive is similar to the one we saw in the previous post, except the last two lines. We will see what last two lines mean.

Line 5: – -hive-import \

Explanation: This option indicates that the Sqoop Import operation is to import the data into hive table, not in HDFS directory. This is the line which enables data population in hive table from RDBMS data source.

Line 6: – -table customers

Explanation: – -table flag is followed by the hive table name. The data that is being imported should be put in this table name in hive. If the hive table does not exist, then it will be created automatically and if there is some already existing data in the hive table, then the newly imported data will be appended and not overwritten. You must make sure that the data which is imported follow the hive table column format correctly.

This hive import is executed by importing data in hive warehouse HDFS directory, so that hive table automatically gets loaded with imported data.

Now, let us see the MySQL table which is going to be imported.

Following screenshot might be helpful for this.

mysql-input-table
MySQL Table to import in Hive

As can be seen from above screenshot, the table name is customers whereas the database name is retail_db. The customers table has 9 columns in it with datatypes like int and varchar. I am pointing this out now itself so that once we import the data into the hive table, it will be easier for us to compare the two tables.

After confirming with MySQL, it is time to check hive database. We will see whether the table already exists or not. For that, we will log into hive database server and run show tables command as follows.

hive> show tables;

The screenshot below shows the output of above command.

hive-show-tables
Hive database before import

As you can see from above screenshot, currently, customers table does not exist in hive database.

Now, it is time to execute sqoop import command. The sqoop command is already shown above. Following screenshot shows the sqoop command execution.

sqoop-import-command
sqoop command execution

Above sqoop command gives us the following output on the terminal window.

sqoop-import-command-output
sqoop import command output – 1

As you can see, the screenshot indicates that there is a total of 12435 records which were imported into Hive Table.

Now, let us check the same by logging into hive database.

hive-show-tables-post-import
hive table list – post import

As you can see from above screenshot, the newly created/imported customers table got listed in show tables; command output. It indicates that the table was created successfully by the sqoop import command.

Now, let us check the structure of the table, whether it contains 9 columns or not, and their respective datatypes.

describe-table
describe customers table

As you can see, customers table contains 9 columns and their datatypes also match with the corresponding columns’ datatypes in MySQL table.

Now, the next step is to check for 10 sample records and see the import process executed smoothly without any glitches.

hive-select-star
hive table sample 10 records

It looks like the records were imported correctly.

Now, it is time to check the total number of records in hive table and see whether it is equal to 12435 or not.

hive-output-number-of-records
number of records in hive table

Wow, the total number of records in hive table exactly matches with that of MySQL table.

This indicates that our sqoop import command as a whole executed successfully without any issues/problems.

The last thing that I would like to show over here is that if the table already exists and it contains some records, then how sqoop import behaves.

Let us run the sqoop import command one more time and then check hive table. Following is the hive table status.

hive-output-number-of-records-2
Number of records in hive table – 2

As you can see, the total number of records in a hive table doubled after running the same import command one more time. It is evident from this that sqoop will append the newly imported records to hive table and in the process, the total number of records in hive table will increase.

I believe, this finally completes our sqoop import into hive table tutorial.

I hope that all the screenshots and proper description will help you out in figuring out how it works.

Suggestions and edits are welcome.

Stay tuned for the further updates.

Kindly subscribe to my YouTube channel for vlogs and follow my blog for further updates.

Cheers!

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s