Array<Datatype> is considered to be the complex datatype in case of HIVE which is used to store the data into internal/external tables using the data present in the local files.

It is customary to specify the collections’ behavior while creating the table itself. We need to pass the argument which specifies which character is used for representing the separation of two entries of the array element.

The following steps are quite easy and self-explanatory.

1)      CREATION OF HIVE TABLE

Hive table should be created keeping in mind that one of the columns is having Array<String> as datatype, so we need to mention the same. We should specify that the records are ROW FORMAT DELIMITED and the fields are terminated by which character(i.e. space in this case) and after this we should specify the character which is used for separating the collection items and at the last, we should store the contents as the text file.

The following code will explain all the things in easy way. Please have a look.

CREATE TABLE table1(column1 STRING, column2 ARRAY<STRING>, column3 INT)ROW FORMAT DELIMITEDFIELDS TERMINATED BY ”

COLLECTION ITEMS TERMINATED BY ‘,’

STORED AS TEXTFILE;

 

 

2)      LOADING DATA INTO TABLE

The data is loaded into the newly created table1 table in similar fashion. The only thing that we need to keep in mind that the second column bears the array datatype with string values, so we should specify the same in our sample.txt file which we are going to use while inserting data into the hive table.

The contents of sample.txt file should look like this

milindit,cse,extc 20milind1 it,it,it 22

 

If you take a close look, you will see that milind and it and extc and 20 in first line are separated by space. Same thing goes for the second line as well.

And the array contents i.e. it,cse,extc (in first line) and it,it,it (in second line) are separated by comma as we have already specified in table definition.

Now this sample.txt file should be moved to the linux machine on which you are running hive using an FTP client like WinSCP. Once you transfer this into your local directory, you are good to go.

You can run the following line of code and the data will get inserted into the hive table table1.

load data local inpath ‘sample.txt’ into table table1;

 

Once you run the above command, all the data present in sample.txt will get inserted into table1.

If something goes wrong, the entry will be null for that particular column. So you don’t need to panic, just check the table definition and compare it with data in the sample.txt file and the way in which the two columns are separated. The screenshots shared below will be quite helpful.

Please have a look.

 

 

hivetablecreation
HiveTableCreation

 

describe table
DecribeCreatedTable

 

cat sample.txt
SeeContentOfSample.txt

 

load data into table
load data into hive internal table

 

select command in hive
SeeContentsOfHiveTable

 

where clause just to cross check
cross check
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s