Array<Datatype> is considered to be the complex datatype in case of HIVE which is used to store the data into internal/external tables using the data present in the local files.
It is customary to specify the collections’ behavior while creating the table itself. We need to pass the argument which specifies which character is used for representing the separation of two entries of the array element.
The following steps are quite easy and self-explanatory.
1) CREATION OF HIVE TABLE
Hive table should be created keeping in mind that one of the columns is having Array<String> as datatype, so we need to mention the same. We should specify that the records are ROW FORMAT DELIMITED and the fields are terminated by which character(i.e. space in this case) and after this we should specify the character which is used for separating the collection items and at the last, we should store the contents as the text file.
The following code will explain all the things in easy way. Please have a look.
|CREATE TABLE table1(column1 STRING, column2 ARRAY<STRING>, column3 INT)ROW FORMAT DELIMITEDFIELDS TERMINATED BY ”
COLLECTION ITEMS TERMINATED BY ‘,’
STORED AS TEXTFILE;
2) LOADING DATA INTO TABLE
The data is loaded into the newly created table1 table in similar fashion. The only thing that we need to keep in mind that the second column bears the array datatype with string values, so we should specify the same in our sample.txt file which we are going to use while inserting data into the hive table.
The contents of sample.txt file should look like this
|milindit,cse,extc 20milind1 it,it,it 22|
If you take a close look, you will see that milind and it and extc and 20 in first line are separated by space. Same thing goes for the second line as well.
And the array contents i.e. it,cse,extc (in first line) and it,it,it (in second line) are separated by comma as we have already specified in table definition.
Now this sample.txt file should be moved to the linux machine on which you are running hive using an FTP client like WinSCP. Once you transfer this into your local directory, you are good to go.
You can run the following line of code and the data will get inserted into the hive table table1.
|load data local inpath ‘sample.txt’ into table table1;|
Once you run the above command, all the data present in sample.txt will get inserted into table1.
If something goes wrong, the entry will be null for that particular column. So you don’t need to panic, just check the table definition and compare it with data in the sample.txt file and the way in which the two columns are separated. The screenshots shared below will be quite helpful.
Please have a look.