Thanks for coming back for the next tutorial in the HDPCD certification series. In the last tutorial, we saw how to remove records with NULL values; in this tutorial, we are going to see how to store the output of a Pig relation in an HDFS directory.
This is one of the simplest tasks in this certification, and therefore I have kept this tutorial as simple as possible. We are going to store the input file in an HDFS directory as is, without performing any further operation on it.
We are going to follow this step-by-step process.
If you take a close look at the above picture, you will see that we are not doing anything to the input file; we are storing it directly into the HDFS directory.
We are going to perform these tasks one by one as shown below.
- INPUT FILE CREATION IN LOCAL FILE SYSTEM
We can use the vi editor to create this file. The following command helps us do that.
PASTE THE CONTENTS HERE
The following screenshot shows the execution of the above command.
Once we have the file in the local file system, it is time to push it to HDFS.
- PUSH INPUT FILE TO HDFS
We can push this input file from the local file system to HDFS with the help of the put command. We first create the target directory, then upload the file, and finally verify its contents. The commands are as follows.
hadoop fs -mkdir /hdpcd/input/post17
hadoop fs -put post17.txt /hdpcd/input/post17
hadoop fs -cat /hdpcd/input/post17/post17.txt
The execution of these commands is shown in the below screenshot.
As you can see, the file was loaded successfully into HDFS.
The next step is to create the Pig script that will store this input file into an output HDFS directory.
- PIG SCRIPT CREATION
Let us create the Pig script that accomplishes the objective of this tutorial. I have uploaded this script to the HDPCD repository on my GitHub profile, and you can download it by clicking here. The Pig script looks as follows.
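Assembled from the two statements discussed below, the complete script (saved here as post17.pig) is just:

```pig
-- Load each line of the input file as a single chararray field
input_data = LOAD '/hdpcd/input/post17/post17.txt' USING PigStorage() AS (line:chararray);

-- Write the relation, unchanged, to the output HDFS directory
STORE input_data INTO '/hdpcd/output/post17';
```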
Let us look at each command in the above Pig script.
input_data = LOAD '/hdpcd/input/post17/post17.txt' USING PigStorage() AS (line:chararray);
The above command loads the input file into a Pig relation called input_data. Each line of the file is represented by a field called line, of type chararray.
STORE input_data INTO '/hdpcd/output/post17';
The STORE command is the one we are studying in this tutorial. It writes the data held in a Pig relation to an HDFS directory. You can specify a field delimiter with the USING PigStorage() clause, which I have not used in the above command, so the default tab delimiter applies.
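For example, if you wanted the output comma-delimited instead, a minimal sketch would look like the following (the output path post17_csv here is hypothetical, chosen just for illustration):

```pig
-- Hypothetical variant: write the same relation with a comma delimiter
STORE input_data INTO '/hdpcd/output/post17_csv' USING PigStorage(',');
```

Since our relation has only a single chararray field, the delimiter makes no visible difference in this particular tutorial, but it matters as soon as a relation has multiple fields.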
Hope the explanation makes sense.
It is time to run the above script.
- RUN PIG SCRIPT
We must run this script against the cluster rather than in local mode, as we have to store the output in an HDFS directory. We can use either the default MapReduce mode or the Tez mode. To make the script run faster, we are going to use the Tez mode by running the following command.
pig -x tez post17.pig
Let us take a look at the execution of the above command.
The following screenshot shows the output of the command we ran.
As can be seen from the above screenshot, the Pig script ran successfully and stored the output under the HDFS directory /hdpcd/output/post17. Let us inspect this output directory.
- OUTPUT OBSERVATION
Let us take a look at the output directory using the following commands.
hadoop fs -ls /hdpcd/output/post17
hadoop fs -cat /hdpcd/output/post17/part-m-00000
The following screenshot gives us an idea about the output.
As you can see from the above screenshot, the output file part-m-00000 contains the exact contents of the input file. This concludes the tutorial.
Hope the text and the screenshots make sense. Kindly comment and share this with your friends and network. In the next tutorial, we are going to see how to store a Pig relation in a Hive table.
Please follow my blog for further updates. Kindly click here to like my Facebook page. You can follow me on Twitter here. You can subscribe to my YouTube channel by clicking here to get updates regarding the video tutorials. You can check out my LinkedIn profile here.