Post 11 | HDPCD | Load Pig Relation WITH schema

In the previous tutorial, we saw how to load the Pig Relation without a defined schema. In this tutorial, we are going to load a Pig Relation with a properly defined schema.

It is almost identical to the last tutorial, except for one step, which I will discuss in a moment. Please have a look at the infographic below, which depicts the step-by-step process of constructing a Pig Relation from an input HDFS file with a defined schema.

Apache Pig Relation With Schema

As you can see from the above picture, the process is exactly the same as in the previous tutorial.

So let us begin performing all the steps mentioned above.

Let us look at the input data that we have to load into the Pig Relation.

Input Content

For your reference, I have uploaded this file to my GitHub profile at this location.

After taking a look at the input data shown above, we can note the following (a sample row illustrating this layout appears after the list):

  • Number of Columns: 7
  • Column Datatypes: string (chararray in Pig), int, int, int, int, int, int
  • Column Separator: comma (,)
  • Column Names: Our choice, we can give these columns any names that we want
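To make this layout concrete, a hypothetical row in this format could look like the line below; the station name and numbers are made up purely for illustration, so refer to the file on GitHub for the actual records.

STATION_XYZ,1950,1,1,25,78,56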

Let us create this file in the Local File System with the help of the vi command.
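Assuming the file is named input.csv, as it is in the HDFS commands further below, the command is:

vi input.csv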

Once I copy-paste the contents into the terminal window and save the file, the cat command gives me the following output.

cat command output

Now that I have the file in my Local File System, it is time to put it into HDFS, as we are going to use the MapReduce mode to run the Pig script. Following are the commands we use to load this input.csv file from the Local File System into HDFS.

hadoop fs -mkdir -p /hdpcd/input/post11

hadoop fs -put input.csv /hdpcd/input/post11

hadoop fs -cat /hdpcd/input/post11/input.csv
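Optionally, you can also list the target directory to double-check that the file is in place (this step is not shown in the screenshot below):

hadoop fs -ls /hdpcd/input/post11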

Following is the screenshot confirming that the above commands ran successfully and that we got the desired input.csv file in HDFS under the /hdpcd/input/post11 directory.

Pushing Input File to HDFS

According to the infographic shared above, this completes step 3.

Now, let us build our Pig script. I have uploaded this Pig script to my GitHub profile and you can download it here. It looks as follows.
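In case the embedded script does not render for you, here is a sketch of what post11.pig contains, reconstructed from the relation name and schema discussed below; the exact layout and line numbering may differ slightly in the file on GitHub.

-- load input.csv from HDFS, this time with an explicit schema
data_with_schema = LOAD '/hdpcd/input/post11/input.csv'
USING PigStorage(',')
AS (station_name:chararray, year:int, month:int, dayofmonth:int, precipitation:int, maxtemp:int, mintemp:int);
DESCRIBE data_with_schema;
DUMP data_with_schema;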

If you remember, at the beginning of this tutorial, I mentioned that there is only one difference between this tutorial and the previous one, and that difference lies on line number 4 of the above post11.pig file.

In the above file, we are specifying the schema, i.e. the name and datatype of each column, which is applied to the newly created Pig Relation “data_with_schema”.

The DESCRIBE and DUMP commands are executed to confirm that the schema was created and the data was pushed into the Pig Relation successfully.

I used the following command to run this Pig Script.

pig -f post11.pig
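Pig runs in MapReduce mode by default, so the command above is equivalent to being explicit about the execution mode, as in the first line below. The second line, using local mode, would instead read from the Local File System, which would require pointing the LOAD path at the local copy of input.csv; it can be handy for quick testing.

pig -x mapreduce -f post11.pig

pig -x local -f post11.pig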

The following screenshot depicts the execution that starts once we run the above command.

Running Pig Script

As you can see in the above screenshot, the data was loaded with the following schema.

station_name: chararray
year: int
month: int
dayofmonth: int
precipitation: int
maxtemp: int
mintemp: int
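Note that in the Grunt console, DESCRIBE typically prints this schema on a single line, in a form similar to the following (shown here only for reference; the exact formatting depends on the Pig version):

data_with_schema: {station_name: chararray,year: int,month: int,dayofmonth: int,precipitation: int,maxtemp: int,mintemp: int}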

This confirms that the DESCRIBE command ran successfully. Now, let us see the output of the DUMP command.

Pig Script Output

The above screenshot confirms that the DUMP command gives us the expected output.

This confirms that our script ran as expected and we got the intended result. This concludes the objective of this tutorial.

I hope these tutorials are making sense and are helping you with the concepts and the content.

In the next tutorial, we are going to see how to load the data from a Hive table into a Pig Relation.

You can subscribe to my YouTube channel by clicking here for video tutorials on the HDPCD certification.

Cheers!

 
