Hello everyone, I hope you are finding these tutorials useful. In the previous tutorial, we started on the Data Transformation category of the HDPCD certification. This tutorial covers the second objective in that category: creating a sample Pig relation without a schema. Before starting the actual process, let us define what a relation and a schema are in Apache Pig.
Pig Relation: In the simplest terms, a relation in Apache Pig is roughly equivalent to a table in a relational database. A relation contains data loaded from a file in either the local file system or HDFS. When you load data into a relation, defining a schema is optional. If you do not define one, Pig creates the relation with unnamed, untyped fields: each field is referenced by its position ($0, $1, and so on) and defaults to the bytearray type. Seeing this behavior is this tutorial’s objective.
Pig Schema: A Pig schema defines the name and the data type of each field in a relation. It is up to you to specify these when you define the schema; together, the field names and data types make up the schema. To reiterate, if you do not define a schema for a relation, Pig falls back to positional field references and the default bytearray type, as we will see in just a few minutes.
Let us get started, then.
- CREATING INPUT CSV FILE IN LOCAL FILE SYSTEM
We are going to use the vi editor to create this input file.
PASTE COPIED CONTENTS HERE
The following screenshot illustrates this step.
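If you prefer not to open an editor, a here-document can create the same kind of file in one step. The sketch below is only illustrative: the three rows are made-up sample data, not the contents used in this tutorial.

```shell
# Write a small comma-separated file without opening an editor.
# NOTE: these rows are hypothetical sample data, not the tutorial's actual contents.
cat > input.csv <<'EOF'
1,alice,30
2,bob,25
3,carol,41
EOF

# Print the file to confirm what was written.
cat input.csv
```

Quoting the `EOF` delimiter keeps the shell from expanding anything inside the file body, which is a safe habit when the data may contain `$` characters.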
- PUSHING INPUT CSV FILE TO HDFS
Please use the following commands to push this input.csv from the local file system to HDFS.
hadoop fs -mkdir /hdpcd/input/post10
hadoop fs -put input.csv /hdpcd/input/post10
hadoop fs -cat /hdpcd/input/post10/input.csv
The following screenshot might come in handy for this.
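If you expect to repeat this step, the same three commands can be wrapped in a small function so that reruns do not fail. This is a sketch rather than part of the original steps: the function name `push_to_hdfs` is hypothetical, `-mkdir -p` creates any missing parent directories, and `-put -f` overwrites a copy that already exists in HDFS.

```shell
# Hypothetical helper: upload a local file to an HDFS directory, safe to re-run.
push_to_hdfs() {
    local_file="$1"
    hdfs_dir="$2"
    # -p creates missing parent directories and succeeds if the directory exists.
    hadoop fs -mkdir -p "$hdfs_dir" || return 1
    # -f overwrites the file if it is already present in HDFS.
    hadoop fs -put -f "$local_file" "$hdfs_dir" || return 1
    # Print the uploaded file to confirm its contents.
    hadoop fs -cat "$hdfs_dir/$(basename "$local_file")"
}

# Run the upload only where the hadoop CLI is available.
if command -v hadoop >/dev/null 2>&1; then
    push_to_hdfs input.csv /hdpcd/input/post10
else
    echo "hadoop CLI not found; skipping upload"
fi
```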
Now it is time to create the Pig script.
- PIG SCRIPT CREATION
Please use the following command to create this Pig script.
PASTE THE CONTENTS HERE
The following screenshot helps you understand this.
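As a minimal sketch of what a schema-less load can look like, the snippet below writes a post10.pig file from the shell. It rests on a few assumptions that are not taken from this tutorial: the relation name `data` is hypothetical, the file is assumed to be comma-delimited, and the script may differ from the one actually used here.

```shell
# Write a small Pig script; the quoted 'EOF' stops the shell from touching its body.
# NOTE: the relation name "data" and the comma delimiter are assumptions.
cat > post10.pig <<'EOF'
-- Load the CSV from HDFS. No schema is declared after the load function,
-- so the fields stay unnamed and default to the bytearray type.
data = LOAD '/hdpcd/input/post10/input.csv' USING PigStorage(',');

-- Show what Pig knows about the structure of the relation.
DESCRIBE data;

-- Print every tuple in the relation to the console.
DUMP data;
EOF

# Print the script to confirm what was written.
cat post10.pig
```

Because the LOAD statement has no schema clause, DESCRIBE should report the schema of `data` as unknown, while DUMP still prints the rows; individual fields would be addressed by position, for example $0 for the first column.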
- RUNNING PIG SCRIPT
The following command runs this Pig script.
pig -f post10.pig
It looks as follows.
And here is the output of the Pig script.
This concludes the tutorial.