Post 27 | HDPCD | Invoke a User Defined Function in Apache Pig

June 10, 2017 by milindjagre

Hello everyone, and thanks for coming back to the last tutorial in the DATA TRANSFORMATION category of the HDPCD certification. We are going to pick up from the previous tutorial, in which we saw how to define an ALIAS for a function present in a JAR file. In this tutorial, we are going to see how to invoke that ALIAS to perform the desired operation.

Let us start then.

We are going to follow the process shown in the infographic below.

The big picture: Invoking UDF in Apache PIG

Please follow the below steps one by one.

  • CREATING INPUT CSV FILE IN LOCAL FILE SYSTEM

We can create the input CSV file using the vi editor, which we have done numerous times in the past.

I have uploaded this input CSV file to my GitHub profile in the HDPCD repository with the name “37_input_UDF_invoke.csv”. You can download this file by clicking here and it looks something like this.

You can use the following commands to create this input CSV file in local file system.

vi post27.csv

############################

PASTE THE COPIED CONTENTS HERE

############################

cat post27.csv

The following screenshot shows the execution of the above commands.

Step 1: creating input csv file in local file system
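If you would rather not paste into vi, the same file can be written with a heredoc. This is only a sketch: the two rows below are hypothetical placeholders (the real contents are in the GitHub file), chosen to match what the rest of this tutorial relies on, namely 2 records with the first name in the second column and the last name in the third.

```shell
# Write post27.csv without opening an editor.
# NOTE: these rows are made-up sample data, not the author's actual file.
cat > post27.csv <<'EOF'
1,john,smith
2,jane,doe
EOF

# Print the file to confirm its contents.
cat post27.csv
```

The quoted 'EOF' keeps the shell from interpreting anything inside the heredoc, so the rows land in the file exactly as typed.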

As the input CSV file is ready in the local file system, it is time to push it into HDFS.

  • PUSHING INPUT CSV FILE FROM LOCAL FILE SYSTEM TO HDFS

You can use the following commands to push this file from the local file system to HDFS.

hadoop fs -mkdir /hdpcd/input/post27
hadoop fs -put post27.csv /hdpcd/input/post27
hadoop fs -cat /hdpcd/input/post27/post27.csv

The output of the above commands is shown in the following screenshot.

Step 2: pushing input csv file from local to HDFS

The above screenshot confirms that the post27.csv file was successfully uploaded to the HDFS directory /hdpcd/input/post27.
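The three commands above can also be wrapped in a small helper, sketched here. The function name push_to_hdfs is my own invention. It differs from the commands above in using -mkdir -p so that missing parent directories are created, -put -f so that an existing copy is overwritten, and -test -e as a quiet existence check in place of -cat.

```shell
# Hypothetical helper: upload one local file into an HDFS directory.
#   $1 = local file, $2 = target HDFS directory
push_to_hdfs() {
  # Create the directory and any missing parents, upload (overwriting an
  # existing copy), then confirm that the file is now present in HDFS.
  hadoop fs -mkdir -p "$2" &&
    hadoop fs -put -f "$1" "$2" &&
    hadoop fs -test -e "$2/$(basename "$1")"
}
```

On a machine with the Hadoop client installed, you would call it as push_to_hdfs post27.csv /hdpcd/input/post27. The function returns a non-zero status if any of the three steps fails.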

Let us start working on the pig script creation now.

  • CREATING PIG SCRIPT TO INVOKE UDF

The objective of this pig script is to invoke a UDF present in one of the JAR files. To do that, we first need to REGISTER the JAR file, then DEFINE an ALIAS for the function, and finally invoke that ALIAS.

I have uploaded this pig script to my GitHub profile in the HDPCD repository with the name “38_UDF_invocation.pig”. You can download this file by clicking here and it looks something like this.

Let us go through this pig script one command at a time.

input_data = LOAD '/hdpcd/input/post27/post27.csv' USING PigStorage(',');

The LOAD command loads the data stored in the post27.csv file into a pig relation called input_data.

REGISTER /usr/hdp/2.3.0.0-2557/pig/piggybank.jar;

The REGISTER command registers the piggybank.jar file with the pig session.

DEFINE upper org.apache.pig.piggybank.evaluation.string.UPPER;

The DEFINE command defines an ALIAS for a fully qualified class name present in the REGISTERed JAR file.

upper_data = FOREACH input_data GENERATE upper($1) as fname, upper($2) as lname;

The above command invokes the ALIAS “upper” on the second and third columns (index $1 and $2, since Pig positions start at $0) and names the results fname and lname.

Now, let us create this pig script with the help of VI editor.

vi post27.pig

############################

PASTE THE COPIED CONTENTS HERE

############################

cat post27.pig

You can use the following screenshot as the reference for creating this pig script.

Step 3: creating pig script to demo UDF invocation

The above screenshot confirms that the pig script was created successfully.
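For reference, here is a sketch of the whole script written with a heredoc instead of vi. The first four statements are the ones explained above. The final STORE statement is an assumption on my part: the walkthrough does not show it, but the script writes its results to /hdpcd/output/post27, so some STORE into that directory has to exist. The exact form used in the GitHub file, including any delimiter, may differ.

```shell
# Write post27.pig in one go. The quoted 'EOF' matters: it stops the shell
# from expanding $1 and $2, which must reach Pig unchanged.
cat > post27.pig <<'EOF'
-- Load the CSV; fields are addressed by position ($0, $1, $2).
input_data = LOAD '/hdpcd/input/post27/post27.csv' USING PigStorage(',');
-- JAR path as used in this tutorial; adjust it to your HDP version.
REGISTER /usr/hdp/2.3.0.0-2557/pig/piggybank.jar;
-- Short alias for the fully qualified piggybank class.
DEFINE upper org.apache.pig.piggybank.evaluation.string.UPPER;
-- Invoke the alias on the second and third columns.
upper_data = FOREACH input_data GENERATE upper($1) AS fname, upper($2) AS lname;
-- ASSUMPTION: not shown in the walkthrough; default storage is tab-delimited.
STORE upper_data INTO '/hdpcd/output/post27';
EOF

# Print the script to confirm its contents.
cat post27.pig
```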

Let us run this pig script now.

  • RUN PIG SCRIPT

We can use the following command to run this pig script.

pig post27.pig

The execution of this pig script looks as follows.

Step 4: Running pig script to invoke UDF

And the output of this pig script looks like this.

Step 4: Pig script execution output

As you can see from the above screenshot, the pig script ran successfully. A total of 2 records were read from the /hdpcd/input/post27 directory and 2 records were stored in the /hdpcd/output/post27 directory.

It shows that there is no record/data loss, but we need to make sure that upper ALIAS worked perfectly and we got the expected output.
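One practical note if you run the script more than once: Pig refuses to store into an output directory that already exists, so a second run fails until the old output is removed. A minimal sketch of a rerun helper follows. The function name rerun_pig_script is hypothetical, and please be aware that it deletes the given HDFS directory.

```shell
# Hypothetical helper: clear the old output, then run the script again.
#   $1 = pig script, $2 = HDFS output directory the script stores into
# WARNING: this permanently removes $2 from HDFS before running.
rerun_pig_script() {
  # -f keeps -rm from failing when the directory does not exist yet.
  hadoop fs -rm -r -f "$2" &&
    pig "$1"
}
```

For this tutorial the call would be rerun_pig_script post27.pig /hdpcd/output/post27.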

Let us log into HDFS for cross-checking the output records.

  • CHECK OUTPUT HDFS DIRECTORY

You can use the following commands to check the contents of the output HDFS directory.

hadoop fs -ls /hdpcd/output/post27
hadoop fs -cat /hdpcd/output/post27/part-m-00000

The execution of the above commands looks as follows.

Step 5: output HDFS directory contents

As you can see from the above screenshot, both the first name and the last name were printed in UPPERCASE letters, which is the expected output of the upper ALIAS defined in the pig script.
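With only 2 records, reading the output by eye is enough. For a larger file, a check along these lines could do the same job. It is a sketch, and the function name has_no_lowercase is my own.

```shell
# Hypothetical helper: read records on stdin and succeed only if
# no lowercase letter appears anywhere in them.
has_no_lowercase() {
  # grep -q exits 0 when it finds a lowercase letter; invert that result.
  ! grep -q '[[:lower:]]'
}
```

You would feed it the output file, for example hadoop fs -cat /hdpcd/output/post27/part-m-00000 | has_no_lowercase, and then inspect the exit status with echo $?. A status of 0 means every letter in the output is uppercase.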

This concludes the tutorial. With it, I am happy to announce that the DATA TRANSFORMATION section of the HDPCD certification is over. From the next tutorial onwards, we are going to start the DATA ANALYSIS section, which is the last one in the HDPCD certification series.

I hope the contents are making sense and you are getting what I want to convey.

You can check out my website at www.milindjagre.com

You can check out my LinkedIn profile here. Please like my Facebook page here. Follow me on Twitter here and subscribe to my YouTube channel here for the video tutorials.

See you soon.

Cheers!

 
