Post 29 | HDPCD | Define a Hive-managed Table

Hello, everyone. Welcome to the second post in the Data Analysis section of the HDPCD certification series. In the last tutorial, we saw the three ways to run Hive commands. In this tutorial, we are going to create a Hive-managed table, i.e., a Hive internal table. To create a Hive-managed or internal table,…

Post 28 | HDPCD | Write and Execute a Hive Query

Hello, everyone. Welcome to the first tutorial in the DATA ANALYSIS section of the HDPCD certification. This section will contain 24 posts in total, after which we will finally be done with the HDPCD certification tutorials. In the last tutorial of the DATA TRANSFORMATION section, we saw the process of invoking a…

Post 27 | HDPCD | Invoke a User Defined Function in Apache Pig

Hello, everyone. Thanks for coming back for the last tutorial in the DATA TRANSFORMATION category of the HDPCD certification. We are going to pick up where the previous tutorial left off, in which we saw how to define an ALIAS for a function present in the JAR file. In this tutorial, we are going to see how to…

Post 26 | HDPCD | Define an ALIAS for a User Defined Function

Hi, everyone. Thank you for returning to this certification series. In the last tutorial, we saw the process of registering a JAR file in an Apache Pig session. This tutorial extends the previous one: we are going to see how to define an alias for the UDF present…

Post 25 | HDPCD | Register a Jar file of UDF in Apache Pig

Hello, everyone. Thanks for coming back to continue with this certification series. In the last tutorial, we saw how to run a Pig script with TEZ as the execution mode. In this tutorial, we are going to see how to register a JAR file in order to use the User Defined Function written and packaged inside it.

Post 24 | HDPCD | Run a Pig job using TEZ

Hey, everyone. Thank you for keeping me company on this beautiful journey toward the HDPCD certification. We are almost done with the Data Transformation section of the certification, and only the Data Analysis section, which uses Apache Hive, remains. In my opinion, the Data Analysis section is easier than this one, so you can say…

Post 23 | HDPCD | Perform a REPLICATED JOIN using Apache Pig

Hey, everyone. Thank you once again for continuing to come back for these tutorials. In the last tutorial, we saw how to perform a simple JOIN operation, and in this tutorial, we are going to perform a REPLICATED JOIN operation. The process is similar, and the difference lies in only one place, so…

Post 1 | Machine Learning | Introduction

Hello, people. In this new tutorial series, we are going to talk about different aspects of Machine Learning. As an aspiring Data Scientist, I have always wanted to get my hands dirty with the concepts of Machine Learning, and the Summer Break gave me exactly what I wanted: “TIME TO LEARN MACHINE LEARNING…

Post 22 | HDPCD | Join two datasets using Apache Pig

Hey, everyone. Thanks for the overwhelming response to the blog posts that I have been receiving since last week. I really appreciate it. I will keep posting interesting and innovative content for you. In the last tutorial, we saw two ways to use the parallel features of Apache Pig. In this tutorial,…

Post 19 | HDPCD | Sort the output of a Pig Relation

Hi, everyone. Thanks for coming back to continue with this tutorial series. We are almost done with this section, and once we finish it, we will jump into Hive, which will not take much time. In the last tutorial, we saw the process of storing data from Pig into Hive using…