Post 7 | ML | Data Preprocessing – Part 5

Hello, everyone. Thanks for joining me for this 5th tutorial of the Data Preprocessing part of the Machine Learning tutorials. In the last tutorial, we saw how to convert the CATEGORICAL VARIABLES from the STRING format to an INTEGER format. In this tutorial, we are going a step further and are going to split the original data…

Post 52 | HDPCD | The conclusion

Hi everyone. Finally, we have reached the end of this tutorial series. It’s been so long. We started this journey together on January 15th, 2017, and 276 days later, this beautiful journey is coming to an end. But we do not need to worry, because I am working on something new and would love to…

Post 51 | HDPCD | Set Hadoop or Hive Configuration property

Hello, everyone. Welcome to the last technical tutorial in the HDPCD certification series. It’s funny! This beautiful journey is coming to an end. In the last tutorial, we saw how to sort the output of a Hive query across multiple reducers. In this tutorial, we are going to see how to set a Hadoop or Hive configuration…

Post 50 | HDPCD | Order Hive query output across multiple reducers

Hello, everyone. Welcome to one more tutorial in the HDPCD certification series. In the last tutorial, we saw how to run a subquery within a Hive query. In this tutorial, we are going to see how to sort the output of a Hive query across multiple reducers. Let us begin, then. The following infographics show the step-by-step process of performing this operation. From…

Post 48 | HDPCD | Printing the execution plan of a Hive query

Hello, everyone. Welcome to one more tutorial in the HDPCD certification series. In the last tutorial, we saw how to enable vectorization in Hive. In this tutorial, we are going to see how to print the execution plan of a Hive query. Let us begin, then. This is one of the simplest tutorials in this certification series. In…

Post 43 | HDPCD | Delete a row in a Hive table

Hello, everyone. Welcome to one more tutorial in the HDPCD certification series. In the last tutorial, we saw how to update a row in a Hive table. In this tutorial, we are going to see how to delete a row in a Hive table. It is quite interesting to see that Hive supports ACID operations…

Post 39 | HDPCD | Load data into a Hive table from an HDFS directory

Hello, everyone. Thanks for returning for the next tutorial in the HDPCD certification series. In the last tutorial, we saw how to load data into a Hive table from a local directory. In this tutorial, we are going to see how to load data into a Hive table from an HDFS directory. Let us begin then.…

Post 37 | HDPCD | Specifying delimiter of a Hive table

Hello, everyone. Thanks for coming back for one more tutorial in this HDPCD certification series. In the last tutorial, we saw how to specify the storage format of a Hive table. In this tutorial, we are going to see how to specify the delimiter of a Hive table. We are going to follow the process…

Post 34 | HDPCD | Defining Hive Table using an ORC File Format

Hi, everyone. Thanks for joining me today for this tutorial. In the last tutorial, we saw how to create a Hive table using the SELECT query. In this tutorial, we are going to see how to create a Hive table which stores the data in the ORC File Format. The process of creating this table…

Post 30 | HDPCD | Define a Hive External Table

Hello, everyone! Welcome to the third tutorial in the Data Analysis section of the HDPCD certification. In the last tutorial, we saw how to create the Hive-managed or internal table. In this tutorial, we are going to create the Hive external table. So, let us start with the process. The following infographics show the process…