Post 21 | HDPCD | Specify number of reduce tasks for Pig MapReduce job

Hello everyone. Thanks for coming back to one more tutorial in this HDPCD certification series. In the last tutorial, we saw how to remove duplicate tuples from a Pig relation. In this tutorial, we are going to see how to specify the number of reduce tasks for a Pig MapReduce job. Let us get started […]

Read Excel File using MapReduce

The code below reads Excel files using the MapReduce API. The entire source code has been taken from this link.

ExcelDriver.java: https://gist.github.com/milindjagre/84cc1c230ffd10b7ec0b5db5a47f4c80
ExcelInputFormat.java: https://gist.github.com/milindjagre/a8f2f35908ad0ce2d0d63a0725e3de4f
ExcelMapper.java: https://gist.github.com/milindjagre/3a77b5430111ead3eb3f538a5db72210
ExcelParser.java: https://gist.github.com/milindjagre/34966d289da2e6d33dfbf0f76fc75271
ExcelRecordReader.java: https://gist.github.com/milindjagre/d45935abc259d594e1ed495ca2a67d7a
pom.xml: https://gist.github.com/milindjagre/f95e366cf4766070652608c05783be0f

If you clean and build the above project, it will create two jar files, out of which we have to use the jar file …