In this post series, I am going to talk about the Hortonworks Data Platform Certified Developer Certification, also known as HDPCD.
We’ll kick off the proceedings with an introduction to this certification exam and what needs to be covered in order to earn a verified digital badge proving your certification. A few facts about this certification are as follows.
Cost: $250 (enough motivation to take this exam seriously)
Duration: 2 hours
Passing Criteria: Solve either 5 out of 7 or 7 out of 10 questions successfully.
Hadoop Sections to study: HDFS, Sqoop, Flume, Hive and Pig
Now, the five sections mentioned above are divided into three major categories in the HDPCD certification. These categories are as follows.
- DATA INGESTION
- DATA TRANSFORMATION
- DATA ANALYSIS
The categories listed above contain specific tasks you should be familiar with in order to clear the certification with flying colors. We will go through the tasks in each category.
The first category, DATA INGESTION, contains the following six tasks.
- SQOOP Import.
- Free form SQOOP import.
- Importing data into Hive using SQOOP.
- SQOOP Export.
- FLUME Agent.
- FLUME Memory Channel.
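To give a flavor of what the ingestion tasks look like in practice, here is a minimal sketch of a Sqoop import and a Flume agent launch. The hostname, database, table, paths, agent name, and configuration file are hypothetical placeholders, not values from the exam.

```shell
# Sketch of a basic Sqoop import from MySQL into HDFS.
# Connection string, credentials, and table are hypothetical placeholders.
sqoop import \
  --connect jdbc:mysql://dbserver/retail_db \
  --username retail_user -P \
  --table orders \
  --target-dir /user/hdpcd/orders \
  -m 1                                 # run with a single mapper

# Sketch of starting a Flume agent named "agent1" whose source, memory
# channel, and sink are defined in flume-agent.conf (hypothetical file).
flume-ng agent \
  --name agent1 \
  --conf-file flume-agent.conf \
  -Dflume.root.logger=INFO,console
```

A free-form import replaces `--table` with a `--query` clause containing the `$CONDITIONS` token, and adding `--hive-import` lands the data in a Hive table instead of a plain HDFS directory.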
The second category, DATA TRANSFORMATION, is focused entirely on Apache Pig, so all of the tasks below relate to Apache Pig.
- Write and Execute a PIG Script.
- Load data into PIG relation without a schema.
- Load data into PIG relation with a schema.
- Load data from a HIVE table into a PIG relation.
- Use PIG to transform data into a specified format.
- Transform PIG data to match a given HIVE schema.
- Group the data of one or more PIG relation(s).
- Use PIG to remove records with NULL values from a relation.
- Store the data from a PIG relation into a folder in HDFS.
- Store the data from a PIG relation into a HIVE table.
- Sort the output of a PIG relation.
- Remove the duplicate tuples of a PIG relation.
- Specify the number of reduce tasks for a PIG MapReduce Job.
- Join two datasets using PIG.
- Perform a replicate join using PIG.
- Run a PIG job using Tez.
- Within a PIG script, register a JAR file of UDFs.
- Within a PIG script, define an alias for the UDF.
- Within a PIG script, invoke a UDF.
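Several of these Pig tasks can be sketched in a single short script. The sketch below assumes the same hypothetical `/user/hdpcd/orders` data as before; the relation names and fields are illustrative only.

```shell
# Sketch of several Pig tasks in one script: load with a schema, remove NULLs,
# group, aggregate, sort, and store back to HDFS. Paths and fields are hypothetical.
pig -x mapreduce <<'EOF'
-- Load data into a relation with a schema
orders = LOAD '/user/hdpcd/orders' USING PigStorage(',')
         AS (id:int, customer:chararray, amount:double);

-- Remove records with NULL values
clean = FILTER orders BY customer IS NOT NULL AND amount IS NOT NULL;

-- Group the relation and aggregate per group
by_cust = GROUP clean BY customer;
totals  = FOREACH by_cust GENERATE group AS customer,
                                   SUM(clean.amount) AS total;

-- Sort the output of a relation
sorted = ORDER totals BY total DESC;

-- Store the result into a folder in HDFS
STORE sorted INTO '/user/hdpcd/order_totals' USING PigStorage(',');
EOF
```

Running the same script on Tez is a matter of launching with `pig -x tez`, and `SET default_parallel N;` inside the script controls the number of reduce tasks.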
Just as the previous category is dedicated to Apache Pig, the third category, DATA ANALYSIS, is dedicated entirely to Apache Hive. It contains the following tasks.
- Write and execute a HIVE query.
- Define a HIVE-managed table.
- Define a HIVE external table.
- Define a partitioned HIVE table.
- Define a bucketed HIVE table.
- Define a HIVE table from a select query.
- Define a HIVE table that uses ORCFile format.
- Create a new ORCFile table from the existing data in a non-ORCFile table in HIVE.
- Specify the storage format of a HIVE table.
- Specify the delimiter of a HIVE table.
- Load data into a HIVE table from a local directory.
- Load data into a HIVE table from an HDFS directory.
- Load data into a HIVE table as the result of a query.
- Load compressed data into a HIVE table.
- Update a row in a HIVE table.
- Delete a row in a HIVE table.
- Insert a row in a HIVE table.
- Join two HIVE tables.
- Run a HIVE query using Tez.
- Run a HIVE query using vectorization.
- Output the execution plan for a HIVE query.
- Use a subquery within a HIVE query.
- Output data from a HIVE query that is totally ordered across multiple reducers.
- Set a Hadoop or HIVE configuration property from within a Hive query.
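As a taste of the Hive side, the sketch below strings a few of these tasks together: creating an ORCFile table from an existing non-ORC table, setting engine properties from within the session, and printing a query's execution plan. The table names are hypothetical stand-ins.

```shell
# Sketch of a few Hive tasks in one session. Table names (orders_text,
# orders_orc) are hypothetical placeholders, not exam data.
hive <<'EOF'
-- Create a new ORCFile table from the existing data in a non-ORCFile table
CREATE TABLE orders_orc STORED AS ORC
AS SELECT * FROM orders_text;

-- Set Hadoop/Hive configuration properties from within the session:
-- run on Tez with vectorized execution enabled
SET hive.execution.engine=tez;
SET hive.vectorized.execution.enabled=true;

-- Output the execution plan for a query
EXPLAIN
SELECT customer, SUM(amount)
FROM orders_orc
GROUP BY customer;
EOF
```

For totally ordered output across multiple reducers, `ORDER BY` is the relevant clause, since `SORT BY` only orders within each reducer.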
Though these tasks may seem like a lot, if practiced regularly they will hardly take any time out of your schedule.
We will cover each task in its own post on this blog, so by the end we will have a total of 51 posts: this introductory post, 49 task posts, and one concluding post.
I hope this series will help the HDPCD certification aspirants.
The link for the certification is HDPCD Certification.
The certification objectives are taken from HDPCD Objectives.
Suggestions are welcome.