Spark + Python : reduce action

This tutorial is an introduction to actions in Spark.

So far we have seen transformations such as map() and flatMap(). reduce() is one of the actions provided by Spark.

In this tutorial, we use the reduce action to add up a list of numbers.

We are going to follow the steps below to achieve this.

reduce : addition of numbers

We follow the above-mentioned steps to perform the addition with the help of the reduce action.

We take a list of numbers as input, parallelize it into an RDD, and then apply the reduce action with an add function.

The following code, uploaded on GitHub, explains this approach.

The screenshot below shows the code written in Notepad++.

reduce action in Notepad++

As we have seen in previous tutorials, we run the above code with the spark-submit command.

The screenshot below shows the command used to run this code.

running reduce.py

Once you run the above command, you get the output shown in the screenshot below.

output

The screenshot above shows the output of the code we ran: the sum of the numbers 1, 2, 3, 4 and 5, which comes to 15.
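To see why the result is 15 no matter how the numbers are split up, here is a plain-Python sketch (no Spark involved) of the idea behind reduce: each partition is reduced on its own, and the partial results are then reduced again. The helper name `reduce_in_partitions` and the partition splits are hypothetical, and Spark's real execution is more involved. This only works reliably because addition is commutative and associative, which is what Spark expects of the function passed to reduce.

```python
from functools import reduce
from operator import add


def reduce_in_partitions(partitions, func):
    """Reduce each non-empty partition, then combine the partial results."""
    partials = [reduce(func, part) for part in partitions if part]
    return reduce(func, partials)


# Two hypothetical ways the same five numbers could be partitioned.
two_parts = [[1, 2], [3, 4, 5]]
three_parts = [[1], [2, 3], [4, 5]]

print(reduce_in_partitions(two_parts, add))    # 3 + 12 = 15
print(reduce_in_partitions(three_parts, add))  # 1 + 5 + 9 = 15
```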

This is how we implement the reduce action.

Hope this helps.

Cheers!
