Post 47 | HDPCD | Run a Hive query using Vectorization

Hello, everyone. Welcome to one more tutorial in the HDPCD certification series.

In the last tutorial, we saw how to run a Hive Query using the Tez execution engine.

In this tutorial, we are going to see how to run a Hive Query using Vectorization.

Let us begin, then.

Before starting off with the objective of this tutorial, let us discuss why we use vectorization in Hive.

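One of the most important reasons we enable Vectorization in Hive is to reduce the time that a complex query takes to produce an output. Instead of processing one row at a time, vectorized execution processes rows in batches, which lowers the CPU cost of operations such as scans, filters, and aggregations. In the HDP Sandbox, vectorization is enabled by default, just like the Tez execution engine.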
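The batch idea can be illustrated with a small conceptual sketch in Python. This is not Hive's implementation. Hive's vectorized engine is written in Java, and according to the Hive documentation it processes a block of 1024 rows at a time, with each column stored as a vector (an array of a primitive type). The function names, the `BATCH_SIZE` constant, and the sample data below are assumptions made for illustration only.

```python
from typing import Dict, List

# Assumption for illustration: mirrors the 1024-row batch size described
# in the Hive documentation for vectorized execution.
BATCH_SIZE = 1024


def filter_row_at_a_time(rows: List[Dict[str, int]], threshold: int) -> List[int]:
    """Row mode: the predicate is evaluated once per row object."""
    result: List[int] = []
    for row in rows:
        if row["id"] > threshold:
            result.append(row["id"])
    return result


def filter_vectorized(id_column: List[int], threshold: int) -> List[int]:
    """Vectorized mode: evaluate the predicate over one column, batch by batch."""
    result: List[int] = []
    for start in range(0, len(id_column), BATCH_SIZE):
        # One "column vector": a plain list of values for a single column.
        batch = id_column[start:start + BATCH_SIZE]
        # Record the positions that pass the filter, similar in spirit to the
        # selected-rows index array that Hive keeps for each batch.
        selected = [i for i, value in enumerate(batch) if value > threshold]
        result.extend(batch[i] for i in selected)
    return result


if __name__ == "__main__":
    ids = list(range(5000))
    rows = [{"id": i} for i in ids]
    # Both modes produce the same answer; only the shape of the work differs.
    assert filter_row_at_a_time(rows, 4000) == filter_vectorized(ids, 4000)
    print(len(filter_vectorized(ids, 4000)))  # 999
```

Both functions return the same rows. Python itself will not reproduce Hive's speed-up here. In Hive the gain comes mostly from tight Java loops over primitive arrays and far fewer per-row method calls, which this sketch only mimics in structure.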

Let us now go through the step-by-step process for this objective of the HDPCD certification.

  • CHECKING THE DEFAULT VALUE OF THE HIVE VECTORIZATION

When you log in to a Hive session, Hive Vectorization is ENABLED by default.

We use the following command to check the value of the Hive Vectorization.

set hive.vectorized.execution.enabled;

The output of the above query is as follows.

Step 1: Checking the default value of the Hive Vectorization Mode

As you can see from the above screenshot, Hive Vectorization is enabled by default in the Hive session.

Now, let us try to DISABLE it by setting the Hive Vectorization flag as FALSE.

  • SETTING THE HIVE VECTORIZATION TO FALSE

We use the following command to set the Hive Vectorization flag to FALSE.

set hive.vectorized.execution.enabled=false;

Once the Hive Vectorization flag is set to FALSE, we can use the above-mentioned command to check the new value of the Hive Vectorization.

set hive.vectorized.execution.enabled;

The output of the above commands is as follows.

Step 2: Setting the Hive Vectorization to FALSE

The above screenshot indicates that Hive Vectorization is now set to FALSE.

Let us check what happens when Hive Vectorization is disabled.

  • CHECKING THE EFFECT OF THE DISABLED HIVE VECTORIZATION

We are going to run a complex Hive query to check how much time it takes with the Hive Vectorization flag DISABLED.

The complex Hive query is as follows.

select * from post41 order by id desc;

The output of the above query is as follows.

Step 3: Checking the effect of disabled Hive Vectorization

The above screenshot indicates that the query took around 70 seconds with the Hive Vectorization flag set to FALSE.

Now, let us set the Hive Vectorization flag back to TRUE.

  • SETTING THE HIVE VECTORIZATION TO TRUE

We use the following command to set the Hive Vectorization flag to TRUE.

set hive.vectorized.execution.enabled=true;

Once this command has run, we can check the value of the Hive Vectorization flag by running the following command.

set hive.vectorized.execution.enabled;

The output of the above commands is as follows.

Step 4: Setting the Hive Vectorization to TRUE

The above screenshot shows that the Hive Vectorization flag is now set to TRUE.

Let us run the same complex Hive query to check how much time it takes to produce the same output.

  • CHECKING THE EFFECT OF THE ENABLED HIVE VECTORIZATION

We are going to run the same complex Hive query that we ran earlier with the Vectorization flag set to FALSE.

select * from post41 order by id desc;

The output of the above command is as follows.

Step 5: Checking the effect of enabled Hive Vectorization

As you can see from the above screenshot, the same query now produces the same output in around 25 seconds. By enabling the Hive Vectorization flag, we cut the run time of this query from about 70 seconds to about 25 seconds, a saving of more than 60%.
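The saving follows directly from the two approximate timings in the screenshots, each a single reading: about 70 seconds without vectorization and about 25 seconds with it.

```latex
\text{time saved} = \frac{70\,\text{s} - 25\,\text{s}}{70\,\text{s}} \times 100\%
                  = \frac{45}{70} \times 100\% \approx 64\%,
\qquad
\text{speed-up} = \frac{70\,\text{s}}{25\,\text{s}} = 2.8\times
```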

This is just one example that shows the importance of Hive Vectorization.

We can conclude this tutorial here. In the next tutorial, we are going to see how to print the execution plan of a Hive query.

Till then, stay tuned and keep sharing the content.

We are now just four more posts away from completing all the tutorials.

I hope you guys like the content.

You can check out my LinkedIn profile here. Please like my Facebook page here. Follow me on Twitter here and subscribe to my YouTube channel here for the video tutorials.
