In this tutorial, we are going to install and configure Apache Drill 1.3.0 on Ubuntu 16.04.
Before starting the installation and configuration, let us get to know Apache Drill.
The following is the minimum we should know before going ahead with the installation and configuration.
- Drill lets us query CSV files and NoSQL databases with SQL, converting them on the fly.
- It removes most of the overhead of extracting analysis from the data, so users can focus on the things that matter.
- It is an Apache project that lets users query files and NoSQL tables directly to extract useful analysis.
Now, let us dive into the installation document we are going to follow to install Apache Drill 1.3.0.
Please go through the document below, which I have uploaded to my GitHub profile.
As mentioned at the end of that document, the process that runs for Apache Drill is SqlLine. The following screenshot shows the same thing.
Now it is time to log into Apache Drill.
There are two ways to do this. The first is the command prompt, and the second, which I recommend, is the web interface.
We will look at each approach in turn.
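One way to confirm this from the shell is to look for SqlLine in the JVM process list. The sketch below is only an illustration: the helper name has_sqlline is hypothetical, and it assumes the jps tool that ships with the JDK is available. The command behind the screenshot may have been a different one.

```shell
# Hypothetical helper: succeeds if the process listing read from stdin
# contains a SqlLine entry, which is how the Drill shell shows up in jps.
has_sqlline() {
    grep -q 'SqlLine'
}

# Usage on a machine with a JDK installed (assumption: jps is on the PATH):
#   jps | has_sqlline && echo "Drill (SqlLine) is running"
```

The same check can be repeated after quitting Drill to confirm that the process is gone.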
1. COMMAND PROMPT
We use the following command to log into the Drill command prompt.
$ sqlline -u jdbc:drill:zk=local
The following screenshot shows the expected output of the above command.
2. WEB INTERFACE
You can enter the following address in your browser to access the Apache Drill web interface. This is the preferred way of accessing Drill.
The address pattern is as follows.
There are plenty of ways to find the IP address. I am showing the most convenient one below.
You can run the command $ hostname -I to get the IP address. The screenshot below shows the expected output.
Take this IP address and enter it in the web browser using the address pattern above. Voila, you will get the Apache Drill web interface. It looks something like this.
Before verifying whether Drill works, let us take a close look at the tabs shown in the above screenshot.
The Query tab is where we write our SQL queries to get useful information out of unstructured data.
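As a rough sketch, the address can also be assembled on the command line. The helper name drill_web_url is hypothetical, and the port 8047 is an assumption: it is Drill's default web port, so adjust it if your installation uses another one.

```shell
# Hypothetical helper: build the web interface address from an IP address.
# Assumption: the web port defaults to 8047 unless a second argument is given.
drill_web_url() {
    printf 'http://%s:%s\n' "$1" "${2:-8047}"
}

# hostname -I can print several addresses, so take the first one:
#   drill_web_url "$(hostname -I | awk '{print $1}')"
```

For example, drill_web_url 192.168.1.10 prints http://192.168.1.10:8047, where the IP address is only a placeholder.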
It looks something like this.
The Storage tab shows the various storage types and nodes available for use. We are going to use the dfs storage type, which lets us access any file loaded onto the system.
It looks something like this.
Now it is time to test our Drill interface.
For that, we are going to use the JSON file below.
Our job is to query the above data with the help of the Drill interface.
To do so, we are going to follow the steps below.
- Copy this file onto your system.
- Go to the Apache Drill Query web interface.
- Write the desired query in the query window.
- Click Submit.
- Observe the output.
The screenshots below depict this step-by-step process.
I am using the FileZilla client to copy the above-mentioned file from my Windows system to the Ubuntu system on which Drill is installed.
Note that the complete path of the input file is '/home/hduser/drill_input.json'.
As you can see from the above screenshot, I am running a simple select * query against the input JSON file that we uploaded a few moments ago.
I think this explains quite clearly how Drill works. It took the file from the path mentioned in the SQL query and converted it on the fly into a table, giving us the result in tabular format as shown above.
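Since the query itself is only visible in the screenshot, here is a sketch of what such a query presumably looks like. The helper name drill_select_all is hypothetical. It assumes the dfs storage type is enabled and that the file path is wrapped in backticks, which is how Drill addresses files.

```shell
# Hypothetical helper: print a select * query for a file reachable through dfs.
# The path goes inside backticks after the dfs. prefix.
drill_select_all() {
    printf 'SELECT * FROM dfs.`%s`\n' "$1"
}

# Print the query for the input file used in this tutorial.
drill_select_all /home/hduser/drill_input.json
```

The printed text can be pasted into the query window. At the Drill command prompt, the same statement must end with a semicolon.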
The last thing we should always do is close the Apache Drill connection. This can be done at the Drill command prompt with the !quit command. Do not forget the leading exclamation mark.
The screenshot below might be helpful.
This completes the Apache Drill installation and configuration.
Hope you people had a good read.