Channel: Shekhar Govindarajan's Blog » Open Source

Connect and query Cloudera Impala using PHP ODBC on CentOS 7

Cloudera Impala is a SQL query engine for Hadoop. It is meant to be better suited to real-time SQL queries than MapReduce-based batch-processing software such as Hive or Pig, because Impala does not depend on MapReduce. In one of my ongoing projects, which runs Impala on Hadoop, I had to set up connectivity to it from PHP so that the web developers could start using it.

This blog post explains how to query Impala using PHP and ODBC. Conveniently, Cloudera (the company behind Impala) provides an RPM of ODBC drivers for Impala. The following steps show how to install, configure and use the Impala ODBC drivers with PHP (version 5.4) on the CentOS 7 (64-bit) Linux distribution.

Download the Impala ODBC drivers from http://www.cloudera.com/content/cloudera/en/downloads/connectors/impala/odbc/impala-odbc-v2-5-29.html. On that page, click the “Download Bits” button next to Linux RHEL6 – 64 Bit. After you fill in the popup form, the RPM will start downloading. As of this writing, the filename of the RPM is ClouderaImpalaODBC-2.5.29.1009-1.el6.x86_64.rpm.

Install this RPM on your PHP-powered web server as follows:

rpm -ivh ClouderaImpalaODBC-2.5.29.1009-1.el6.x86_64.rpm

On CentOS 7, the above command will fail with the following error:

cyrus-sasl-gssapi >= 2.1.22 is needed by ClouderaImpalaODBC-2.5.29.1009-1.x86_64
cyrus-sasl-plain >= 2.1.22 is needed by ClouderaImpalaODBC-2.5.29.1009-1.x86_64
libsasl2.so.2()(64bit) is needed by ClouderaImpalaODBC-2.5.29.1009-1.x86_64

Install cyrus-sasl-gssapi and cyrus-sasl-plain as follows:

yum install cyrus-sasl-gssapi cyrus-sasl-plain

Now run the install command again:

rpm -ivh ClouderaImpalaODBC-2.5.29.1009-1.el6.x86_64.rpm

This should leave you with only one error:

libsasl2.so.2()(64bit) is needed by ClouderaImpalaODBC-2.5.29.1009-1.x86_64

Next, install the RPM with the --nodeps option:

rpm -ivh ClouderaImpalaODBC-2.5.29.1009-1.el6.x86_64.rpm --nodeps

To provide the missing libsasl2.so.2, create a symbolic link to libsasl2.so.3 with the following command:

ln -s /usr/lib64/libsasl2.so.3 /usr/lib64/libsasl2.so.2
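
If you want the symlink step to be safe to re-run, it can be wrapped in a small check. This is only a sketch and not part of the original procedure: the function name link_sasl_compat is made up for illustration, and it assumes the directory you pass in already contains libsasl2.so.3.

```shell
# Hypothetical helper: create libsasl2.so.2 as a symlink to libsasl2.so.3
# inside the given directory, unless libsasl2.so.2 is already there.
link_sasl_compat() {
    libdir="$1"
    if [ -e "$libdir/libsasl2.so.2" ]; then
        echo "present"
    elif [ -e "$libdir/libsasl2.so.3" ]; then
        ln -s "$libdir/libsasl2.so.3" "$libdir/libsasl2.so.2" && echo "linked"
    else
        echo "missing libsasl2.so.3" >&2
        return 1
    fi
}
```

On the web server it would be called as root with link_sasl_compat /usr/lib64.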

Install unixODBC as follows:

yum install unixODBC

Configure ODBC
Copy the files odbc.ini and odbcinst.ini, found in /opt/cloudera/impalaodbc/Setup, to the /etc directory (overwriting the existing files). Next, open the file named cloudera.impalaodbc.ini, found in the directory /opt/cloudera/impalaodbc/lib/64, in a text editor. Comment out (by prefixing a #) the line that reads ODBCInstLib=libiodbcinst.so, as follows:

#ODBCInstLib=libiodbcinst.so

Next, add the following line towards the end of the file:

ODBCInstLib=libodbcinst.so

Save the file.
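
The same edit can be scripted instead of done by hand. The following is a rough sketch under a few assumptions: the function name use_unixodbc_instlib is hypothetical, GNU sed (as shipped with CentOS) is available, the ODBCInstLib line starts at the beginning of its line, and the file ends with a newline.

```shell
# Hypothetical helper: switch the driver from iODBC's libiodbcinst.so
# to unixODBC's libodbcinst.so in the given ini file.
use_unixodbc_instlib() {
    ini="$1"
    # Keep a one-time backup of the untouched file.
    [ -e "$ini.bak" ] || cp -p "$ini" "$ini.bak"
    # Comment out the iODBC line by prefixing it with '#'.
    sed -i 's/^ODBCInstLib=libiodbcinst\.so/#&/' "$ini"
    # Append the unixODBC line only if it is not already present.
    grep -qx 'ODBCInstLib=libodbcinst.so' "$ini" ||
        echo 'ODBCInstLib=libodbcinst.so' >> "$ini"
}
```

With the paths used in this post, the call would be use_unixodbc_instlib /opt/cloudera/impalaodbc/lib/64/cloudera.impalaodbc.ini.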

The PHP side
Install php-odbc as follows:

yum install php-odbc

Reload the Apache web server:

service httpd reload

Now the following PHP code should work and query data in Impala via ODBC:

<?php

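// The DSN name comes from /etc/odbc.ini; host, port and database are set here.
$dsn = "DSN=Sample Cloudera Impala DSN 64;host=192.168.0.5;port=21050;database=bigdata;";

$conn = odbc_connect($dsn, '', '');
if ($conn === false) {
    die("Connection failed: " . odbc_errormsg());
}

$result = odbc_exec($conn, "select * from tbl limit 10");
if ($result === false) {
    die("Query failed: " . odbc_errormsg($conn));
}

// Print each row as an associative array.
while ($row = odbc_fetch_array($result)) {
    print_r($row);
}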

?>

In the above code, replace 192.168.0.5 with the name or IP address of a machine running an Impala datanode. If you point it at the namenode instead, you will get the following error:

PHP Warning:  odbc_connect(): SQL error: [unixODBC][Cloudera][ImpalaODBC] (100) Error from the Impala Thrift API: 
connect() failed: Connection refused, SQL state S1000 in SQLConnect
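
Before debugging the PHP side, it can help to check from the web server whether the Impala port accepts connections at all. The following is a minimal sketch, assuming bash (for its /dev/tcp feature) and the coreutils timeout command are available; the function name impala_port_open is made up for illustration.

```shell
# Hypothetical helper: succeed if a TCP connection to host:port can be
# opened within 3 seconds, fail otherwise (e.g. on "Connection refused").
impala_port_open() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}
```

For example, impala_port_open 192.168.0.5 21050 && echo reachable || echo refused. A refusal here points at the host or port rather than at the ODBC or PHP setup.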

Replace bigdata in the DSN with the name of your database, and replace the table name (tbl) in the query select * from tbl limit 10 with one of your own tables.

This blog post aims to get you started with PHP ODBC and Impala using the default configuration, with minimal changes. So feel free to change the ugly DSN name “Sample Cloudera Impala DSN 64” in /etc/odbcinst.ini, /etc/odbc.ini and in the PHP script.
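
The rename can be done in one pass over the ini files. This is a sketch only: the function name rename_dsn is hypothetical, it relies on GNU sed, and it assumes neither name contains "|" or characters that sed treats specially.

```shell
# Hypothetical helper: replace every occurrence of the old DSN name
# with the new one in the given files.
rename_dsn() {
    old="$1"; new="$2"; shift 2
    sed -i "s|$old|$new|g" "$@"
}
```

For example, rename_dsn "Sample Cloudera Impala DSN 64" "impala" /etc/odbc.ini /etc/odbcinst.ini, after which the PHP script needs DSN=impala in its connection string.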
