HDFS
Connect to an HDFS server
In the Big Data Tools window, click the Add button and select HDFS.
In the Big Data Tools dialog that opens, specify the connection parameters:
Name: the name of the connection, to distinguish it from other connections.
Root path: a path on the target server to be the root for the HDFS connection.
When the connection is successfully established, the Driver home path field shows the target IP address of the connection, including the port number, for example, hdfs://127.0.0.1:65224/.
Configuration source: select one of:
Configuration files directory: a path to the directory with the HDFS configuration files. See the samples of configuration files below.
File system URI: the URI of an HDFS server (see the sketch below for how either source is consumed by a Hadoop client).
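Both options ultimately feed the same Hadoop client configuration: the files directory supplies properties such as fs.defaultFS from core-site.xml, while the URI sets the default file system directly. As a rough sketch (not the plugin's internals), this is how a plain Hadoop Java client can be configured from either source; the directory path and URI are placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.net.URI;

public class ConfigSources {
    public static void main(String[] args) throws Exception {
        // Source 1: load properties from an existing configuration file
        // (placeholder path).
        Configuration fromFiles = new Configuration(false);
        fromFiles.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
        FileSystem fromFilesFs = FileSystem.get(fromFiles);

        // Source 2: point the client at a file system URI directly.
        Configuration defaults = new Configuration();
        FileSystem fromUriFs = FileSystem.get(URI.create("hdfs://example.com:9000/"), defaults);

        System.out.println(fromFilesFs.getUri());
        System.out.println(fromUriFs.getUri());
        fromFilesFs.close();
        fromUriFs.close();
    }
}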
Optionally, you can set up:
Per project: select to enable these connection settings only for the current project. Deselect it if you want this connection to be visible in other projects.
Enable connection: deselect this checkbox if you want to disable the connection. By default, newly created connections are enabled.
Username: enter a username to log in to the server. If not specified, the HADOOP_USER_NAME environment variable is used. If this variable is not defined, the user.name property is used. If Kerberos is enabled, it overrides any of these three values (see the sketch after this list).
Enable tunneling (NameNode operations only): creates an SSH tunnel to the remote host. It can be useful if the target server is in a private network but an SSH connection to a host in that network is available. SSH tunneling currently works only for the following NameNode operations: listing files and getting meta info.
Select the checkbox and specify a configuration of an SSH connection (click ... to create a new SSH configuration).
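As a reference for the username fallback described above, the sketch below shows the equivalent behavior in the plain Hadoop Java API: FileSystem.get accepts an explicit user, and when none is given, the client derives the user from HADOOP_USER_NAME and then from the user.name property. The host and usernames are placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import java.net.URI;

public class EffectiveUser {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        URI uri = URI.create("hdfs://example.com:9000/");

        // Explicit user, analogous to filling in the Username field.
        FileSystem asNamedUser = FileSystem.get(uri, conf, "hdfsadmin");

        // No explicit user: Hadoop falls back to the HADOOP_USER_NAME
        // environment variable, then to the user.name property.
        FileSystem asDefaultUser = FileSystem.get(uri, conf);

        System.out.println(asNamedUser.getUri());
        System.out.println(asDefaultUser.getUri());
        asNamedUser.close();
        asDefaultUser.close();
    }
}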
Once you fill in the settings, click Test connection to ensure that all configuration parameters are correct. Then click OK.
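If you want to verify connectivity outside the IDE, a minimal Hadoop client that lists the root path performs the same NameNode round trip as the connection test. This is a sketch; the URI is a placeholder:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.net.URI;

public class TestHdfsConnection {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://127.0.0.1:65224/"), conf);

        // Listing the root directory requires a working NameNode connection.
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
        fs.close();
    }
}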
Samples of Hadoop File System configuration files
HDFS

<?xml version="1.0"?>
<configuration>
    <property>
        <name>fs.hdfs.impl</name>
        <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://example.com:9000/</value>
    </property>
</configuration>
S3

<?xml version="1.0"?>
<configuration>
    <property>
        <name>fs.s3a.impl</name>
        <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
    </property>
    <property>
        <name>fs.s3a.access.key</name>
        <value>sample_access_key</value>
    </property>
    <property>
        <name>fs.s3a.secret.key</name>
        <value>sample_secret_key</value>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>s3a://example.com/</value>
    </property>
</configuration>
WebHDFS

<?xml version="1.0"?>
<configuration>
    <property>
        <name>fs.webhdfs.impl</name>
        <value>org.apache.hadoop.hdfs.web.WebHdfsFileSystem</value>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>webhdfs://master.example.com:50070/</value>
    </property>
</configuration>
WebHDFS and Kerberos

<?xml version="1.0"?>
<configuration>
    <property>
        <name>fs.webhdfs.impl</name>
        <value>org.apache.hadoop.hdfs.web.WebHdfsFileSystem</value>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>webhdfs://master.example.com:50070</value>
    </property>
    <property>
        <name>hadoop.security.authentication</name>
        <value>Kerberos</value>
    </property>
    <property>
        <name>dfs.web.authentication.kerberos.principal</name>
        <value>testuser@EXAMPLE.COM</value>
    </property>
    <property>
        <name>hadoop.security.authorization</name>
        <value>true</value>
    </property>
</configuration>
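To try out a Kerberos-enabled configuration like the sample above from code, one option is Hadoop's UserGroupInformation API: log in from a keytab before opening the file system. This is a sketch; the configuration path, principal, and keytab path are placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberosLogin {
    public static void main(String[] args) throws Exception {
        // Load a core-site.xml similar to the WebHDFS and Kerberos sample.
        Configuration conf = new Configuration(false);
        conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));

        // Authenticate against the KDC before any file system call.
        UserGroupInformation.setConfiguration(conf);
        UserGroupInformation.loginUserFromKeytab(
                "testuser@EXAMPLE.COM", "/etc/security/keytabs/testuser.keytab");

        FileSystem fs = FileSystem.get(conf);
        System.out.println("Logged in as: " + UserGroupInformation.getLoginUser());
        fs.close();
    }
}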