How to Read HDFS File in Java

The Hadoop Distributed File System (HDFS) can be accessed using the native Java API provided by the Hadoop client library. The following example uses the FileSystem API to read an existing file from an HDFS folder. Before running the Java program below, ensure that the following values are changed to match your Hadoop installation.

  • Modify HDFS_ROOT_URL to point to the Hadoop NameNode endpoint. This value can be copied from the fs.defaultFS property in the etc/hadoop/core-site.xml file.
  • Modify the HDFS file path used in the program. The program below prints the file input.txt located in the /user/jj HDFS folder. The default HDFS home folder is named /user/<username>. Ensure that the file is already uploaded to the HDFS folder. To copy input.txt from your Hadoop folder to HDFS, you can use the command "bin/hdfs dfs -copyFromLocal input.txt ." (the older "bin/hadoop dfs" form is deprecated in Hadoop 2.x).
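
The upload step above can be sketched as follows, assuming a running Hadoop 2.x cluster and that the commands are run from the Hadoop installation folder. The folder name jj matches the example used in this article; substitute your own username:

```
# Create the HDFS home folder for the user (if it does not exist yet)
bin/hdfs dfs -mkdir -p /user/jj

# Copy input.txt from the local folder into the HDFS home folder
bin/hdfs dfs -put input.txt /user/jj/

# Verify that the file is now in HDFS
bin/hdfs dfs -ls /user/jj
```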

Prerequisites

  • Java 1.8+
  • Gradle 3.x+
  • Hadoop 2.x

How to Read an HDFS File Using Gradle Java Project

Step 1: Create a simple Gradle Java project using the following command. This assumes that Gradle is already installed on your system.

gradle init --type java-application

Step 2: Replace the contents of the file build.gradle with the following,

apply plugin: 'java-library'
apply plugin: 'application'

mainClassName = "HDFSDemo"

jar {
    manifest {
        attributes 'Main-Class': "$mainClassName"
    }
}

repositories {
    // jcenter() is no longer maintained; hadoop-client is available on Maven Central
    mavenCentral()
}

dependencies {
    compile 'org.apache.hadoop:hadoop-client:2.7.3'
}

Note the dependency on hadoop-client 2.7.3. Update this version if you are working with a different Hadoop server version.

Step 3: Add the Java class HDFSDemo.java to the src/main/java folder, and delete the generated App.java and AppTest.java from the project folder.

import java.io.InputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

// Sample Java program to read files from hadoop hdfs filesystem
public class HDFSDemo {

	// This is copied from the entry in core-site.xml for the property fs.defaultFS. 
	// Replace with your Hadoop deployment details.
	public static final String HDFS_ROOT_URL="hdfs://localhost:9000";
	private Configuration conf;

	public static void main(String[] args) throws Exception {
		HDFSDemo demo = new HDFSDemo();
		
		// Reads a file from the user's home directory.
		// Replace jj with the name of your folder
		// Assumes that input.txt is already in HDFS folder
		String uri = HDFS_ROOT_URL+"/user/jj/input.txt";
		demo.printHDFSFileContents(uri);
	}
	
	public HDFSDemo() {
		conf = new Configuration();
	}
	
	// Example - Print hdfs file contents to console using Java
	public void printHDFSFileContents(String uri) throws Exception {
		FileSystem fs = FileSystem.get(URI.create(uri), conf);
		InputStream in = null;
		try {
			in = fs.open(new Path(uri));
			IOUtils.copyBytes(in, System.out, 4096, false);
		} finally {
			IOUtils.closeStream(in);
		}
	}

}
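
As a variant, the same read can be written with Java 7 try-with-resources, which closes the stream automatically and lets you read the file line by line instead of copying raw bytes. This is only a sketch under the same assumptions as the program above (the hadoop-client dependency and a running HDFS at hdfs://localhost:9000); the class name HDFSLineReader is illustrative:

```
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Variant: read an HDFS text file line by line using try-with-resources
public class HDFSLineReader {

	public static void main(String[] args) throws Exception {
		// Same URI layout as in HDFSDemo; replace host, port and path as needed
		String uri = "hdfs://localhost:9000/user/jj/input.txt";
		Configuration conf = new Configuration();
		FileSystem fs = FileSystem.get(URI.create(uri), conf);

		// try-with-resources closes the reader (and the underlying stream) automatically
		try (BufferedReader reader = new BufferedReader(
				new InputStreamReader(fs.open(new Path(uri)), StandardCharsets.UTF_8))) {
			String line;
			while ((line = reader.readLine()) != null) {
				System.out.println(line);
			}
		}
	}
}
```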

Step 4: Build and run the application using the Gradle wrapper command below. The contents of the HDFS file will be printed to the console.

./gradlew run