Thursday, June 7, 2018

Java Program to Read File in HDFS

In this post we’ll see a Java program to read a file in HDFS. There are two ways to read a file stored in HDFS-

  1. Open the file to get a FSDataInputStream and read data from that stream.
  2. Use the IOUtils class provided by the Hadoop framework to copy the file content to an output stream.

Reading HDFS file Using FSDataInputStream

package org.netjs;

import java.io.IOException;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HDFSFileRead {

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    FSDataInputStream in = null;
    try {
      FileSystem fs = FileSystem.get(conf);
      // Input file path passed as the first argument
      Path inFile = new Path(args[0]);

      // Check if file exists at the given location
      if (!fs.exists(inFile)) {
        throw new IOException("Input file not found");
      }
      // Open the file for reading
      in = fs.open(inFile);
      // Display file content on the terminal
      OutputStream out = System.out;
      byte[] buffer = new byte[256];
      int bytesRead;
      while ((bytesRead = in.read(buffer)) > 0) {
        out.write(buffer, 0, bytesRead);
      }
      out.flush();
    } catch (IOException e) {
      e.printStackTrace();
    } finally {
      // Close the input stream; System.out is left open
      if (in != null) {
        try {
          in.close();
        } catch (IOException e) {
          e.printStackTrace();
        }
      }
    }
  }
}

In order to execute this program you need to add the directory containing the compiled .class file to Hadoop's classpath.

export HADOOP_CLASSPATH=<PATH TO .class FILE>

To run the program- hadoop org.netjs.HDFSFileRead /user/process/display.txt
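The copy loop itself doesn't depend on Hadoop- it is the standard java.io pattern of reading into a fixed-size buffer until the stream is exhausted. As a self-contained sketch of that pattern (plain Java streams and made-up sample data, no HDFS involved):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class StreamCopyDemo {

  // Copy everything from in to out using a fixed-size buffer-
  // the same loop used in the HDFS example above
  public static void copy(InputStream in, OutputStream out) throws IOException {
    byte[] buffer = new byte[256];
    int bytesRead;
    while ((bytesRead = in.read(buffer)) > 0) {
      out.write(buffer, 0, bytesRead);
    }
  }

  public static void main(String[] args) throws IOException {
    InputStream in = new ByteArrayInputStream("Sample HDFS file content".getBytes());
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    copy(in, out);
    System.out.println(out.toString()); // prints: Sample HDFS file content
  }
}
```

In the HDFS program, fs.open() simply returns such an InputStream (a FSDataInputStream), so the same loop applies unchanged.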

Reading HDFS file Using IOUtils class

package org.netjs;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HDFSFileRead {

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    FSDataInputStream in = null;
    try {
      FileSystem fs = FileSystem.get(conf);
      // Input file path passed as the first argument
      Path inFile = new Path(args[0]);

      // Check if file exists at the given location
      if (!fs.exists(inFile)) {
        throw new IOException("Input file not found");
      }
      in = fs.open(inFile);
      // Copy the file content to System.out using a 512 byte buffer;
      // the last argument (false) tells copyBytes not to close the streams
      IOUtils.copyBytes(in, System.out, 512, false);
    } catch (IOException e) {
      e.printStackTrace();
    } finally {
      IOUtils.closeStream(in);
    }
  }
}
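IOUtils.copyBytes(in, out, buffSize, close) wraps the same buffered read/write loop as the first example; the int argument is the buffer size and the boolean controls whether both streams are closed once the copy finishes. As a rough, dependency-free sketch of that behavior (an illustration of the semantics, not Hadoop's actual source):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class CopyBytesSketch {

  // Approximation of IOUtils.copyBytes(in, out, buffSize, close):
  // copy in to out with a buffer of buffSize bytes, optionally
  // closing both streams when the copy is done
  public static void copyBytes(InputStream in, OutputStream out,
                               int buffSize, boolean close) throws IOException {
    try {
      byte[] buffer = new byte[buffSize];
      int bytesRead;
      while ((bytesRead = in.read(buffer)) > 0) {
        out.write(buffer, 0, bytesRead);
      }
    } finally {
      if (close) {
        in.close();
        out.close();
      }
    }
  }

  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    copyBytes(new ByteArrayInputStream("display.txt content".getBytes()), out, 512, false);
    System.out.println(out.toString()); // prints: display.txt content
  }
}
```

Passing false in the program above keeps System.out open, which is why the input stream is closed separately with IOUtils.closeStream(in) in the finally block.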

Recommendations for learning

  1. The Ultimate Hands-On Hadoop
  2. Hive to ADVANCE Hive (Real time usage)
  3. Spark and Python for Big Data with PySpark
  4. Python for Data Science and Machine Learning
  5. Java Programming Masterclass Course

That's all for this topic Java Program to Read File in HDFS. If you have any doubts or any suggestions to make, please drop a comment. Thanks!

>>>Return to Hadoop Framework Tutorial Page


Related Topics

  1. Java Program to Write File in HDFS
  2. File Read in HDFS - Hadoop Framework Internal Steps
  3. HDFS Commands Reference List
  4. What is SafeMode in Hadoop
  5. HDFS Federation in Hadoop Framework

You may also like-

  1. What is Big Data
  2. Installing Hadoop on a Single Node Cluster in Pseudo-Distributed Mode
  3. Word Count MapReduce Program in Hadoop
  4. How MapReduce Works in Hadoop
  5. How to Compress MapReduce Job Output in Hadoop
  6. How to Configure And Use LZO Compression in Hadoop
  7. Reading File in Java 8
  8. StringBuilder in Java