Thursday, October 12, 2023

Java Program to Read File in HDFS

In this post we’ll see a Java program to read a file in HDFS. There are two ways to read a file stored in HDFS-

  1. Open the file through the FileSystem object to get an FSDataInputStream and use that stream to read the data. See example.
  2. Use the IOUtils class provided by the Hadoop framework to copy the file content to an output stream. See example.

Reading HDFS file Using FSDataInputStream

package org.netjs;

import java.io.IOException;
import java.io.OutputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HDFSFileRead {

 public static void main(String[] args) {
  Configuration conf = new Configuration();
  FSDataInputStream in = null;
  OutputStream out = null;
  try {
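   // Get the FileSystem instance for the file system configured as fs.defaultFS (HDFS here)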
   FileSystem fs = FileSystem.get(conf);
   // Input file path
   Path inFile = new Path(args[0]);
     
   // Check if file exists at the given location
   if (!fs.exists(inFile)) {
    System.out.println("Input file not found");
    throw new IOException("Input file not found");
   }
   // open and read from file
   in = fs.open(inFile);
   //displaying file content on terminal 
   out = System.out;
   byte[] buffer = new byte[256];
  
   int bytesRead = 0;
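   // read() returns the number of bytes read, or -1 at end of file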
   while ((bytesRead = in.read(buffer)) > 0) {
    out.write(buffer, 0, bytesRead);
   }      
  } catch (IOException e) {
   e.printStackTrace();
  } finally {
   // Closing streams
   try {
    if (in != null) {
     in.close();
    }
    if (out != null) {
     out.close();
    }
   } catch (IOException e) {
    e.printStackTrace();
   }
  }
 }
}

In order to execute this program you need to add the path of the compiled class to Hadoop’s classpath.

export HADOOP_CLASSPATH=<PATH TO .class FILE>
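
If you are compiling the class yourself, the Hadoop jars also have to be on the compile classpath. As a rough example, assuming HDFSFileRead.java is in the current directory, it could be compiled as follows (the hadoop classpath command prints the required classpath)-

javac -cp $(hadoop classpath) -d . HDFSFileRead.java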

To run the program- hadoop org.netjs.HDFSFileRead /user/process/display.txt

Reading HDFS file Using IOUtils class

package org.netjs;

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HDFSFileRead {

 public static void main(String[] args) {
  Configuration conf = new Configuration();
  FSDataInputStream in = null;
  try {
   FileSystem fs = FileSystem.get(conf);
   // Input file path
   Path inFile = new Path(args[0]);
     
   // Check if file exists at the given location
   if (!fs.exists(inFile)) {
    System.out.println("Input file not found");
    throw new IOException("Input file not found");
   }
   in = fs.open(inFile);
   
   // Copy from the HDFS input stream to System.out using a 512 byte buffer,
   // false as the last argument means copyBytes won't close the streams itself
   IOUtils.copyBytes(in, System.out, 512, false);
  } catch (IOException e) {
   e.printStackTrace();
  } finally {
   // closeStream ignores a null stream and swallows any IOException thrown on close
   IOUtils.closeStream(in);
  }
 }
}
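
This program is run the same way as the previous one- hadoop org.netjs.HDFSFileRead /user/process/display.txt

Both examples close the input stream explicitly in the finally block. As a minimal sketch, not part of the original examples and assuming fs and inFile are set up as shown above, the same copy could also be written with try-with-resources, since FSDataInputStream is Closeable and will be closed automatically-

try (FSDataInputStream in = fs.open(inFile)) {
 // stream is closed automatically when the try block exits
 IOUtils.copyBytes(in, System.out, 512, false);
}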

That's all for this topic Java Program to Read File in HDFS. If you have any doubts or any suggestions to make, please drop a comment. Thanks!

>>>Return to Hadoop Framework Tutorial Page


Related Topics

  1. Java Program to Write File in HDFS
  2. File Read in HDFS - Hadoop Framework Internal Steps
  3. HDFS Commands Reference List
  4. What is SafeMode in Hadoop
  5. HDFS Federation in Hadoop Framework

You may also like-

  1. What is Big Data
  2. Installing Hadoop on a Single Node Cluster in Pseudo-Distributed Mode
  3. Word Count MapReduce Program in Hadoop
  4. How MapReduce Works in Hadoop
  5. How to Compress MapReduce Job Output in Hadoop
  6. How to Configure And Use LZO Compression in Hadoop
  7. Reading File in Java Using Files.lines And Files.newBufferedReader
  8. StringBuilder Class in Java With Examples