Friday, November 10, 2023

How to Compress MapReduce Job Output in Hadoop

You can choose to compress the output of a MapReduce job in Hadoop. You can enable compression for all the jobs in a cluster, or you can set properties for specific jobs only.

Configuration parameters for compressing MapReduce job output

  • mapreduce.output.fileoutputformat.compress- Set this property to true if you want to compress the MapReduce job output. Default value is false.
  • mapreduce.output.fileoutputformat.compress.type- This configuration applies only if your MapReduce job output is a sequence file. In that case you can specify one of these values for compression: NONE, RECORD or BLOCK. Default is RECORD.
  • mapreduce.output.fileoutputformat.compress.codec- The codec to be used for compression. Default is org.apache.hadoop.io.compress.DefaultCodec.
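To see the defaults listed above in effect, here is a minimal sketch that creates a fresh Job and reads the three properties back through the getter methods of FileOutputFormat and SequenceFileOutputFormat. It assumes the Hadoop 2.x (or later) client libraries are on the classpath and that no mapred-site.xml on the classpath overrides these properties; the class and method names are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.SequenceFile.CompressionType;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.DefaultCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

// Hypothetical helper class: reads the effective output-compression settings of a job
public class OutputCompressionDefaults {

    // Effective value of mapreduce.output.fileoutputformat.compress (false when unset)
    public static boolean compressEnabled(Job job) {
        return FileOutputFormat.getCompressOutput(job);
    }

    // Codec named by mapreduce.output.fileoutputformat.compress.codec,
    // falling back to DefaultCodec when the property is not set at all
    public static Class<? extends CompressionCodec> codec(Job job) {
        return FileOutputFormat.getOutputCompressorClass(job, DefaultCodec.class);
    }

    // Value of mapreduce.output.fileoutputformat.compress.type (RECORD when unset)
    public static CompressionType compressionType(Job job) {
        return SequenceFileOutputFormat.getOutputCompressionType(job);
    }

    public static void main(String[] args) throws Exception {
        // A fresh Configuration picks up the defaults (mapred-default.xml) plus any site files
        Job job = Job.getInstance(new Configuration(), "defaults-check");
        System.out.println("compress       = " + compressEnabled(job));
        System.out.println("compress.type  = " + compressionType(job));
        System.out.println("compress.codec = " + codec(job).getName());
    }
}
```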

Configuring at the cluster level

If you want to compress the output of all MapReduce jobs running on the cluster, configure these parameters in mapred-site.xml.
For example, to compress the output of MapReduce jobs using the Gzip compression format, add the following properties.

<property>
  <name>mapreduce.output.fileoutputformat.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.output.fileoutputformat.compress.type</name>
  <value>RECORD</value>
</property>
<property>
  <name>mapreduce.output.fileoutputformat.compress.codec</name>
  <value>org.apache.hadoop.io.compress.GzipCodec</value>
</property>
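The same three property names can also be set on a Configuration in a driver, which is handy for trying the cluster-style settings on one job without editing mapred-site.xml. This is a minimal sketch assuming the Hadoop 2.x (or later) client libraries; GzipOutputByPropertyName and gzipOutputConf() are hypothetical names. Job.getInstance() copies the Configuration it is given, so the properties have to be set before the Job is created (or on job.getConfiguration() afterwards).

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.DefaultCodec;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

// Hypothetical example class: programmatic equivalent of the mapred-site.xml snippet
public class GzipOutputByPropertyName {

    // Sets the same three properties as the XML snippet, using the property names directly
    public static Configuration gzipOutputConf() {
        Configuration conf = new Configuration();
        conf.setBoolean("mapreduce.output.fileoutputformat.compress", true);
        conf.set("mapreduce.output.fileoutputformat.compress.type", "RECORD");
        conf.setClass("mapreduce.output.fileoutputformat.compress.codec",
                GzipCodec.class, CompressionCodec.class);
        return conf;
    }

    public static void main(String[] args) throws Exception {
        // Properties are set before getInstance() because the Job works on a copy
        Job job = Job.getInstance(gzipOutputConf(), "gzip-output");
        // FileOutputFormat's getters read back the very same property names
        System.out.println(FileOutputFormat.getCompressOutput(job));
        System.out.println(FileOutputFormat.getOutputCompressorClass(job, DefaultCodec.class).getName());
        System.out.println(SequenceFileOutputFormat.getOutputCompressionType(job));
    }
}
```

Running it should print true, org.apache.hadoop.io.compress.GzipCodec and RECORD, showing that the per-job API and the XML configuration work on the same properties.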

Configuring on a per-job basis

If you want to compress the output of a specific MapReduce job, add the following calls to the job configuration in your driver class.

FileOutputFormat.setCompressOutput(job, true);
FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);

If the output is a sequence file, you can set the compression type too.

SequenceFileOutputFormat.setOutputCompressionType(job, CompressionType.BLOCK);
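Putting the per-job calls together, here is a minimal driver sketch. It assumes the Hadoop 2.x (or later) client libraries, uses the identity Mapper and Reducer (so each input line is written out unchanged with its byte offset as the key), and the class name CompressedOutputDriver and the optional third argument "seq" are hypothetical. With text output, the part files should get the codec's extension, for example part-r-00000.gz.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile.CompressionType;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical driver: identity job whose output is Gzip-compressed
public class CompressedOutputDriver extends Configured implements Tool {

    // Per-job output compression; sequenceFile switches to SequenceFileOutputFormat
    public static void configureCompression(Job job, boolean sequenceFile) {
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
        if (sequenceFile) {
            job.setOutputFormatClass(SequenceFileOutputFormat.class);
            // BLOCK compresses batches of records together instead of each value separately
            SequenceFileOutputFormat.setOutputCompressionType(job, CompressionType.BLOCK);
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("Usage: CompressedOutputDriver <input> <output> [seq]");
            return 2;
        }
        Job job = Job.getInstance(getConf(), "compressed-output");
        job.setJarByClass(CompressedOutputDriver.class);
        // Identity mapper and reducer: (offset, line) pairs pass through unchanged
        job.setMapperClass(Mapper.class);
        job.setReducerClass(Reducer.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        configureCompression(job, args.length > 2 && "seq".equals(args[2]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new CompressedOutputDriver(), args));
    }
}
```

Because the driver goes through ToolRunner, generic options such as -D mapreduce.output.fileoutputformat.compress=false can still be passed on the command line. If the "seq" variant fails with an error about GzipCodec needing native code, the Hadoop version in use may require the native zlib library for Gzip-compressed sequence files; using another codec such as DefaultCodec is one workaround.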

That's all for this topic How to Compress MapReduce Job Output in Hadoop. If you have any doubts or suggestions, please drop a comment. Thanks!



Related Topics

  1. How to Compress Intermediate Map Output in Hadoop
  2. Data Compression in Hadoop
  3. Compressing File in snappy Format in Hadoop - Java Program
  4. Word Count MapReduce Program in Hadoop
  5. How MapReduce Works in Hadoop

You may also like-

  1. Data Locality in Hadoop
  2. File Read in HDFS - Hadoop Framework Internal Steps
  3. File Write in HDFS - Hadoop Framework Internal Steps
  4. Uber Mode in Hadoop
  5. Ternary Operator in Java
  6. How to Create Password Protected Zip File in Java
  7. Compressing And Decompressing File in GZIP Format - Java Program
  8. How to Sort an ArrayList in Descending Order