Thursday, February 18, 2021

Creating PDF in Java Using Apache PDFBox

In the post Creating PDF in Java Using iText we have already seen how to use iText library to generate a PDF in Java, we have already seen one alternative of iText which is OpenPDF for generating PDF. In this tutorial we’ll learn about another option for generating PDF in Java using Apache PDFBox.

PDFBox for creating PDF in Java

The Apache PDFBox library (https://pdfbox.apache.org/) is an open source tool written in Java for working with PDF documents. Using PDFBox you can create new PDF documents, manipulate existing documents and extract content from PDF documents. Apache PDFBox also includes several command-line utilities. Apache PDFBox is published under the Apache License v2.0.


Maven dependency for Apache PDFBox

You need to add following dependency for PDFBox.

<dependency>
  <groupId>org.apache.pdfbox</groupId>
  <artifactId>pdfbox</artifactId>
  <version>2.0.13</version>
</dependency>

Classes in PDFBox library

Some of the classes which you'll be using for PDF generation using PDFBox.

  1. PDDocument- This is the in-memory representation of the PDF document. Using constructor of this class you can create an empty PDF document or use an existing document.The close() method must be called once the document is no longer needed.
  2. PDPage- This class instance represents a page in a PDF document. Page should be added to the document using addPage( method of the PDDocument class.
  3. PDPageContentStream- Provides the ability to write to a page content stream.
  4. PDImageXObject- Represents an image in a PDF document.

Creating PDF in Java using PDFBox – Hello World

First lets see a simple Java program where “Hello world” is written to the PDF using PDFBox library. This example also shows how to set font and text color for the content written to PDF using PDFBox.

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDType1Font;

public class PDFGenerator {
  public static final String DEST = "F:\\NETJS\\Test\\HelloWorld.pdf";
  public static void main(String[] args) {
    try {
      PDDocument pdDoc = new PDDocument();
      PDPage page = new PDPage();
      // add page to the document
      pdDoc.addPage(page);
      // write to a page content stream
      try(PDPageContentStream cs = new PDPageContentStream(pdDoc, page)){
        cs.beginText();
        // setting font family and font size
        cs.setFont(PDType1Font.HELVETICA, 14);
        // Text color in PDF
        cs.setNonStrokingColor(Color.BLUE);
        // set offset from where content starts in PDF
        cs.newLineAtOffset(20, 750);
        cs.showText("Hello! This PDF is created using PDFBox");
        cs.newLine();
        cs.endText();
    }
      // save and close PDF document
      pdDoc.save(DEST);
      pdDoc.close();
    } catch(IOException e) {
      // TODO Auto-generated catch block
      e.printStackTrace();
    }
  }
}
HelloWorld PDF using PDFBox

Adding content to an existing PDF using PDFBox - Java Program

If you want to add a new page to an existing PDF then you can load the existing PDF and add page to it.

import java.awt.Color;
import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDType1Font;

public class PDFGenerator {
  public static final String DEST = "F:\\NETJS\\Test\\HelloWorld.pdf";
  public static void main(String[] args) {
    try {
      // load exiting PDF
      PDDocument pdDoc = PDDocument.load(new File(DEST));
      PDPage page = new PDPage();
      // add page to the document
      pdDoc.addPage(page);
      // write to a page content stream
      try(PDPageContentStream cs = new PDPageContentStream(pdDoc, page)){
        cs.beginText();
        // setting font family and font size
        cs.setFont(PDType1Font.HELVETICA, 14);
        // Text color in PDF
        cs.setNonStrokingColor(Color.BLUE);
        // set offset from where content starts in PDF
        cs.newLineAtOffset(20, 750);
        cs.showText("Adding content to an existing PDF");
        cs.newLine();
        cs.endText();
      }
      // save and close PDF document
      pdDoc.save(new File("F:\\NETJS\\Test\\Changed.pdf"));
      pdDoc.close();
    } catch(IOException e) {
      // TODO Auto-generated catch block
      e.printStackTrace();
    }
  }
}

Adding text spanning multiple lines in PDF using PDFBox - Java Program

If you have text that may span multiple lines in PDF then you do need to write the logic to divide that text into multiple lines as per the width of the document. In PDFBox there is no such support and if you add long text directly then it will be written in PDF as a single line.

import java.awt.Color;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType1Font;

public class PDFGenerator {
  public static final String DEST = "F:\\NETJS\\Test\\MultiLine.pdf";
  public static void main(String[] args) {
    try {
      PDDocument pdDoc = new PDDocument();
      PDPage page = new PDPage();
      // add page to the document
      pdDoc.addPage(page);
      // write to a page content stream
      try(PDPageContentStream cs = new PDPageContentStream(pdDoc, page)){
        cs.beginText();
        // setting font family and font size
        cs.setFont(PDType1Font.HELVETICA, 14);
        // Text color in PDF
        cs.setNonStrokingColor(Color.BLUE);
        // set offset from where content starts in PDF
        cs.newLineAtOffset(20, 750);
        // required when using newLine() 
        cs.setLeading(12);
        cs.showText("First line added to the PDF");
        cs.newLine();
        String text = "This is a long text which spans multiple lines, this text checks if the line is changed as per the allotted width in the PDF or not."
                + "The Apache PDFBox library is an open source tool written in Java for working with PDF documents. ";
        divideTextIntoMultipleLines(text, 580, page, cs, PDType1Font.HELVETICA, 14);
        cs.endText();
      }
      // save and close PDF document
      pdDoc.save(DEST);
      pdDoc.close();
    } catch(IOException e) {
      // TODO Auto-generated catch block
      e.printStackTrace();
    }
  }
    
  private static void divideTextIntoMultipleLines(String text, int allowedWidth, PDPage page, PDPageContentStream contentStream, PDFont font, int fontSize) throws IOException {
    List<String> lines = new ArrayList<String>();
    String line = "";
    // split the text one or more spaces
    String[] words = text.split("\\s+");
    for(String word : words) {
      if(!line.isEmpty()) {
        line += " ";
      }
      // check for width boundaries
      int size = (int) (fontSize * font.getStringWidth(line + word) / 1000);
      if(size > allowedWidth) {
        // if line + new word > page width, add the line to the list without the word
        lines.add(line);
        // start new line with the current word
        line = word;
      } else {
        // if line + word < page width, append the word to the line
        line += word;
      }
    }
    lines.add(line);
    // write lines to Content stream
    for(String ln : lines) { 
      contentStream.showText(ln);
      contentStream.newLine();
    }
  }
}
PDF in Java using PDFBox

Adding image to PDF using PDFBox – Java Program

import java.awt.Color;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

public class PDFWithImage {
  public static final String DEST = "F:\\NETJS\\Test\\image.pdf";
  public static void main(String[] args) {
    try {
      PDDocument pdDoc = new PDDocument();
      PDPage page = new PDPage();
      // add page to the document
      pdDoc.addPage(page);
      // Create image object using the image location
      PDImageXObject image = PDImageXObject.createFromFile("F:\\NETJS\\netjs.png", pdDoc);
      // write to a page content stream
      try(PDPageContentStream cs = new PDPageContentStream(pdDoc, page)){
        cs.beginText();
        // setting font family and font size
        cs.setFont(PDType1Font.HELVETICA, 14);
        // Text color in PDF
        cs.setNonStrokingColor(Color.BLUE);
        // set offset from where content starts in PDF
        cs.newLineAtOffset(20, 750);
        cs.showText("Adding image to PDF Example");
        cs.newLine();
        cs.endText();
        cs.drawImage(image, 20, 450, 350, 200);
      }
      // save and close PDF document
      pdDoc.save(DEST);
      pdDoc.close();
    } catch(IOException e) {
      // TODO Auto-generated catch block
      e.printStackTrace();
    }
  }
}

Creating Password protected PDF using PDFBox in Java

When you encrypt a PDF you can configure two things-

1. Access Permissions- Using PDFBox the access permissions to the user are configured using AccessPermission class. Following permissions can be given for an encrypted PDF document.

  • Print the document
  • modify the content of the document
  • copy or extract content of the document
  • add or modify annotations
  • fill in interactive form fields
  • extract text and graphics for accessibility to visually impaired people
  • assemble the document
  • print in degraded quality

2. Setting the password- For setting password protection, StandardProtectionPolicy class is used in PDFBox. You can pass AccessPermission class object, owner password and user password as arguments to the constructor of this class.

With owner password you can open the PDF without any restrictions on access.

With user password you can open the PDF with restricted access permissions.

In the example an existing PDF is loaded then access permission and password are set for that PDF.

import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.encryption.AccessPermission;
import org.apache.pdfbox.pdmodel.encryption.StandardProtectionPolicy;

public class PDFGenerator {
  public static final String PDF = "F:\\NETJS\\Test\\HelloWorld.pdf";
  final static String USER_PASSWORD = "user";
  final static String OWNER_PASSWORD = "owner";
  public static void main(String[] args) {
    try {
      //load an existing PDF
      PDDocument document = PDDocument.load(new File(PDF));
      AccessPermission ap = new AccessPermission();
      /** Setting access permissions */
      // Can't print PDF
      ap.setCanPrint(false);
      // Can't copy PDF
      ap.setCanExtractContent(false);
      /** Access permissions End */
      /** Setting password */
      StandardProtectionPolicy sp = new StandardProtectionPolicy(OWNER_PASSWORD, USER_PASSWORD, ap);
      sp.setEncryptionKeyLength(128);
      document.protect(sp);
      document.save(PDF);
      document.close();            
    } catch(IOException e) {
      // TODO Auto-generated catch block
      e.printStackTrace();
    }
  }
}

Extract text from PDF using PDFBox in Java

You can use PDFTextStripper class to extract text from PDF using PDFBox.

import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

public class PDFGenerator {
  public static final String PDF = "F:\\NETJS\\Test\\MultiLine.pdf";
  public static void main(String[] args) {
    try {
      //load an existing PDF
      PDDocument document = PDDocument.load(new File(PDF));
      PDFTextStripper textStripper = new PDFTextStripper();
      // Get count of total pages in the PDF
      int pageCount = document.getNumberOfPages();
      //set start page for extraction
      textStripper.setStartPage(1);
      // set last page for extraction
      textStripper.setEndPage(pageCount);
      // extract text from all the pages between start and end
      String text = textStripper.getText(document);
      System.out.println(text);
      document.close();            
    } catch(IOException e) {
      // TODO Auto-generated catch block
      e.printStackTrace();
    }
  }
}

Output

First line added to the PDF
This is a long text which spans multiple lines, this text checks if the line is changed as per the 
allotted width in the PDF or not.The Apache PDFBox library is an open source tool written in 
Java for working with PDF documents.

Extract image from PDF using PDFBox in Java

You can use PDResources class to extract image from PDF using PDFBox.

Using PDResources class you can get all the resources available at page level. You can iterate over those resources to check if any of the resource is image, if yes then copy that image to the specified location.

import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.common.PDStream;
import org.apache.pdfbox.pdmodel.graphics.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

public class PDFGenerator {
  public static final String PDF = "F:\\NETJS\\Test\\image.pdf";
  public static final String IMG_LOCATION = "F:\\NETJS\\Test";
  public static void main(String[] args) {
    try {
      //load an existing PDF
      PDDocument document = PDDocument.load(new File(PDF));
      // get resources for a page
      PDResources pdResources = document.getPage(0).getResources();
      int i = 0;
      for(COSName csName : pdResources.getXObjectNames()) {          
        PDXObject pdxObject = pdResources.getXObject(csName);   
        // if resource obj is of type image
        if(pdxObject instanceof PDImageXObject) {
          PDStream pdStream = pdxObject.getStream();
          PDImageXObject image = new PDImageXObject(pdStream, pdResources);
          i++;
          // image storage location and image name
          File imgFile = new File(IMG_LOCATION+"\\Pdfimage"+i+".png");
          ImageIO.write(image.getImage(), "png", imgFile);
        }
      }
      document.close();            
    } catch(IOException e) {
      // TODO Auto-generated catch block
      e.printStackTrace();
    }
  }
}

That's all for this topic Creating PDF in Java Using Apache PDFBox. If you have any doubt or any suggestions to make please drop a comment. Thanks!

>>>Return to Java Programs Page


Related Topics

  1. How to Create PDF From XML Using Apache FOP
  2. Spring MVC PDF Generation Example
  3. How to Create Password Protected Zip File in Java
  4. How to Read Excel File in Java Using Apache POI
  5. Find The Maximum Element in Each Row of a Matrix Java Program

You may also like-

  1. Tree Sort in Java Using Binary Search Tree
  2. How to Read Input From Console in Java
  3. Exponential Search Program in Java
  4. Linked List Implementation Java Program
  5. Heap Memory Allocation in Java
  6. Serialization and Deserialization in Java
  7. Lambda Expressions in Java 8
  8. Name Mangling in Python