Friday, February 19, 2021

Convert HTML to PDF in Java + Openhtmltopdf and PDFBox

In this tutorial we’ll see how to convert HTML to PDF in Java using Openhtmltopdf and PDFBox.

Check another option to convert HTML to PDF in this post- HTML to PDF in Java + Flying Saucer and OpenPDF

How does it work

Let’s first understand what the libraries mentioned here do-

  1. Open HTML to PDF is a pure-Java library for rendering arbitrary well-formed XML/XHTML (and even HTML5) using CSS 2.1 for layout and formatting, outputting to PDF or images.
  2. The jsoup library parses HTML using the best of HTML5 DOM methods and CSS selectors. That gives you a well-formed document (XHTML) that can be passed to Openhtmltopdf.
  3. Openhtmltopdf uses the open-source PDFBox as its PDF library, which generates the PDF document from the rendered representation of the XHTML produced by Openhtmltopdf.
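
To make "well formed" concrete, here is a minimal sketch that uses only the JDK's built-in XML parser (not jsoup or Openhtmltopdf). The class name WellFormedCheck and the sample markup are assumptions made for illustration. It shows that ordinary HTML with an unclosed img tag is rejected by an XML parser, while the XHTML form of the same markup is accepted. That is the gap jsoup closes in step 2.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper class, not part of any library used in this post
class WellFormedCheck {

  // Returns true when the markup parses as XML, i.e. it is well formed
  static boolean isWellFormed(String markup) {
    try {
      DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
      // Rethrow parse errors instead of letting the parser print them to stderr
      builder.setErrorHandler(new ErrorHandler() {
        public void warning(SAXParseException e) { }
        public void error(SAXParseException e) throws SAXException { throw e; }
        public void fatalError(SAXParseException e) throws SAXException { throw e; }
      });
      builder.parse(new InputSource(new StringReader(markup)));
      return true;
    } catch (Exception e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // Valid HTML, but the img tag is never closed
    String html = "<body><img src=\"../images/Stringpool.png\"></body>";
    // Same markup with a self-closing img tag
    String xhtml = "<body><img src=\"../images/Stringpool.png\" /></body>";
    System.out.println(isWellFormed(html));  // false
    System.out.println(isWellFormed(xhtml)); // true
  }
}
```

The Test.html used later in this post has the same kind of unclosed link and img tags, which is why it goes through jsoup before it is handed to Openhtmltopdf.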

Maven Dependencies

To get the above-mentioned libraries, add the following dependencies to your pom.xml

<dependency>
  <groupId>com.openhtmltopdf</groupId>
  <artifactId>openhtmltopdf-core</artifactId>
  <version>1.0.6</version>
</dependency>
<!--supports PDF output with Apache PDF-BOX -->
<dependency>
  <groupId>com.openhtmltopdf</groupId>
  <artifactId>openhtmltopdf-pdfbox</artifactId>
  <version>1.0.6</version>
</dependency>
<dependency>
  <groupId>org.jsoup</groupId>
  <artifactId>jsoup</artifactId>
  <version>1.13.1</version>
</dependency>

Convert HTML to PDF Java example

In this Java program to convert HTML to PDF using Openhtmltopdf and PDFBox, we’ll try to cover most of the scenarios that you may encounter, i.e. an image in the HTML, external and inline styling, and an external font.

The following is the HTML we’ll convert to PDF. As you can see, it uses an external CSS file, has an image, and uses inline styling too.

Test.html

<html lang="en">
  <head>
    <title>HTML File</title>  
    <style type="text/css">
      body{background-color: #F5F5F5;}
    </style>
    <link href="../css/style.css" rel="stylesheet" >
  </head>
  <body>
    <h1>HTML to PDF Java Example</h1>
    <p>String Pool image</p>
    <img src="../images/Stringpool.png" width="300" height="220">
    <p style="color:#F80000; font-size:20px">This text is styled using Inline CSS</p>
    <p class="fontclass">This text uses the styling from font face font</p>
    <p class="styleclass">This text is styled using external CSS class</p>
  </body>
</html>

External CSS used (style.css)

@font-face {
  font-family: myFont;
  src: url("../fonts/PRISTINA.TTF");
}
.fontclass{
  font-family: myFont;
  font-size:20px;
}
.styleclass{
  font-family: "Times New Roman", Times, serif;
  font-size:30px;
  font-weight: normal;
  color: #6600CC;
}

The directory structure is as given below-

[Image: Convert HTML to PDF Java, directory structure]

This is how the HTML looks in a browser-

[Image: HTML to PDF Java PDFBox, HTML rendered in the browser]

Now we’ll write the Java program to convert this HTML to PDF.

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.FileSystems;
import org.jsoup.Jsoup;
import org.jsoup.helper.W3CDom;
import org.jsoup.nodes.Document;
import com.openhtmltopdf.pdfboxout.PdfRendererBuilder;

public class HtmlToPdfExample {
  public static void main(String[] args) {
    try {
      // HTML file - Input
      File inputHTML = new File(HtmlToPdfExample.class.getClassLoader().getResource("template/Test.html").getFile());
      // Converted PDF file - Output
      String outputPdf = "F:\\NETJS\\Test.pdf";
      HtmlToPdfExample htmlToPdf = new HtmlToPdfExample();
      //create well formed HTML
      org.w3c.dom.Document doc = htmlToPdf.createWellFormedHtml(inputHTML);
      System.out.println("Starting conversion to PDF...");
      htmlToPdf.xhtmlToPdf(doc, outputPdf);
    } catch (IOException e) {
      System.out.println("Error while converting HTML to PDF " + e.getMessage());
      e.printStackTrace();
    }
  }
  
  // Creating well formed document
  private org.w3c.dom.Document createWellFormedHtml(File inputHTML) throws IOException {
    Document document = Jsoup.parse(inputHTML, "UTF-8");
    document.outputSettings().syntax(Document.OutputSettings.Syntax.xml);
    System.out.println("HTML parsing done...");
    return new W3CDom().fromJsoup(document);
  }
  
  private void xhtmlToPdf(org.w3c.dom.Document doc, String outputPdf) throws IOException {
    // base URI used to resolve relative resources (CSS, images, fonts) referenced in the HTML
    String baseUri = FileSystems.getDefault()
                .getPath("F:/", "Anshu/NetJs/Programs/", "src/main/resources/template")
                .toUri()
                .toString();
    // try-with-resources closes the stream even if rendering fails
    try (OutputStream os = new FileOutputStream(outputPdf)) {
      PdfRendererBuilder builder = new PdfRendererBuilder();
      builder.toStream(os);
      // add external font
      builder.useFont(new File(getClass().getClassLoader().getResource("fonts/PRISTINA.ttf").getFile()), "PRISTINA");
      // the W3C document is the input; the base URI resolves its relative links
      builder.withW3cDocument(doc, baseUri);
      builder.run();
      System.out.println("PDF creation completed");
    }
  }
}

You need to register any additional fonts used in your document so that they can be embedded in the PDF.

builder.useFont(new File(getClass().getClassLoader().getResource("fonts/PRISTINA.ttf").getFile()), "PRISTINA");
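
One thing to watch for in the getResource(...).getFile() pattern: URL.getFile() returns the path still URL-encoded, so a directory name containing a space shows up as %20 and new File(...) may not find the file. The following sketch uses only JDK classes. The class name ResourcePathDemo and the /tmp/my dir location are assumptions made for illustration. It shows the difference between the raw path and the decoded path you get by converting the URL to a URI first.

```java
import java.net.URI;
import java.net.URL;

// Hypothetical demo class, not part of the example program above
class ResourcePathDemo {

  // Path exactly as URL.getFile() reports it (percent-encoding is kept)
  static String rawPath(URL url) {
    return url.getFile();
  }

  // Path with percent-encoding decoded, obtained through java.net.URI
  static String decodedPath(URL url) throws Exception {
    return url.toURI().getPath();
  }

  public static void main(String[] args) throws Exception {
    // Made-up location of a resource whose folder name contains a space
    URL url = URI.create("file:/tmp/my%20dir/fonts/PRISTINA.ttf").toURL();
    System.out.println(rawPath(url));     // /tmp/my%20dir/fonts/PRISTINA.ttf
    System.out.println(decodedPath(url)); // /tmp/my dir/fonts/PRISTINA.ttf
  }
}
```

With a real classpath resource, new File(url.toURI()) or Paths.get(url.toURI()) is a common way to get a usable file reference, provided the resource is on the file system and not packed inside a jar.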

You also need to configure the base URI so that relative paths to resources such as images and CSS files can be resolved.
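
As a rough illustration of what the base URI does, the sketch below resolves the relative paths from Test.html against a base directory using java.net.URI from the JDK. The class name BaseUriDemo and the /project/... location are assumptions, and it is also an assumption that the library's own resolver behaves the same way in these simple cases. The last call shows a common pitfall: without a trailing slash the last path segment is treated as a file name and dropped.

```java
import java.net.URI;

// Hypothetical demo class, not part of the example program above
class BaseUriDemo {

  // Resolves a relative reference (as written in href or src) against a base URI
  static String resolvedPath(String baseUri, String relative) {
    return URI.create(baseUri).resolve(relative).getPath();
  }

  public static void main(String[] args) {
    // Made-up project location; note the trailing slash on the directory
    String base = "file:///project/src/main/resources/template/";
    System.out.println(resolvedPath(base, "../css/style.css"));
    // /project/src/main/resources/css/style.css
    System.out.println(resolvedPath(base, "../images/Stringpool.png"));
    // /project/src/main/resources/images/Stringpool.png

    // Without the trailing slash, "template" is dropped before resolving
    String noSlash = "file:///project/src/main/resources/template";
    System.out.println(resolvedPath(noSlash, "../css/style.css"));
    // /project/src/main/css/style.css
  }
}
```

So the base URI should point at the directory that contains the HTML file, and any additional images or CSS files only need to be reachable through relative paths from that directory.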

Here is the PDF generated from the HTML passed as input.

[Image: html to pdf openhtmltopdf, the generated PDF]

That's all for this topic Convert HTML to PDF in Java + Openhtmltopdf and PDFBox. If you have any doubts or suggestions, please drop a comment. Thanks!


5 comments:

  1. Nice stuff, it was nice to see this article about HTML5. It was really appreciable. Thank you so much for sharing such an informative article about - HTML5 tutorial in hindi

  2. Hi, Thanks for your effort. But I cant get Bangla font properly. All words breaks, Plz help ASAP

    Replies
    1. In the article there is a line-
      You need to register additional fonts used in your document so they may be included with the PDF.

      builder.useFont(new File(getClass().getClassLoader().getResource("fonts/PRISTINA.ttf").getFile()), "PRISTINA");

      That's where you have to change the font as per your requirement. You will have to find out which font your text needs.

  3. How can I configure the base URI for additional images and css?

  4. If we have jquery script in html then how to convert that into text or styling and append it into pdf?
