How to convert HTML to PDF using Node.js and Puppeteer

Are you looking for a way to convert your HTML files into PDF format? Using Puppeteer, a Node.js library, you can automate the process of converting your HTML files into PDFs.

In this article, we'll take a look at how you can use Puppeteer to convert HTML to PDF, and provide a step-by-step guide to help you get started.

What is Puppeteer?

Puppeteer is a Node.js library that provides a high-level API for controlling headless Chrome or Chromium over the DevTools Protocol.

With Puppeteer, you can automate Chrome or Chromium to perform tasks like taking screenshots, generating PDFs, and crawling pages.

Puppeteer offers a broad set of features and is widely used by developers and testers alike.

Getting Started

Before we dive into how to use Puppeteer to convert HTML to PDF, let's first take a look at how to get started.

To use it, you will need to have Node.js installed on your computer.

Create a New Node.js Project

Start by creating a new folder for your project, and then open the folder in your terminal.

mkdir html-to-pdf-puppeteer
cd html-to-pdf-puppeteer

In the terminal, run the following command to create a new Node.js project:

npm init -y

This creates a package.json file with default settings in the current folder, turning it into a Node.js project.

Install Puppeteer

To install Puppeteer, run the following command in your terminal:

npm install puppeteer

This adds Puppeteer as a dependency in your package.json file. By default, the installation also downloads a compatible browser build for Puppeteer to control.

Import Puppeteer

To use Puppeteer, you first need to import it into your project.

Create a file named index.js in the project folder and add the following line at the top:

const puppeteer = require('puppeteer');

Launching a Headless Browser

Before Puppeteer can load any pages, you need to launch a browser instance. By default, Puppeteer launches it in headless mode: a headless browser is a browser that runs without a graphical user interface.

With Puppeteer, you can launch a headless browser instance using the puppeteer.launch() method.

Here's an example of how to launch a headless browser instance:

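const puppeteer = require('puppeteer');
(async () => {
    // Start a headless browser instance
    const browser = await puppeteer.launch();
    // Open a new tab (page) in that browser
    const page = await browser.newPage();
    // ...page operations will go here...
    // Shut down the browser process
    await browser.close();
})();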

In this example, we first import Puppeteer, and then launch a headless browser instance using the puppeteer.launch() method.

We then create a new page using the browser.newPage() method, and then perform some operations (we will add them later).

Finally, we close the browser using the browser.close() method.
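In the script above, if any step between launch() and close() throws, browser.close() is never reached and a Chrome process can be left running. The following is a minimal sketch of one way to guard against this with try/finally. It assumes you pass in the puppeteer module (or any object with a compatible launch() method); withBrowser is a hypothetical helper, not part of Puppeteer's API.

```javascript
// Hypothetical helper (not part of Puppeteer): launches a browser, hands a
// fresh page to `task`, and always closes the browser, even if `task` throws.
async function withBrowser(launcher, task, launchOptions = {}) {
    const browser = await launcher.launch(launchOptions);
    try {
        const page = await browser.newPage();
        // Whatever `task` resolves to is passed back to the caller.
        return await task(page, browser);
    } finally {
        // Runs on success and on failure, so no browser process is left behind.
        await browser.close();
    }
}
```

For example, withBrowser(require('puppeteer'), async (page) => { /* ... */ }) runs the callback and then closes the browser. Passing { headless: false } as the third argument opens a visible browser window, which can help while debugging.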

Converting an HTML Web Page to PDF with Puppeteer

Now that we have a basic understanding of how Puppeteer works, let's take a look at how to use it to convert HTML to PDF.

const puppeteer = require('puppeteer');
(async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    // Navigate, then wait until there have been no network connections for 500 ms
    await page.goto('https://www.docsfold.com', { waitUntil: 'networkidle0' });
    // Apply screen styles instead of the default print styles
    await page.emulateMediaType('screen');
    // Render the page to example.pdf on A4 paper, including backgrounds
    await page.pdf({
        path: 'example.pdf',
        margin: { top: '100px', right: '50px', bottom: '100px', left: '50px' },
        printBackground: true,
        format: 'A4',
    });
    await browser.close();
})();

In this example, we first launch a headless browser instance, create a new page, and navigate to the URL https://www.docsfold.com.

Wait Until Page is Fully Loaded

The waitUntil option allows you to specify when the navigation should be considered complete.

By default, waitUntil is set to load, which means that Puppeteer waits for the page's load event to fire before considering the navigation complete. You can also set it to domcontentloaded, which only waits for the HTML to be parsed, or to networkidle0 or networkidle2, which wait until there have been no more than 0 or 2 network connections, respectively, for at least 500 milliseconds.

networkidle0 considers the navigation complete once there have been no network connections at all for at least 500 milliseconds. This suits pages that load additional resources dynamically, such as images, scripts, or data fetched after the initial load.

networkidle2 considers the navigation complete once there have been no more than 2 network connections for at least 500 milliseconds. This is useful for pages that keep a few long-lived connections open (for example, polling or analytics requests), where networkidle0 might never be reached and the navigation would eventually time out.

Choosing the appropriate value makes it much more likely that the page has finished rendering before your script continues with further interactions or PDF generation, although none of these events can guarantee that every piece of content has loaded.
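Some pages never reach networkidle0, in which case page.goto() rejects once its timeout expires (30 seconds by default). Below is a sketch of one possible fallback, under the assumption that printing a page that never went fully idle is acceptable for your use case. The helper name gotoForPdf and the 15-second budget are illustrative choices, not Puppeteer defaults; the error is identified by its name rather than with instanceof so the check does not depend on where your Puppeteer version exports TimeoutError.

```javascript
// Hypothetical helper: navigate and wait for network idle, but if the page
// never goes idle within `timeoutMs`, continue with whatever has loaded.
async function gotoForPdf(page, url, timeoutMs = 15000) {
    try {
        await page.goto(url, { waitUntil: 'networkidle0', timeout: timeoutMs });
        return 'networkidle0';
    } catch (err) {
        // Puppeteer's timeout errors are named 'TimeoutError'; anything else
        // (DNS failure, invalid URL, ...) is a real problem and is rethrown.
        if (err.name !== 'TimeoutError') throw err;
        console.warn(`No network idle within ${timeoutMs} ms; printing anyway.`);
        return 'timeout';
    }
}
```

You could call await gotoForPdf(page, 'https://www.docsfold.com') in place of the page.goto() line. Depending on the site, a page printed after a timeout may be missing some content.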

Emulate Media Type

The emulateMediaType() method changes the CSS media type the page is rendered with. By default, page.pdf() renders the page with the print media type, so any @media print rules in the page's CSS apply. Calling page.emulateMediaType('screen') before page.pdf() makes the screen styles apply instead, so the PDF looks closer to what you see in a browser window.
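To see what emulateMediaType() changes for a particular page, one option is to print the same page twice, once with print styles and once with screen styles, and compare the results. This is a minimal sketch: the helper name pdfPerMediaType and the output file names are made up for illustration, and page is assumed to be a Puppeteer page that has already navigated to the URL.

```javascript
// Hypothetical helper: render the current page once per CSS media type
// so the two PDFs can be compared side by side.
async function pdfPerMediaType(page, baseName) {
    const written = [];
    for (const media of ['print', 'screen']) {
        // Switch which @media rules apply before printing.
        await page.emulateMediaType(media);
        const path = `${baseName}-${media}.pdf`;
        await page.pdf({ path, format: 'A4', printBackground: true });
        written.push(path);
    }
    // Passing null turns media emulation off again.
    await page.emulateMediaType(null);
    return written;
}
```

For example, await pdfPerMediaType(page, 'docsfold') would write docsfold-print.pdf and docsfold-screen.pdf.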

Configure PDF Options

We then use the page.pdf() method to generate a PDF of the page. The path option is set to example.pdf, so the PDF is saved to a file named example.pdf in the current working directory.

We also set the margin option to { top: '100px', right: '50px', bottom: '100px', left: '50px' }, which means that the PDF will have a margin of 100px on the top and bottom, and 50px on the left and right.

We also set the printBackground option to true, which means that the PDF will include the background of the page. By default, the background is not included in the PDF.

Finally, we set the format option to A4, which means that the PDF will be generated in the A4 format.

Once the PDF has been written, we close the browser.
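If you generate several PDFs with the same settings, it can help to keep these options in one place. Here is a sketch of a small, hypothetical helper that starts from the values used in this example and lets individual calls override them. landscape is a real page.pdf() option used here only as an example override; which overrides you need will depend on your documents.

```javascript
// Defaults mirroring the page.pdf() call in this example.
const DEFAULT_PDF_OPTIONS = {
    format: 'A4',
    printBackground: true,
    margin: { top: '100px', right: '50px', bottom: '100px', left: '50px' },
};

// Hypothetical helper: build page.pdf() options for `path`, letting callers
// override any default. Margins are merged per side, so { margin: { top: '0' } }
// keeps the other three sides unchanged.
function makePdfOptions(path, overrides = {}) {
    return {
        ...DEFAULT_PDF_OPTIONS,
        ...overrides,
        path,
        margin: { ...DEFAULT_PDF_OPTIONS.margin, ...(overrides.margin || {}) },
    };
}
```

Usage would look like await page.pdf(makePdfOptions('example.pdf', { landscape: true })).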

To execute the script, run the following command in your terminal:

node index.js

This will generate a PDF of the page, and save it to a file named example.pdf.
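Launching a browser is relatively slow, so if you need to convert several URLs, one approach is to launch the browser once and open a fresh page for each URL. The sketch below assumes browser comes from puppeteer.launch(); the function name convertAll and the job shape ({ url, path }) are hypothetical. Jobs are processed one at a time, which keeps resource usage predictable.

```javascript
// Hypothetical helper: convert several URLs using one shared browser.
// Each job is { url, path }; every page is closed even if its conversion fails.
async function convertAll(browser, jobs) {
    const results = [];
    for (const { url, path } of jobs) {
        const page = await browser.newPage();
        try {
            await page.goto(url, { waitUntil: 'networkidle0' });
            await page.emulateMediaType('screen');
            await page.pdf({ path, format: 'A4', printBackground: true });
            results.push({ url, path, ok: true });
        } catch (err) {
            // Record the failure and move on to the next job.
            results.push({ url, path, ok: false, error: err.message });
        } finally {
            await page.close();
        }
    }
    return results;
}
```

The returned array reports which conversions succeeded, so a single broken URL does not stop the rest of the batch.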

Converting an HTML File to PDF with Puppeteer

To convert a local HTML file to PDF, you can read the file's contents and pass them to the page.setContent() method, which replaces the page's HTML with that string.

First, let's create a new HTML file named example.html:


<!DOCTYPE html>
<html>
<head>
    <title>Example</title>
</head>
<body>
    <h1>Example</h1>
    <p>This is an example.</p>
</body>
</html>

Next, let's replace the code in the index.js file with the following:

const puppeteer = require('puppeteer');
const fs = require('fs');
(async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    // Read the local HTML file (path is relative to the current working directory)
    const html = fs.readFileSync('example.html', 'utf-8');
    // Load the HTML string into the page
    await page.setContent(html, { waitUntil: 'domcontentloaded' });
    await page.pdf({
        path: 'example.pdf',
        margin: { top: '100px', right: '50px', bottom: '100px', left: '50px' },
        printBackground: true,
        format: 'A4',
    });
    await browser.close();
})();

In this example, we first import the fs module, which allows us to read the contents of a file.

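We then use the fs.readFileSync() method to read the contents of example.html as a UTF-8 string and store it in the html variable.

Next, page.setContent() loads that HTML string into the page. The waitUntil: 'domcontentloaded' option tells Puppeteer to continue as soon as the HTML has been parsed; if your file references images or fonts, waitUntil: 'load' waits for those to finish loading as well.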

We then use the page.pdf() method to generate a PDF of the page, and save it to a file named example.pdf.

Finally, we close the browser.

To execute the script, run the following command in your terminal:

node index.js

This will generate a PDF of the page, and save it to a file named example.pdf.
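Because page.setContent() loads the HTML into a blank page, relative paths in the file (for example, an img tag with src="logo.png" or a linked stylesheet) typically have no base URL to resolve against. If your HTML references local assets, one alternative is to open the file through a file:// URL with page.goto(). This sketch uses Node's built-in path and url modules; the helper names toFileUrl and htmlFileToPdf are hypothetical, and page is assumed to be a Puppeteer page.

```javascript
const path = require('path');
const { pathToFileURL } = require('url');

// Hypothetical helper: turn a path relative to the current working directory
// into an absolute file:// URL (spaces and similar characters are percent-encoded).
function toFileUrl(relativePath) {
    return pathToFileURL(path.resolve(relativePath)).href;
}

// Hypothetical helper: open a local HTML file via its file:// URL so relative
// asset paths resolve against the file's folder, then print it to PDF.
async function htmlFileToPdf(page, htmlPath, pdfPath) {
    await page.goto(toFileUrl(htmlPath), { waitUntil: 'load' });
    await page.pdf({ path: pdfPath, format: 'A4', printBackground: true });
}
```

With this approach, await htmlFileToPdf(page, 'example.html', 'example.pdf') would load images referenced as logo.png from the same folder as example.html.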

Alternative: Using the DocsFold PDF API

You can also use the DocsFold PDF API to convert HTML to PDF. The API is a REST API that allows you to generate PDF files from reusable HTML templates using a simple HTTP request.

For more details, check out the DocsFold homepage and the DocsFold API documentation.