Are you looking for a way to convert your HTML files into PDF format? Using Puppeteer, a Node.js library, you can automate the process of converting your HTML files into PDFs.
In this article, we'll take a look at how you can use Puppeteer to convert HTML to PDF, and provide a step-by-step guide to help you get started.
What is Puppeteer?
Puppeteer is a Node.js library that provides a high-level API for controlling headless Chrome or Chromium over the DevTools Protocol.
With Puppeteer, you can automate Chrome or Chromium to perform tasks like taking screenshots, generating PDFs, and crawling pages.
Puppeteer provides a powerful set of features, and is widely used by developers and testers alike.
Getting Started
Before we dive into how to use Puppeteer to convert HTML to PDF, let's first take a look at how to get started.
To use Puppeteer, you need Node.js installed on your computer.
Create a New Node.js Project
Start by creating a new folder for your project, and then open the folder in your terminal.
mkdir html-to-pdf-puppeteer
cd html-to-pdf-puppeteer
In the terminal, run the following command to create a new Node.js project:
npm init -y
This will create a new Node.js project in the current folder.
Install Puppeteer
To install Puppeteer, run the following command in your terminal:
npm install puppeteer
This will add Puppeteer to your project and record the dependency in your package.json file.
Import Puppeteer
To use Puppeteer, you first need to import it into your project.
To import Puppeteer, create a file named index.js in the project folder and add the following line at the top:
const puppeteer = require('puppeteer');
Launching a Headless Browser
To use Puppeteer, you first need to launch a headless browser instance. A headless browser is a browser that runs without a graphical user interface.
With Puppeteer, you can launch a headless browser instance using the puppeteer.launch() method.
Here's an example of how to launch a headless browser instance:
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // do something
  await browser.close();
})();
In this example, we first import Puppeteer, and then launch a headless browser instance using the puppeteer.launch() method.
We then create a new page using the browser.newPage() method, and then perform some operations (we will add them later).
Finally, we close the browser using the browser.close() method.
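If an operation between launching and closing throws an error, the script above never reaches browser.close(), and the browser process can be left running. The following is a minimal sketch of one way to guard against that. The helper name withBrowser is hypothetical (it is not part of Puppeteer), and it assumes the puppeteer module is passed in as an argument.

```javascript
// Hypothetical helper (not part of Puppeteer): launches a browser, runs
// `task` with it, and always closes the browser, even if `task` throws.
async function withBrowser(puppeteer, task) {
  const browser = await puppeteer.launch();
  try {
    return await task(browser);
  } finally {
    // Runs on success and on failure, so the browser is not left open.
    await browser.close();
  }
}
```

It could be called as withBrowser(require('puppeteer'), async (browser) => { const page = await browser.newPage(); /* work with the page */ }).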
Converting an HTML Web Page to PDF with Puppeteer
Now that we have a basic understanding of how Puppeteer works, let's take a look at how to use it to convert HTML to PDF.
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://www.docsfold.com', { waitUntil: 'networkidle0' });
  await page.emulateMediaType('screen');
  await page.pdf({
    path: 'example.pdf',
    margin: { top: '100px', right: '50px', bottom: '100px', left: '50px' },
    printBackground: true,
    format: 'A4',
  });
  await browser.close();
})();
In this example, we first launch a headless browser instance, create a new page, and navigate to the URL https://www.docsfold.com.
Wait Until Page is Fully Loaded
The waitUntil option allows you to specify when the navigation should be considered complete.
By default, waitUntil is set to load, which means that Puppeteer waits for the load event to fire on the page before considering the navigation complete. However, you can set it to other values such as networkidle0 or networkidle2, which wait until there are no more than 0 or 2 network connections respectively.
networkidle0 means that Puppeteer will consider the navigation complete when there have been no network connections for at least 500 milliseconds. This can be useful for pages that load additional resources dynamically, such as images or scripts.
networkidle2 means that Puppeteer will consider the navigation complete when there have been no more than 2 network connections for at least 500 milliseconds. This is useful for pages that need to load additional resources, such as Ajax requests, before they are fully rendered, but that may keep a connection or two open in the background.
By setting waitUntil to the appropriate value, you can ensure that your Puppeteer script waits until the page is fully loaded before continuing with further interactions or PDF generation.
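The waitUntil option also accepts an array of events, in which case navigation is considered complete only after all of them have occurred. The sketch below combines load and networkidle0 and sets an explicit timeout. The helper name gotoAndSettle and the 30-second default are assumptions made for illustration, and page is assumed to be a Puppeteer page created with browser.newPage().

```javascript
// Hypothetical helper: navigates `page` to `url` and resolves only after
// both the load event has fired and the network has gone idle.
async function gotoAndSettle(page, url, timeoutMs = 30000) {
  return page.goto(url, {
    // An array means "wait for every listed event".
    waitUntil: ['load', 'networkidle0'],
    // Navigation rejects with an error if it takes longer than this.
    timeout: timeoutMs,
  });
}
```

If a page keeps a connection open indefinitely (for example, for polling), networkidle0 may never be reached and the call will time out; in that case networkidle2 is likely the better choice.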
Emulate Media Type
The emulateMediaType method allows you to change the CSS media type of the page. By default, page.pdf() renders the page with the print media type, so any print-specific styles are applied. However, you can change it to screen to emulate the media type of a screen. This way, the PDF uses the same styles as the page does when it is displayed on a screen.
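To see what difference the media type makes for a particular page, it can help to render the same page under both types and compare the results. The following is a minimal sketch; the helper name pdfForMedia is hypothetical, and page is assumed to be a Puppeteer page that has already been navigated.

```javascript
// Hypothetical helper: renders the current page to a PDF under the given
// CSS media type ('screen' or 'print') and returns the PDF data.
async function pdfForMedia(page, mediaType) {
  await page.emulateMediaType(mediaType);
  // No `path` option is passed, so no file is written; page.pdf()
  // resolves with the PDF bytes instead.
  return page.pdf({ format: 'A4', printBackground: true });
}
```

Calling it once with 'screen' and once with 'print', then saving both results with fs.writeFileSync(), gives two files that can be compared side by side.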
Configure PDF Options
We then use the page.pdf() method to generate a PDF of the page.
The path option is set to example.pdf, which means that the PDF will be saved to a file named example.pdf.
The margin option is set to { top: '100px', right: '50px', bottom: '100px', left: '50px' }, which gives the PDF a margin of 100px on the top and bottom, and 50px on the left and right.
The printBackground option is set to true, which means that the PDF will include the background of the page. By default, the background is not included in the PDF.
The format option is set to A4, which means that the PDF will be generated in the A4 paper size.
Finally, we close the browser.
Run the following command in your terminal to run the script:
node index.js
This will generate a PDF of the page, and save it to a file named example.pdf.
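Beyond the options used above, page.pdf() also accepts options such as landscape, displayHeaderFooter, headerTemplate, and footerTemplate. The sketch below builds an options object that adds a page-number footer. The function name buildPdfOptions and the footer markup are assumptions made for illustration; the option names and the pageNumber and totalPages classes are Puppeteer's.

```javascript
// Illustrative helper: returns an options object for page.pdf() that adds
// a centered "Page X of Y" footer to every page.
function buildPdfOptions(outputPath) {
  return {
    path: outputPath,
    format: 'A4',
    landscape: false,
    printBackground: true,
    displayHeaderFooter: true,
    // An empty header, so only the footer is visible.
    headerTemplate: '<span></span>',
    // Puppeteer fills elements with these class names at print time.
    footerTemplate:
      '<div style="font-size:10px; width:100%; text-align:center;">' +
      'Page <span class="pageNumber"></span> of <span class="totalPages"></span>' +
      '</div>',
    // The header and footer are drawn inside the margins, so keep them non-zero.
    margin: { top: '100px', right: '50px', bottom: '100px', left: '50px' },
  };
}
```

The result can be passed straight to the call from the example, as in await page.pdf(buildPdfOptions('example.pdf')).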
Converting an HTML File to PDF with Puppeteer
To convert a local HTML file to PDF, you can read the file and use the page.setContent() method to set the content of the page to the file's HTML.
First, let's create a new HTML file named example.html:
<!DOCTYPE html>
<html>
<head>
  <title>Example</title>
</head>
<body>
  <h1>Example</h1>
  <p>This is an example.</p>
</body>
</html>
Next, let's replace the code in the index.js file with the following:
const puppeteer = require('puppeteer');
const fs = require('fs');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  const html = fs.readFileSync('example.html', 'utf-8');
  await page.setContent(html, { waitUntil: 'domcontentloaded' });
  await page.pdf({
    path: 'example.pdf',
    margin: { top: '100px', right: '50px', bottom: '100px', left: '50px' },
    printBackground: true,
    format: 'A4'
  });
  await browser.close();
})();
In this example, we first import the fs module, which allows us to read the contents of a file.
We then use the fs.readFileSync() method to read the contents of the example.html file, and store it in the html variable.
We then use the page.setContent() method to set the content of the page to that HTML.
We then use the page.pdf() method to generate a PDF of the page, and save it to a file named example.pdf.
Finally, we close the browser.
Run the following command in your terminal to run the script:
node index.js
This will generate a PDF of the page, and save it to a file named example.pdf.
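One caveat with page.setContent() is that the page receives the HTML as a string and has no file location, so relative references in the HTML (for example, a stylesheet or image sitting next to example.html) will generally not load. A possible alternative is to open the file through a file:// URL with page.goto(). The following is a minimal sketch; the helper names toFileUrl and htmlFileToPdf are hypothetical, and page is assumed to be a Puppeteer page.

```javascript
const path = require('node:path');
const { pathToFileURL } = require('node:url');

// Converts a local file path into an absolute file:// URL.
function toFileUrl(htmlPath) {
  return pathToFileURL(path.resolve(htmlPath)).href;
}

// Hypothetical helper: loads a local HTML file through its file:// URL,
// so relative links to CSS and images resolve, then writes a PDF.
async function htmlFileToPdf(page, htmlPath, pdfPath) {
  await page.goto(toFileUrl(htmlPath), { waitUntil: 'load' });
  await page.pdf({ path: pdfPath, format: 'A4', printBackground: true });
}
```

With the example above, this would be called as await htmlFileToPdf(page, 'example.html', 'example.pdf') in place of the readFileSync() and setContent() calls.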
Alternative: Using the DocsFold PDF API
You can also use the DocsFold PDF API to convert HTML to PDF. The API is a REST API that allows you to generate PDF files from reusable HTML templates using a simple HTTP request.
For more details check out the DocsFold homepage and the DocsFold API documentation.