Common Tips for Puppeteer Node.js Library

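Puppeteer is a Node.js library that provides a high-level API for controlling Chrome or Chromium over the DevTools Protocol. By default it runs the browser in headless mode, without a visible window, but it can also drive a normal, visible browser. It is commonly used to automate page interactions such as form submission, to take screenshots, to scrape content, and to run end-to-end tests.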

To help you make the most of Puppeteer, we have compiled some common tips to keep in mind when working with this library.

1. Use Headless Mode

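In headless mode the browser runs without a visible user interface. This typically uses less memory and CPU, and it works on servers and CI machines that have no display. Headless is the default in current Puppeteer releases and is usually the right choice for automated testing and web scraping. Running with headless: false is still useful while debugging, because you can watch what the script is doing.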

The example below shows how to launch a browser instance using Puppeteer with the headless option explicitly set to true.

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();

  // Navigate to a webpage
  await page.goto('https://www.example.com');

  // Take a screenshot of the page
  await page.screenshot({ path: 'example.png' });

  // Get the page's title
  const title = await page.title();
  console.log(title);

  // Get the page's HTML
  const html = await page.content();
  console.log(html);

  await browser.close();
})();
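Switching between headless and visible modes is easier if the launch options come from one place, and a script should close the browser even when a step throws. The sketch below is an illustration, not part of Puppeteer's API: `buildLaunchOptions` reads a hypothetical `HEADFUL` environment variable and uses the real `headless` and `slowMo` launch options, and `withPage` wraps the work in `try`/`finally`. The `launcher` argument is assumed to be the `puppeteer` module or any object with a compatible `launch()` method.

```javascript
// Hypothetical helper: choose launch options from environment variables.
// HEADFUL=1 is an assumed convention for debugging, not a Puppeteer feature.
function buildLaunchOptions(env = process.env) {
  const headful = env.HEADFUL === '1';
  return {
    headless: !headful,
    // slowMo delays each Puppeteer operation by this many milliseconds,
    // which makes a visible browser easier to follow.
    slowMo: headful ? 50 : 0,
  };
}

// Launch a browser, run `fn` with a fresh page, and always close the browser,
// even if `fn` throws. `launcher` is expected to be the puppeteer module.
async function withPage(launcher, options, fn) {
  const browser = await launcher.launch(options);
  try {
    const page = await browser.newPage();
    return await fn(page);
  } finally {
    await browser.close();
  }
}

// Usage with the real library:
// const puppeteer = require('puppeteer');
// withPage(puppeteer, buildLaunchOptions(), async (page) => {
//   await page.goto('https://www.example.com');
//   return page.title();
// }).then(console.log);
```

Running the script with `HEADFUL=1` then opens a visible, slowed-down browser without any code changes.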

2. Manage Page Resources Carefully

Because Puppeteer runs a full browser, scripts can easily consume a lot of memory and CPU. Loading only what you need keeps usage down. If your task does not need images, stylesheets, or fonts, for example, you can call page.setRequestInterception() and block those requests before they are sent.

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Block certain resource types
  await page.setRequestInterception(true);
  page.on('request', (request) => {
    if (['image', 'stylesheet', 'font'].includes(request.resourceType())) {
      request.abort();
    } else {
      request.continue();
    }
  });

  // Navigate to a webpage
  await page.goto('https://www.example.com');

  // Take a screenshot of the page
  await page.screenshot({ path: 'example.png' });

  await browser.close();
})();

In this example, page.setRequestInterception(true) enables request interception for the page. We then register a handler with page.on('request'), which is called for every request the page makes. If the request's resourceType() is 'image', 'stylesheet', or 'font', the handler aborts it with request.abort(); otherwise it lets the request proceed with request.continue(). Once interception is enabled, every request must be aborted, continued, or answered with request.respond(); a request the handler ignores will stall.

After setting up request interception, we navigate to a webpage and take a screenshot. Because all image, stylesheet, and font requests are blocked, the screenshot shows the page without images, CSS styling, or web fonts. You can change the logic in the 'request' handler to block or allow other resource types as needed.
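Resource types are a coarse filter. A common refinement is to block by host as well, for example third-party analytics. The sketch below moves the decision into a pure predicate so it can be tested without a browser. `shouldBlock`, `installRequestFilter`, the policy object, and the host `analytics.example.com` are hypothetical; `setRequestInterception()`, `request.resourceType()`, `request.url()`, `abort()`, and `continue()` are the same Puppeteer calls used above.

```javascript
// Hypothetical policy: resource types and hosts to block.
const DEFAULT_POLICY = {
  blockedTypes: new Set(['image', 'stylesheet', 'font', 'media']),
  blockedHosts: ['analytics.example.com'], // placeholder host
};

// Pure decision function: true if the request should be aborted.
function shouldBlock(resourceType, url, policy = DEFAULT_POLICY) {
  if (policy.blockedTypes.has(resourceType)) return true;
  let host;
  try {
    host = new URL(url).hostname;
  } catch {
    return false; // let requests with unparsable URLs through
  }
  // Block the listed host itself and any of its subdomains.
  return policy.blockedHosts.some((h) => host === h || host.endsWith('.' + h));
}

// Wire the predicate into a Puppeteer page. Every intercepted request is
// either aborted or continued, so none is left stalled.
async function installRequestFilter(page, policy = DEFAULT_POLICY) {
  await page.setRequestInterception(true);
  page.on('request', (request) => {
    if (shouldBlock(request.resourceType(), request.url(), policy)) {
      request.abort();
    } else {
      request.continue();
    }
  });
}
```

Keeping the rule in `shouldBlock` means the policy can be unit-tested and changed without touching the event-handler code.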

3. Set a User Agent

Puppeteer lets you set the user agent string that the browser sends with its requests. This is useful when you need to spoof a user agent or test how a site responds to different browsers or platforms.

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Set the user agent
  await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36');

  // Navigate to a webpage
  await page.goto('https://www.example.com');

  // Take a screenshot of the page
  await page.screenshot({ path: 'example.png' });

  await browser.close();
})();

In this example, we use page.setUserAgent() to override the user agent string for this page. It accepts any string; here we use one that identifies Google Chrome 58 on Windows 10.

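After setting the user agent, we navigate to a webpage and take a screenshot. The server receives the spoofed User-Agent header, and navigator.userAgent in the page reports the same value, so a site that inspects the user agent may serve the content it would send to Chrome 58 on Windows 10. The page is still rendered by Puppeteer's own browser, however, so the screenshot shows how that browser renders the content, not how Chrome 58 would.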
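A hard-coded, years-old user agent can itself make a browser look unusual. An alternative is to start from the browser's real user agent, obtained with `browser.userAgent()`, and replace the `HeadlessChrome` token that headless Chrome typically reports, then check what the page actually sees through `navigator.userAgent`. The helper names below are hypothetical, and whether the token is present depends on the Chrome version and headless mode in use.

```javascript
// Replace the "HeadlessChrome/" product token with "Chrome/", leaving the
// version number and the rest of the string untouched.
function toHeadfulUserAgent(userAgent) {
  return userAgent.replace('HeadlessChrome/', 'Chrome/');
}

// Apply the adjusted user agent to one page and return what the page reports.
// page.setUserAgent() affects only this page, not other pages in the browser.
async function useRealisticUserAgent(browser, page) {
  const original = await browser.userAgent();
  await page.setUserAgent(toHeadfulUserAgent(original));
  // Runs inside the page, so it reflects the override the site will see.
  return page.evaluate(() => navigator.userAgent);
}
```

Because the version number comes from the browser itself, the string stays consistent with the engine that actually renders the page.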

4. Use Wait and Timeout Functions

Puppeteer provides several waitFor* methods, such as page.waitForSelector(), that pause the script until a specific element or event appears on the page. These methods time out after 30 seconds by default and reject with an error. Set a timeout that suits your page, and avoid passing timeout: 0, which disables the timeout and can leave a script hanging indefinitely.

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Navigate to a webpage
  await page.goto('https://www.example.com');

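  // Wait for the element to become visible, with a timeout of 10 seconds.
  // '#example-element' is a placeholder: example.com has no such element,
  // so replace it with a selector from your target page.
  await page.waitForSelector('#example-element', { visible: true, timeout: 10000 });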

  // Take a screenshot of the page
  await page.screenshot({ path: 'example.png' });

  await browser.close();
})();

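In this example, we use page.waitForSelector() with visible: true to wait until an element with the ID example-element is both present in the DOM and visible. (Older tutorials use page.waitFor(), which was deprecated and has been removed from current Puppeteer releases.) The call pauses the script until the element appears or the 10-second timeout is reached, in which case it rejects with a TimeoutError.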

After waiting for the element to be visible, we take a screenshot of the page. This ensures that the screenshot includes the element we were waiting for.
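Because a timeout rejects rather than returning, an optional element (a cookie banner, say) needs explicit handling. The hypothetical helper below treats a timeout as "not found" and re-throws anything else. It identifies the error by its `name` property, an assumption made so the sketch does not need to import Puppeteer; current releases also export `TimeoutError` from the `puppeteer` package for an `instanceof` check.

```javascript
// Wait for an optional element: resolve to its handle, or null on timeout.
// Any other error (for example, the page was closed) is re-thrown.
async function waitForOptional(page, selector, timeout = 3000) {
  try {
    return await page.waitForSelector(selector, { visible: true, timeout });
  } catch (err) {
    if (err && err.name === 'TimeoutError') return null;
    throw err;
  }
}

// Usage (the selector is a placeholder):
// const banner = await waitForOptional(page, '#cookie-banner');
// if (banner) await banner.click();
```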

Puppeteer offers a dedicated method for each kind of condition: page.waitForSelector() for elements, page.waitForResponse() or page.waitForRequest() for network activity such as an XHR or fetch call, page.waitForFunction() for a JavaScript expression that must become truthy in the page, and page.waitForNavigation() for page loads. They accept options that customize the wait, such as timeout, and page.waitForFunction() also accepts a polling interval.
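The sketch below combines three of these methods. It sets a page-wide default with `page.setDefaultTimeout()`, waits for a network response matching a predicate with `page.waitForResponse()`, and polls a condition inside the page with `page.waitForFunction()`. The helper name, the `/api/items` URL fragment, and the `.item` selector are placeholders, not part of any real site.

```javascript
// Hypothetical helper: load a page, wait for a (placeholder) API response,
// then wait until at least `minItems` .item elements have rendered.
async function loadAndWaitForItems(page, url, minItems = 1) {
  // Default timeout (ms) for page methods that accept a timeout option
  // and are not given one explicitly.
  page.setDefaultTimeout(15000);

  // Start waiting for the API response and navigate at the same time, so a
  // response that arrives during page load is not missed.
  const [response] = await Promise.all([
    page.waitForResponse(
      (res) => res.url().includes('/api/items') && res.ok()
    ),
    page.goto(url),
  ]);

  // Re-evaluate the condition in the page every 100 ms until it is truthy.
  await page.waitForFunction(
    (min) => document.querySelectorAll('.item').length >= min,
    { polling: 100, timeout: 5000 },
    minItems
  );
  return response.status();
}
```

Passing `minItems` as an extra argument to `waitForFunction()` is needed because the function runs in the page and cannot see Node.js variables.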

5. Handle Authentication

In some automated scenarios, you need to log in to a website before you can access certain pages or information. Puppeteer can handle common cases such as HTTP Basic authentication, through page.authenticate(), and login forms, which you fill in and submit just as a user would.

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Navigate to the login page
  await page.goto('https://www.example.com/login');

  // Fill in the username and password fields
  // (in real scripts, read credentials from environment variables or a secrets store)
  await page.type('#username', 'myusername');
  await page.type('#password', 'mypassword');

  // Click the login button
  await Promise.all([
    page.waitForNavigation(),
    page.click('#login-button'),
  ]);

  // Take a screenshot of the logged-in page
  await page.screenshot({ path: 'example.png' });

  await browser.close();
})();

In this example, we first navigate to the login page using page.goto(). We then fill in the username and password fields using page.type(), which simulates typing in the fields.

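Once the fields are filled in, we click the login button with page.click(). The click is wrapped in Promise.all() together with page.waitForNavigation(), so the navigation listener is already active when the click triggers the page load; awaiting the click first could miss a fast navigation and then wait until the timeout.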
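Since this click-and-wait pairing recurs whenever a click loads a new page, it can be wrapped in a small helper. The function below is hypothetical; `waitUntil: 'networkidle0'` is a real Puppeteer option that treats the navigation as finished once there have been no network connections for at least 500 ms, which suits pages that keep loading data after the initial document. Callers can override it through `options`.

```javascript
// Click an element that triggers a full-page navigation and wait for it.
// waitForNavigation() is called first so its listener is registered before
// the click; Promise.all rejects if either operation fails.
async function clickAndWaitForNavigation(page, selector, options = {}) {
  const [response] = await Promise.all([
    page.waitForNavigation({ waitUntil: 'networkidle0', ...options }),
    page.click(selector),
  ]);
  return response; // can be null, e.g. for same-document (hash) navigations
}

// Usage in the login example:
// await clickAndWaitForNavigation(page, '#login-button');
```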

Finally, we take a screenshot of the logged-in page using page.screenshot(). This screenshot will include any content that was only visible after the login process.
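The example above covers form-based login. For sites protected by HTTP Basic authentication, where a browser would normally show a native credentials dialog, Puppeteer provides `page.authenticate()`, which supplies the credentials when the server challenges a request. The sketch below also reads the credentials from environment variables instead of hard-coding them; the variable names `SITE_USER` and `SITE_PASS` and both helper names are assumptions.

```javascript
// Read credentials from the environment and fail fast if either is missing.
// SITE_USER and SITE_PASS are hypothetical variable names.
function getCredentials(env = process.env) {
  const username = env.SITE_USER;
  const password = env.SITE_PASS;
  if (!username || !password) {
    throw new Error('Set SITE_USER and SITE_PASS before running the script');
  }
  return { username, password };
}

// Register HTTP Basic credentials on the page, then navigate.
// authenticate() is called before goto() so the first challenge is answered.
async function openWithBasicAuth(page, url, credentials = getCredentials()) {
  await page.authenticate(credentials);
  return page.goto(url);
}

// Usage (placeholder URL):
// await openWithBasicAuth(page, 'https://www.example.com/private');
```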

Conclusion

Puppeteer is a powerful Node.js library for browser automation. By following these tips, you can make your Puppeteer scripts more efficient, robust, and reliable. Happy coding!