Puppeteer Stealth (the puppeteer-extra-plugin-stealth package) is a widely used plugin for the Puppeteer browser automation framework, loaded through puppeteer-extra. It patches the browser environment that Puppeteer controls to reduce the likelihood of detection by anti-scraping techniques, which makes data collection smoother. For larger workloads, a web scraping API can complement Puppeteer Stealth by taking care of scalability and reliability across different web platforms.
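By using puppeteer-stealth, scrapers can more effectively get past Cloudflare, DataDome, and other prevalent anti-scraping services, although no evasion is guaranteed to work against every site.
puppeteer-stealth can be installed with npm or Yarn, together with puppeteer and puppeteer-extra: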
$ npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth
# or
$ yarn add puppeteer puppeteer-extra puppeteer-extra-plugin-stealth
Then, the StealthPlugin object needs to be registered with puppeteer.use() to enable the extension:
// Note: import puppeteer-extra rather than puppeteer
const puppeteer = require('puppeteer-extra')
// add stealth plugin and use defaults (all evasion techniques)
const StealthPlugin = require('puppeteer-extra-plugin-stealth')
puppeteer.use(StealthPlugin())
// test run - check bankstatementpdfconverter.com browser fingerprint page
puppeteer.launch({ headless: true }).then(async browser => {
  console.log('Running tests..')
  const page = await browser.newPage()
  await page.goto('https://bankstatementpdfconverter.com/web-scraping-tools/browser-fingerprint')
  // give the page 5 seconds to finish its checks
  // (page.waitForTimeout was removed in recent Puppeteer versions)
  await new Promise(resolve => setTimeout(resolve, 5000))
  await page.screenshot({ path: 'testresult.png', fullPage: true })
  await browser.close()
  console.log('All done, check the screenshot. ✨')
})
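Besides the screenshot, the result of a test run can also be checked in code. The following is a minimal sketch, assuming two commonly cited headless signals (navigator.webdriver being true and "HeadlessChrome" appearing in the user agent). The helper names findAutomationSignals and collectFingerprint are hypothetical and not part of Puppeteer or the stealth plugin, and real anti-bot services check far more than these two values.

```javascript
// Hypothetical helper: lists well-known automation signals found in a
// fingerprint object of the shape { webdriver, userAgent }.
function findAutomationSignals(fingerprint) {
  const signals = []
  if (fingerprint.webdriver === true) {
    signals.push('navigator.webdriver is true')
  }
  if (/HeadlessChrome/.test(fingerprint.userAgent)) {
    signals.push('user agent contains HeadlessChrome')
  }
  return signals
}

// Hypothetical helper: reads the two values from a live page.
// `page` is a Puppeteer Page object, for example from browser.newPage().
async function collectFingerprint(page) {
  return page.evaluate(() => ({
    webdriver: navigator.webdriver,
    userAgent: navigator.userAgent,
  }))
}
```

Inside a test run, findAutomationSignals(await collectFingerprint(page)) could be logged before taking the screenshot. An empty array only means that these two particular signals were not found.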
Note that puppeteer-stealth includes numerous patches (called evasions) for different detection techniques, which can be customized and extended.
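As a rough illustration of such customization, the sketch below turns off selected patches by name. It relies on the enabledEvasions set that the plugin exposes and on the enabledEvasions option it accepts; the helpers withoutEvasions and buildStealthPlugin are hypothetical names introduced here, and the set of available names may differ between plugin versions.

```javascript
// Hypothetical helper: returns a copy of the given evasion names
// without the ones listed in toDisable.
function withoutEvasions(evasions, toDisable) {
  const remaining = new Set(evasions)
  for (const name of toDisable) {
    remaining.delete(name)
  }
  return remaining
}

// Hypothetical helper: builds a stealth plugin with some evasions turned off.
// The require sits inside the function so that withoutEvasions can be used
// and tested without the plugin installed.
function buildStealthPlugin(toDisable) {
  const StealthPlugin = require('puppeteer-extra-plugin-stealth')
  // a default instance exposes the names enabled by default as a Set
  const defaults = StealthPlugin().enabledEvasions
  return StealthPlugin({ enabledEvasions: withoutEvasions(defaults, toDisable) })
}
```

For example, puppeteer.use(buildStealthPlugin(['user-agent-override'])) would register the plugin without the user agent patch, assuming that name exists in the installed version.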
Alternatively, the Scrape Network API can handle anti-scraping protections automatically through its protection bypass feature.