When testing Puppeteer web scrapers, it can be useful to load local files instead of public websites. Puppeteer, like a regular web browser, can load local files through the file:// URL scheme. The URL must contain an absolute path, so it starts with three slashes (file:///). This lets developers test scraping scripts in a controlled environment without internet access, which speeds up development and debugging. A web crawling API can complement this kind of testing: such APIs offer additional capabilities for simulating web interactions and analyzing web content, which helps prepare a scraper for the complexities of the live web.
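# note: this example uses Playwright for Python; Puppeteer's page.goto() accepts the same file:// URLs
from playwright.sync_api import sync_playwright

with sync_playwright() as pw:
    browser = pw.chromium.launch(headless=False)
    context = browser.new_context(viewport={"width": 1920, "height": 1080})
    page = context.new_page()
    # open a local file (note: an absolute path is required, so the URL has three slashes after "file:")
    page.goto("file:///home/user/projects/test.html")  # linux
    # on windows, use the drive letter after the third slash instead:
    # page.goto("file:///C:/Users/projects/test.html")
    print(page.content())
    browser.close()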
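
Hand-written file:// URLs are easy to get wrong, for example by dropping a slash or passing a relative path. One way to avoid this is to build the URL from a filesystem path. The following is a minimal sketch that assumes Python's standard pathlib module. The helper name to_file_url is hypothetical and not part of Puppeteer or Playwright, and the temporary test.html file exists only for illustration.

```python
import tempfile
from pathlib import Path


def to_file_url(path) -> str:
    """Turn a local (possibly relative) path into an absolute file:// URL.

    Hypothetical helper for illustration; not part of any browser library.
    """
    # resolve() makes the path absolute, which as_uri() requires
    return Path(path).resolve().as_uri()


# Write a small HTML file to a temporary directory and build its URL
with tempfile.TemporaryDirectory() as tmp:
    html_file = Path(tmp) / "test.html"
    html_file.write_text("<html><body><h1>Hello</h1></body></html>", encoding="utf-8")
    url = to_file_url(html_file)
    print(url)  # a file:/// URL ending in /test.html
```

The resulting string can be passed to page.goto() in place of a hard-coded URL, which should keep the same test script working on both Linux and Windows.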