When web scraping, it is often useful to capture page screenshots, either as part of the collected data or to check what the headless browser is actually rendering while debugging. In Playwright, the page's screenshot() method captures the current page, and the same method is available on locators for capturing a single element. A web scraping API can complement this by handling navigation and data extraction on complex pages. The example below covers full-page, clipped, and element screenshots:
from pathlib import Path

from playwright.sync_api import sync_playwright

with sync_playwright() as pw:
    browser = pw.chromium.launch(headless=False)
    # create a browser context with a fixed viewport size
    context = browser.new_context(viewport={"width": 1920, "height": 1080})
    page = context.new_page()
    page.goto("https://httpbin.dev/html")

    # capture the whole page; passing `path` also saves the image to a file
    image_bytes = page.screenshot(
        full_page=True,  # scroll to capture the full page, not just the viewport
        path="screenshot.png",  # save the screenshot directly to a file
    )

    # or capture only a specific region and save the returned bytes manually
    # (clip gets its own call so it does not conflict with full_page)
    image_bytes = page.screenshot(
        clip={"x": 0, "y": 0, "width": 100, "height": 100},
    )
    Path("screenshot-clip.png").write_bytes(image_bytes)

    # we can also take a screenshot of a single element
    element = page.locator("p").first
    image_bytes = element.screenshot(path="screenshot-element.png")

    browser.close()
⚠ Be aware that when scraping dynamic web pages, screenshots might be taken before the page has fully loaded. For more information, see how to wait for a page to load in Playwright.
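One way to avoid capturing a half-loaded page is to wait for a known element before taking the screenshot. The sketch below is a minimal illustration, not a definitive approach: the helper name `screenshot_when_ready`, the `p` selector, the `loaded.png` file name, and the 10-second timeout are all assumptions chosen for this example, and the right selector depends on the page being scraped.

```python
from pathlib import Path


def screenshot_when_ready(page, selector: str, path: str, timeout_ms: int = 10_000) -> bytes:
    """Hypothetical helper: wait for `selector` to be visible, then screenshot the full page."""
    # Blocks until the element is visible; Playwright raises an error on timeout.
    page.wait_for_selector(selector, state="visible", timeout=timeout_ms)
    image_bytes = page.screenshot(full_page=True)
    # Save the returned bytes ourselves so the caller controls the destination.
    Path(path).write_bytes(image_bytes)
    return image_bytes


def main() -> None:
    # Imported here so the helper above can be reused without launching a browser.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as pw:
        browser = pw.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto("https://httpbin.dev/html")
        # "p" is an assumed selector for content that signals the page is ready.
        screenshot_when_ready(page, "p", "loaded.png")
        browser.close()
```

Calling `main()` runs the helper against a real browser. Waiting on a specific element tends to be more reliable than a fixed sleep, because the screenshot is taken as soon as the content of interest appears.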