When using the Python requests module to scrape pages with untrusted SSL certificates, you may encounter an SSLError. This exception is raised when a website's SSL certificate cannot be verified. Verification is an important security measure: it confirms that you are talking to the intended server and protects the integrity and privacy of the data exchanged. An SSLError can halt a web scraping project, so you need a reliable way to work around it without giving up more security than necessary. One option is to route requests through a web scraping API, which handles SSL verification and other security protocols for you so that data collection can continue uninterrupted.
import requests
response = requests.get("https://example.com/")
# raises:
# SSLError: HTTPSConnectionPool(host='example.com', port=443)...
# we can disable certificate verification (note: this is risky; traffic is still encrypted, but the server's identity is no longer checked, which allows man-in-the-middle attacks)
response = requests.get("https://example.com/", verify=False)
# or specify the certificate file explicitly (.pem)
cert_location = "certificates/example-com-certificate.pem"
response = requests.get("https://example.com/", verify=cert_location)
The SSLError exception is not commonly encountered in web scraping. When it does occur, the simplest solution is to disable certificate verification (using the verify=False parameter), provided no sensitive data is being exchanged.
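A minimal sketch of that approach, assuming the target pages carry no sensitive data: try a verified request first and retry without verification only when an SSLError is raised. The helper name get_with_insecure_fallback is hypothetical and not part of requests.

```python
import warnings

import requests
from requests.exceptions import SSLError
from urllib3.exceptions import InsecureRequestWarning


def get_with_insecure_fallback(url, session=None, **kwargs):
    """Hypothetical helper: verified GET first, unverified GET on SSLError."""
    if session is None:
        session = requests.Session()
    try:
        return session.get(url, **kwargs)
    except SSLError:
        # Certificate verification failed. Retry with verify=False and
        # silence the InsecureRequestWarning for this one call only.
        retry_kwargs = {**kwargs, "verify": False}
        with warnings.catch_warnings():
            warnings.simplefilter("ignore", InsecureRequestWarning)
            return session.get(url, **retry_kwargs)


# Usage (performs a real network request):
# response = get_with_insecure_fallback("https://example.com/", timeout=10)
```

Keeping the verified attempt first means the insecure path is used only for hosts whose certificates actually fail verification.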
If a manual fix is required, you can find the CA certificate bundle that requests is using with the requests.certs.where() function:
import requests
print(requests.certs.where())
# prints: /etc/ssl/certs/ca-certificates.crt (example on Linux)
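One possible manual fix, sketched here under assumptions, is to leave the installed bundle untouched and write a new bundle made of the default certificates plus the one certificate the site needs. The function name build_custom_bundle and the file paths are hypothetical.

```python
from pathlib import Path

import requests


def build_custom_bundle(extra_cert_path, output_path):
    """Hypothetical helper: default CA bundle + one extra PEM certificate."""
    default_bundle = Path(requests.certs.where()).read_bytes()
    extra_cert = Path(extra_cert_path).read_bytes()
    # PEM bundles are plain concatenations of certificate blocks,
    # so appending the extra certificate is enough.
    Path(output_path).write_bytes(default_bundle + b"\n" + extra_cert)
    return str(output_path)


# Usage (paths are placeholders):
# bundle = build_custom_bundle("certificates/example-com-certificate.pem",
#                              "certificates/custom-bundle.pem")
# response = requests.get("https://example.com/", verify=bundle)
```

Working on a copy avoids losing the change when the certifi package is upgraded and replaces its own bundle file.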
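You can also make requests verify against a different bundle by setting the REQUESTS_CA_BUNDLE environment variable. The variable does not change what requests.certs.where() returns; it is picked up when the settings for a request are resolved:
$ export REQUESTS_CA_BUNDLE="/etc/ssl/certs/my-certificates.pem"
$ python -c "import requests; print(requests.Session().merge_environment_settings('https://example.com/', {}, None, None, None)['verify'])"
/etc/ssl/certs/my-certificates.pem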
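When both an environment variable and an explicit verify argument are in play, it can help to check which one wins. The sketch below asks a Session to resolve the setting without sending a request. It assumes Session.merge_environment_settings behaves as in current requests releases; effective_verify is a hypothetical helper name.

```python
import requests


def effective_verify(url, verify=None):
    """Hypothetical helper: the verify value requests would apply to url."""
    session = requests.Session()
    # Arguments: url, proxies, stream, verify, cert. No request is sent.
    settings = session.merge_environment_settings(url, {}, None, verify, None)
    return settings["verify"]


# Usage:
# effective_verify("https://example.com/")                # env bundle path, or True if unset
# effective_verify("https://example.com/", verify=False)  # explicit argument is kept
```

An explicit verify=False or verify="path/to/bundle.pem" passed to a request takes precedence over REQUESTS_CA_BUNDLE; the variable only applies when verify is left at its default.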
In conclusion, requests relies on the certifi package for its bundle of trusted CA certificates. If you’re having issues, consider updating it: pip install certifi --upgrade