ScrapeNetwork

Intro to Python Requests Proxy: Comprehensive Guide for Web Scraping


Python’s requests package not only simplifies HTTP requests but also offers robust support for proxies, including both HTTP and SOCKS5 types. This is essential for web scraping: routing requests through different servers helps manage rate limits and bypass geo-restrictions or IP bans. Proxies can be set for individual requests or configured for an entire script, improving both the anonymity and the efficiency of scraping operations.

To further optimize your scraping, consider integrating the best web scraping API. Such APIs are designed to streamline data extraction, offering features like automatic proxy rotation and sophisticated parsing that can handle even the most complex web pages. Combined with Python’s requests package, they give developers a powerful toolkit for extracting valuable data from the web with precision and speed.

import requests

# proxy pattern is:
# scheme://username:password@IP:PORT
# For example:
# no auth HTTP proxy:
my_proxy = "http://160.11.12.13:1020"
# or a SOCKS5 proxy (requires: pip install requests[socks])
my_proxy = "socks5://160.11.12.13:1020"
# proxy with authentication
my_proxy = "http://my_username:my_password@160.11.12.13:1020"
# note: the username and password must be URL-quoted if they contain
# URL-sensitive characters like "@":
from urllib.parse import quote
my_proxy = f"http://{quote('foo@bar.com')}:{quote('password@123')}@160.11.12.13:1020"
# -> "http://foo%40bar.com:password%40123@160.11.12.13:1020"


proxies = {
    # this proxy will be applied to all http:// urls
    'http': 'http://160.11.12.13:1020',
    # this proxy will be applied to all https:// urls (note the S)
    'https': 'http://160.11.12.13:1020',
    # we can also use a proxy only for a specific host
    'https://httpbin.dev': 'http://160.11.12.13:1020',
}
requests.get("https://httpbin.dev/ip", proxies=proxies)
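The same `proxies` mapping can also be attached to a `requests.Session`, so every request made through the session reuses it without passing the argument each time. Here's a minimal sketch; the proxy address is a placeholder that should be swapped for a real proxy:

```python
import requests

session = requests.Session()
# placeholder proxy address - replace with a working proxy before running
session.proxies = {
    "http": "http://160.11.12.13:1020",
    "https": "http://160.11.12.13:1020",
}
# every request made through this session now routes via the proxy:
# session.get("https://httpbin.dev/ip")
```

Session-level proxies are handy when a script makes many requests, since the configuration lives in one place.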

Note that proxies can also be set through the standard *_PROXY environment variables:

$ export HTTP_PROXY="http://160.11.12.13:1020"
$ export HTTPS_PROXY="http://160.11.12.13:1020"
$ export ALL_PROXY="socks5://160.11.12.13:1020"
$ python
import requests
# this will use the proxies we set
requests.get("https://httpbin.dev/ip")

Finally, when web scraping through proxies we should rotate them between requests. Check out our guide on rotating proxies for more information. For more on proxies, see our introduction to proxies in web scraping.
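A basic rotation strategy is to pick a random proxy from a pool for each request. The sketch below illustrates the idea; the pool addresses are placeholders and `get_with_random_proxy` is a hypothetical helper, not part of the requests API:

```python
import random
import requests

# placeholder proxy pool - replace with your own proxy addresses
proxy_pool = [
    "http://160.11.12.13:1020",
    "http://160.11.12.14:1020",
    "http://160.11.12.15:1020",
]

def get_with_random_proxy(url: str) -> requests.Response:
    """Fetch url through a proxy picked at random from the pool."""
    proxy = random.choice(proxy_pool)
    return requests.get(url, proxies={"http": proxy, "https": proxy})

# usage (requires live proxies):
# response = get_with_random_proxy("https://httpbin.dev/ip")
```

Random selection is the simplest approach; round-robin or weighted selection based on proxy health are common refinements.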
