HAProxy as a rotating proxy

Sun, Sep 6, 2020 2-minute read

For a long time I have wanted to start a blog about what I’ve faced in my software development career.
Hooray, I’m finally here. In this first post, I’m going to talk about how to set up HAProxy as a rotating proxy.

Problem

Suppose we have a crawler application that, in general, has the architecture below.

architecture

Using multiple proxies is a common way to get around the scraping countermeasures of a target site. The application has to decide which proxy it is going to connect to. The problem gets harder once you scale the application.
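
To make the burden concrete, here is a minimal sketch of the selection logic each application would have to carry without a rotating proxy. The class name and the proxy addresses are hypothetical, and a real crawler would also need health checks and shared state between instances.

```python
from itertools import cycle


class ProxyPicker:
    """Hands out proxies from a fixed list in round-robin order."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._pool = cycle(list(proxies))

    def next_proxy(self):
        # Each call returns the next proxy, wrapping around at the end.
        return next(self._pool)


# Hypothetical proxy addresses, only for illustration.
picker = ProxyPicker(
    ["proxy-a.example:8080", "proxy-b.example:8080", "proxy-c.example:8080"]
)
for _ in range(4):
    print(picker.next_proxy())
```

Every application instance keeps its own picker here, so two instances can easily hit the same proxy at the same time. That is the part that gets awkward at scale.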

architecture

Rotating proxy

This image, which I took from the internet, illustrates a rotating proxy very well.

rotating_proxy

If we put some kind of rotating proxy between the application and our proxies, the problem becomes easier. The application only needs to know about the rotating proxy and lets it do the heavy lifting, such as choosing the underlying proxy/IP and load balancing.
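
On the application side, this could look roughly like the sketch below, which uses only the Python standard library. The address 127.0.0.1:3128 is an assumption (HAProxy running locally on the port bound in the configuration later in this post), and the function name is made up for illustration.

```python
import urllib.request

# Assumption: the rotating proxy listens locally on port 3128.
ROTATING_PROXY = "http://127.0.0.1:3128"


def make_opener(proxy_url=ROTATING_PROXY):
    """Build an opener that sends plain HTTP requests through one proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(handler)


opener = make_opener()
# The application never sees the underlying proxies; a request would be e.g.
#   opener.open("http://example.com/", timeout=10)
```

The crawler now has a single proxy address to configure, no matter how many proxies sit behind it.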

architecture_with_rotating_proxy

Configuration

So, let’s go straight to the HAProxy configuration.
Suppose we have 5 proxies and all of them share the same username:password. Have a look at the Proxy-Authorization header.
The configuration below has some notable lines:

  • reqadd Proxy-Authorization:\ Basic\ YWxhZGRpbjpvcGVuc2VzYW1l: adds a Proxy-Authorization header to every request that’s sent to the underlying proxy. The value after Basic is the Base64 encoding of username:password.
  • balance roundrobin: the load-balancing algorithm; requests are handed to the underlying proxies in turn.

global
    log /dev/log    local0 debug
    chroot /var/lib/haproxy
    stats socket ipv4@127.0.0.1:9999 level admin
    stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
    stats timeout 30s
    user haproxy
    group haproxy
    daemon
    maxconn 256

defaults
    mode http
    log global
    option  httplog
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms

frontend rotatingproxies
    bind *:3128
    log /dev/log local0 debug
    default_backend scraper
    option http_proxy
    option httplog
    option http-use-proxy-header
    option accept-invalid-http-request

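backend scraper
    mode http
    # Underlying proxies; replace serverN:portN with the real host:port pairs
    server proxy1 server1:port1
    server proxy2 server2:port2
    server proxy3 server3:port3
    server proxy4 server4:port4
    server proxy5 server5:port5
    # Add the shared credentials to every request sent to an underlying proxy.
    # reqadd is gone from newer HAProxy releases (removed in 2.1, as far as I know);
    # there the equivalent should be:
    #   http-request set-header Proxy-Authorization "Basic YWxhZGRpbjpvcGVuc2VzYW1l"
    reqadd Proxy-Authorization:\ Basic\ YWxhZGRpbjpvcGVuc2VzYW1l
    # Hand requests to the servers above in turn
    balance roundrobin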
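
The Base64 string in the Proxy-Authorization line is just username:password encoded. A small sketch of how it could be generated for your own credentials follows. The helper names are hypothetical, and aladdin / opensesame are the credentials that the value in the config decodes to.

```python
import base64


def proxy_authorization_value(username, password):
    """Return the Basic auth value for a Proxy-Authorization header."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def to_reqadd_line(value):
    """Format the value as a reqadd line; spaces are escaped with a backslash."""
    return "reqadd Proxy-Authorization:\\ " + value.replace(" ", "\\ ")


value = proxy_authorization_value("aladdin", "opensesame")
print(value)                  # Basic YWxhZGRpbjpvcGVuc2VzYW1l
print(to_reqadd_line(value))  # reqadd Proxy-Authorization:\ Basic\ YWxhZGRpbjpvcGVuc2VzYW1l
```

Keep in mind that Base64 is an encoding, not encryption, so the config file should be treated as a secret.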

This is a basic configuration that makes HAProxy act as a rotating proxy. HAProxy has really rich features and numerous configuration parameters. I hope that someday I can work with it more intensively.
Cheers!