HAProxy as a rotating proxy
For a long time I’ve wanted to start a blog about what I’ve faced in my software development career.
Hooray, I’m finally here. In this first post, I’m going to talk about how to set up HAProxy as a rotating proxy.
Problem
Suppose we have a crawler application that, in general, has the architecture below.
Using multiple proxies is a common approach to mitigating the target’s scraping countermeasures.
The application has to decide which proxy it is going to connect to. The problem gets harder as you scale the application.
Rotating proxy
This image, which I took from the internet, illustrates a rotating proxy very well.
If we put some kind of rotating proxy between the application and our proxies, the problem becomes easier. The application only needs to know about the rotating proxy and lets it do the heavy lifting, such as choosing the underlying proxy/IP and load balancing.
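As a rough sketch of what this looks like on the application side, the crawler only needs a single proxy address. The example uses Python's standard urllib. The address 127.0.0.1:3128 is an assumption (HAProxy running locally on the port used later in this post), and build_opener_for is just an illustrative helper name. It only covers plain HTTP requests.

```python
import urllib.request

# Assumed address of the rotating proxy (HAProxy on the local machine).
ROTATING_PROXY = "http://127.0.0.1:3128"


def build_opener_for(proxy_url: str) -> urllib.request.OpenerDirector:
    """Return an opener that sends every HTTP request through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(handler)


opener = build_opener_for(ROTATING_PROXY)
# The crawler would then fetch pages with opener.open(url); it never needs
# to know which underlying proxy actually served the request.
```

Scaling the crawler then means starting more workers with the same single setting, instead of distributing a proxy list to each of them.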
Configuration
So, let’s go straight to the HAProxy configuration.
Suppose we have 5 proxies and all of them share the same username:password.
Please have a look at the Proxy-Authorization header.
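For reference, the value of a Basic Proxy-Authorization header is the word Basic followed by the Base64 encoding of username:password. A minimal Python sketch, where proxy_authorization_value is an illustrative helper name and the credentials are the ones the token in this post decodes to:

```python
import base64


def proxy_authorization_value(username: str, password: str) -> str:
    """Build a Basic Proxy-Authorization value from a username and password."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


# The token used in this post decodes to "aladdin:opensesame".
print(proxy_authorization_value("aladdin", "opensesame"))
# prints: Basic YWxhZGRpbjpvcGVuc2VzYW1l
```

Replace the username and password with your own to get the token for your proxies.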
The configuration below has some notable lines:
reqadd Proxy-Authorization:\ Basic\ YWxhZGRpbjpvcGVuc2VzYW1l: adds the Proxy-Authorization header to every request that is sent to the underlying proxy.
balance roundrobin: the load balancing algorithm.
global
    log /dev/log local0 debug
    chroot /var/lib/haproxy
    stats socket ipv4@127.0.0.1:9999 level admin
    stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
    stats timeout 30s
    user haproxy
    group haproxy
    daemon
    maxconn 256

defaults
    mode http
    log global
    option httplog
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms

frontend rotatingproxies
    # The application connects to this port
    bind *:3128
    log /dev/log local0 debug
    default_backend scraper
    option http_proxy
    option httplog
    option http-use-proxy-header
    option accept-invalid-http-request

backend scraper
    mode http
    # Underlying proxies; replace serverN:portN with the real addresses
    server proxy1 server1:port1
    server proxy2 server2:port2
    server proxy3 server3:port3
    server proxy4 server4:port4
    server proxy5 server5:port5
    # Add the shared credentials to every request sent to the underlying proxy.
    # Note: reqadd is HAProxy 1.x syntax; from 2.1 on, use http-request add-header.
    reqadd Proxy-Authorization:\ Basic\ YWxhZGRpbjpvcGVuc2VzYW1l
    # Pick the servers in turn
    balance roundrobin
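To get a feel for what balance roundrobin does with the five servers, here is a small Python simulation. It is only a sketch of the idea: HAProxy's real algorithm also takes server weights and health into account, which this example ignores. The names in SERVERS mirror the backend section, and assign_round_robin is an illustrative helper.

```python
from collections import Counter
from itertools import cycle

# Server names mirror the backend section of the configuration.
SERVERS = ["proxy1", "proxy2", "proxy3", "proxy4", "proxy5"]


def assign_round_robin(servers, request_count):
    """Return the server chosen for each of request_count requests, in order."""
    rotation = cycle(servers)
    return [next(rotation) for _ in range(request_count)]


assignments = assign_round_robin(SERVERS, 12)
print(assignments[:6])       # proxy1 to proxy5, then proxy1 again
print(Counter(assignments))  # requests handled per server
```

Consecutive requests leave through different proxies, so no single IP carries all of the traffic to the target.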
This is a basic configuration that makes HAProxy act as a rotating proxy. HAProxy has really rich features and numerous configuration parameters. I hope some day I can work with it more intensively.
Cheers!