Haproxy as a rotating proxy
For a long time, I have wanted to start a blog about what I've faced in my software development career.
Hoyah, finally I'm here! In this first post, I'm going to talk about how to set up haproxy as a rotating proxy.
Suppose we have a crawler application that, in general, has the architecture below.
Using multiple proxies is a common approach to getting around the anti-scraping countermeasures of target websites.
The application has to decide which proxy it is going to connect to, and the problem gets harder when you scale the application.
This image, which I took from the internet, illustrates the idea of a rotating proxy very well.
If we put a rotating proxy between the application and our proxies, the problem becomes easier. The application only needs to know about the rotating proxy and lets it do the heavy lifting, such as choosing the underlying proxy/IP and load balancing.
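To make the "choose the underlying proxy" part concrete, here is a minimal Python sketch of what round-robin rotation means: each new request is handed the next proxy in the list, wrapping around after the last one. It only illustrates the idea and is not how haproxy is implemented internally. The class name RoundRobinRotator and the proxy list are hypothetical.

```python
from itertools import cycle
from typing import Iterator

# Hypothetical upstream proxies, mirroring the serverN:portN placeholders.
UPSTREAM_PROXIES = [
    "server1:port1",
    "server2:port2",
    "server3:port3",
]


class RoundRobinRotator:
    """Hands out upstream proxies in a fixed cyclic order (illustrative only)."""

    def __init__(self, proxies: list[str]) -> None:
        if not proxies:
            raise ValueError("need at least one upstream proxy")
        self._cycle: Iterator[str] = cycle(proxies)

    def next_proxy(self) -> str:
        # Each call returns the next proxy, wrapping around after the last one.
        return next(self._cycle)


if __name__ == "__main__":
    rotator = RoundRobinRotator(UPSTREAM_PROXIES)
    for _ in range(5):
        print(rotator.next_proxy())
```

This sketch is not thread-safe and ignores server weights and health checks, which a real rotating proxy such as haproxy takes care of for you. That is exactly the kind of work we want to move out of the application.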
So, let’s go straight to the haproxy configuration.
Suppose we have 5 proxies, and all of them use the same credentials.
Please have a look at the Proxy-Authorization header.
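The configuration below has a few notable lines:
reqadd Proxy-Authorization:\ Basic\ YWxhZGRpbjpvcGVuc2VzYW1l: adds a Proxy-Authorization header to every request that is sent to an underlying proxy. (reqadd was removed in HAProxy 2.1; on newer versions the equivalent is http-request add-header Proxy-Authorization "Basic YWxhZGRpbjpvcGVuc2VzYW1l".)
balance roundrobin: the load-balancing algorithm, which hands requests to the underlying proxies in turn.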
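The value after "Basic" is not a secret key. It is simply the base64 encoding of "username:password", as defined by the HTTP Basic authentication scheme. The token used here decodes to the sample credentials aladdin:opensesame. The small Python helper below (the function name basic_proxy_auth is hypothetical) shows how to build the value for your own proxy credentials.

```python
import base64


def basic_proxy_auth(username: str, password: str) -> str:
    """Build a Proxy-Authorization value using the HTTP Basic scheme."""
    # Basic auth is base64("username:password"): encoded, not encrypted.
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return f"Basic {token}"


if __name__ == "__main__":
    # These sample credentials reproduce the token from the reqadd line.
    print(basic_proxy_auth("aladdin", "opensesame"))
    # → Basic YWxhZGRpbjpvcGVuc2VzYW1l
```

Because base64 is trivially reversible, anyone who can read the haproxy config can recover the credentials, so keep the file's permissions tight.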
```
global
    log /dev/log local0 debug
    chroot /var/lib/haproxy
    # The address on the next line was garbled in the original post;
    # ipv4@127.0.0.1 is an assumed placeholder for the admin stats socket.
    stats socket ipv4@127.0.0.1:9999 level admin
    stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
    stats timeout 30s
    user haproxy
    group haproxy
    daemon
    maxconn 256

defaults
    mode http
    log global
    option httplog
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms

# The application connects only to this frontend, on port 3128.
frontend rotatingproxies
    bind *:3128
    log /dev/log local0 debug
    default_backend scraper
    option http_proxy
    option httplog
    option http-use-proxy-header
    option accept-invalid-http-request

# The real proxies; serverN:portN are placeholders for their addresses.
backend scraper
    mode http
    server proxy1 server1:port1
    server proxy2 server2:port2
    server proxy3 server3:port3
    server proxy4 server4:port4
    server proxy5 server5:port5
    # Same credentials are sent to every upstream proxy.
    reqadd Proxy-Authorization:\ Basic\ YWxhZGRpbjpvcGVuc2VzYW1l
    # Rotate through the servers in turn.
    balance roundrobin
```
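On the application side, the only thing left to configure is the rotating proxy itself. Below is a minimal Python sketch using the standard library's urllib.request, assuming haproxy runs on the same host and listens on port 3128 (the bind *:3128 line). The function names are hypothetical, and the sketch routes only plain-HTTP requests; whether HTTPS (CONNECT) traffic works through this particular setup is not covered here.

```python
import urllib.request

# Assumption: haproxy runs locally and listens on port 3128 ("bind *:3128").
ROTATING_PROXY = "http://127.0.0.1:3128"


def build_opener_via_rotating_proxy(
    proxy_url: str = ROTATING_PROXY,
) -> urllib.request.OpenerDirector:
    """Return an opener that sends plain-HTTP requests through the rotating proxy."""
    # Only the rotating proxy is configured; haproxy picks the upstream proxy.
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(handler)


def fetch_status(url: str, opener: urllib.request.OpenerDirector) -> int:
    """Fetch a URL through the given opener and return the HTTP status code."""
    with opener.open(url, timeout=10) as response:
        return response.status
```

For example, fetch_status("http://example.com/", build_opener_via_rotating_proxy()) sends the request to haproxy, which forwards it to one of the five proxies. Scaling the crawler out to more instances does not change this code, since every instance points at the same rotating proxy.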
This is a basic configuration that makes haproxy act as a rotating proxy. Haproxy has a very rich feature set and numerous configuration parameters; I hope someday I can work with it more intensively.