Got CAPTCHA'd on page 47. Every single time.

Source: DEV Community
I thought I was being clever with my scraping setup: 10 requests per minute, rotating user agents, residential proxies. Page 47, CAPTCHA. Page 48, CAPTCHA. Page 49, CAPTCHA. Fun times. The site decided I was a bot once I hit some threshold. Didn't matter that I was going slow. Didn't matter that I looked like a real browser. CAPTCHA wall, every time, starting at page 47.

First thing I tried: more delays. 30 seconds between requests. Still got CAPTCHA'd on page 47. Interesting. Then I tried different proxy providers. Three of them. Same result, same page number. At that point I was convinced the threshold was tied to my IP somehow, but no: the same thing happened with fresh IPs. Turns out the site just really, really doesn't like automated scraping. And I was too stubborn to give up.

Ended up biting the bullet and paying for 2captcha. If you're not familiar, they solve CAPTCHAs using actual humans (or very good models, unclear). The integration started like this:

```python
import time
from twocaptcha import TwoCaptcha
```
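The snippet above stops at the imports, so here's a minimal sketch of what a 2captcha integration typically boils down to: call the solver, retry a few times on failure, hand the token back to your scraper. The helper name `solve_with_retries` and the retry parameters are my own, not the original code; the `.recaptcha(sitekey=..., url=...)` call is the standard `2captcha-python` client API.

```python
import time

def solve_with_retries(solver, sitekey, page_url, attempts=3, delay=5.0):
    """Ask a 2captcha-style solver for a reCAPTCHA token, retrying on failure.

    `solver` is anything exposing .recaptcha(sitekey=..., url=...),
    e.g. twocaptcha.TwoCaptcha("YOUR_API_KEY") from the 2captcha-python
    package. Passing it in keeps the retry logic testable offline.
    """
    last_err = None
    for _ in range(attempts):
        try:
            result = solver.recaptcha(sitekey=sitekey, url=page_url)
            # The "code" field of the result is the solved token you
            # submit back to the target site's form.
            return result["code"]
        except Exception as err:  # the client raises on timeouts/unsolvable
            last_err = err
            time.sleep(delay)
    raise RuntimeError(f"CAPTCHA unsolved after {attempts} attempts") from last_err
```

With a real key you'd construct the solver once, e.g. `solver = TwoCaptcha(os.environ["TWOCAPTCHA_API_KEY"])`, and call the helper whenever a page serves a CAPTCHA. Each solve takes tens of seconds and costs a fraction of a cent, so it's a last resort, not a rate-limit workaround.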