CAPTCHA (an abbreviation for Completely Automated Public Turing test to tell Computers and Humans Apart) is a set of small tests or checks used to distinguish a human from a bot. Technically, any type of test can eventually be bypassed with automated systems, but CAPTCHA is designed in such a way that using computers to solve it becomes unreasonably expensive and complex.
Humans, on the other hand, can usually pass these checks quickly and relatively easily, for example, by selecting certain types of images or recognizing distorted characters.
CAPTCHA has long been an effective mechanism for protecting websites and online services from bots, scrapers, and other unwanted traffic, and it will likely remain one for years to come.
Below, we’ll explain what types of CAPTCHA exist and how to handle reCAPTCHA and hCaptcha during web scraping. These are the most popular protection services integrated into forms and pages on modern websites.
To give you a better understanding of CAPTCHA-based protection mechanisms, let’s first look at the main types:
There is an old saying: no matter how smart you think you are, someone will always be one step ahead. CAPTCHA is no exception: with enough effort and the right technical tools, any protection mechanism can be bypassed, and any CAPTCHA can be solved or skipped.
reCAPTCHA is one of the most popular cloud services that provides a comprehensive CAPTCHA implementation. It is currently owned and developed by Google.
The original version of the project was developed at Carnegie Mellon University and first introduced in 2007. In 2009, Google acquired it and integrated it into its own services such as Blogger, YouTube, Gmail, and others. The re- prefix in the name is not accidental: it hints at reuse. The text fragments users were asked to recognize came from scanned books that OCR software could not read reliably, so every answer helped digitize those books.
Any website can use the reCAPTCHA service by adding a special script to its pages or forms. However, it is worth remembering that Google charges projects that make more than 10,000 CAPTCHA requests per month.
The first version of reCAPTCHA was retired in 2018, while version 2 is still in use. An invisible mode for v2 appeared in 2017, and reCAPTCHA v3, which runs entirely in the background, followed in 2018. v3 scores visitors on a scale from 0.0 to 1.0, where 0.0 means “very likely a bot” and 1.0 means “very likely a human.”
Thanks to reCAPTCHA, Google can track user behavior on websites that do not belong to the company. This has led to many debates and a lot of speculation. But one fact remains: reCAPTCHA is the most widely used anti-bot protection system in the world.
Since both reCAPTCHA v2 and v3 are currently in use, different approaches and technical solutions may be applied.
Below, we will try to outline the most effective methods for bypassing reCAPTCHA.
Both reCAPTCHA v2 and reCAPTCHA v3 rely on JavaScript. If you connect to the target website without a full-featured browser, you will almost certainly be blocked right away. The exception is sites where CAPTCHA protects only specific pages or forms.
Since Google analyzes a large amount of client-side data, you need to account for many details and headless browser settings:
If you use a “clean” (fresh) browser profile, reCAPTCHA v3 or Enterprise will usually flag the session almost immediately. Bypassing reCAPTCHA with a properly configured browser alone is only possible on a small percentage of websites: mainly those that still use version 2, and those whose owners have intentionally lowered the score threshold of reCAPTCHA v3.
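One way to avoid presenting a fresh profile is to keep a persistent Chrome profile for each scraping identity and reuse it across runs, so cookies, history, and local storage accumulate over time. The sketch below only builds the Chrome command-line flags: `--user-data-dir` and `--profile-directory` are standard Chrome switches, while the one-directory-per-identity layout and the `profile_args` helper are assumptions for illustration.

```python
from pathlib import Path
from typing import List


def profile_args(profiles_root: str, identity: str) -> List[str]:
    """Return Chrome flags that reuse a persistent profile for one identity.

    Hypothetical layout: one user-data directory per identity under profiles_root.
    """
    user_data_dir = Path(profiles_root) / identity
    # Create the directory on first use; Chrome then stores cookies, history, etc. there
    user_data_dir.mkdir(parents=True, exist_ok=True)
    return [
        f"--user-data-dir={user_data_dir.resolve()}",
        "--profile-directory=Default",
    ]
```

With Selenium, each returned flag can be passed to `Options.add_argument()` before the driver starts. Chrome locks a profile directory while it is open, so two browser instances cannot share the same identity at once.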
The logic is very simple: you start the scraper. If a CAPTCHA form appears on the page during execution, you pause the script and ask an operator to solve the CAPTCHA.
Once the CAPTCHA is solved, the data collection process continues.
However, for this to work, your script must be able to detect when a CAPTCHA window appears. Example Python code:
```python
import time
from urllib.parse import parse_qs, urlparse

from selenium import webdriver
from selenium.common.exceptions import WebDriverException
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

# Hides navigator.webdriver; injected into every new document, not just the current one
HIDE_WEBDRIVER_JS = "Object.defineProperty(navigator, 'webdriver', {get: () => undefined})"


class RecaptchaParser:
    def __init__(self, headless=True):
        self.driver = None
        self.headless = headless
        self.setup_driver()

    def build_options(self):
        """Chrome options shared by the headless and visible modes"""
        chrome_options = Options()
        if self.headless:
            chrome_options.add_argument("--headless=new")
        # Additional options for stability
        chrome_options.add_argument("--no-sandbox")
        chrome_options.add_argument("--disable-dev-shm-usage")
        chrome_options.add_argument("--disable-blink-features=AutomationControlled")
        chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
        chrome_options.add_experimental_option("useAutomationExtension", False)
        # Masking as a real user: a complete UA string
        # (the Chrome version is an example and should match the installed browser)
        chrome_options.add_argument(
            "user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
            "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
        )
        return chrome_options

    def setup_driver(self):
        """Chrome driver setup"""
        service = Service(ChromeDriverManager().install())
        self.driver = webdriver.Chrome(service=service, options=self.build_options())
        # execute_script() would only patch the blank start page;
        # this CDP command re-applies the patch on every page load
        self.driver.execute_cdp_cmd(
            "Page.addScriptToEvaluateOnNewDocument", {"source": HIDE_WEBDRIVER_JS}
        )

    def check_recaptcha_v2(self):
        """Check for a reCAPTCHA v2 widget on the page"""
        recaptcha_indicators = [
            (By.CSS_SELECTOR, "div.g-recaptcha"),
            # Visible checkbox iframe; v3 badges load their anchor with size=invisible
            (By.CSS_SELECTOR,
             "iframe[src*='recaptcha'][src*='anchor']:not([src*='size=invisible'])"),
        ]
        try:
            # find_elements() returns an empty list instead of raising
            return any(self.driver.find_elements(by, sel) for by, sel in recaptcha_indicators)
        except WebDriverException as e:
            print(f"Error checking reCAPTCHA v2: {e}")
            return False

    def check_recaptcha_v3(self):
        """Check for reCAPTCHA v3: api.js loaded with render=<site key>"""
        try:
            for script in self.driver.find_elements(By.TAG_NAME, "script"):
                src = script.get_attribute("src") or ""
                if "recaptcha" in src.lower() and "api.js" in src:
                    # v2 uses render=explicit, render=onload, or no render parameter
                    render = parse_qs(urlparse(src).query).get("render", [""])[0]
                    if render and render not in ("explicit", "onload"):
                        return True
            return False
        except WebDriverException as e:
            print(f"Error checking reCAPTCHA v3: {e}")
            return False

    def check_recaptcha_challenge(self):
        """Check whether the challenge popup is visible"""
        # The challenge frame ("bframe") may already be in the DOM while hidden,
        # so visibility is what matters. The other selectors only match
        # if the challenge content is in the current frame.
        challenge_indicators = [
            (By.CSS_SELECTOR, "iframe[src*='bframe']"),  # Challenge frame
            (By.CSS_SELECTOR, "div.rc-imageselect-desc"),  # Challenge description
            (By.CSS_SELECTOR, "div.rc-imageselect-target"),  # Image selection area
            (By.XPATH, "//div[contains(text(), 'Select all images')]"),
            (By.XPATH, "//div[contains(text(), 'Please select all')]"),
        ]
        for by, sel in challenge_indicators:
            try:
                if any(el.is_displayed() for el in self.driver.find_elements(by, sel)):
                    return True
            except WebDriverException:
                continue  # e.g. the element went stale while being checked
        return False

    def is_recaptcha_solved(self):
        """A solved v2 widget writes its token into textarea#g-recaptcha-response"""
        try:
            token = self.driver.execute_script(
                "var el = document.getElementById('g-recaptcha-response');"
                "return el ? el.value : '';"
            )
            return bool(token)
        except WebDriverException:
            return False

    def wait_for_user_solution(self, timeout=300):
        """Wait for an operator to solve the captcha"""
        # Exit headless mode so the operator can see the browser window
        if self.headless:
            print("Switching to normal mode for captcha solving...")
            self.headless = False
            self.restart_driver_without_headless()
        print("\n" + "=" * 50)
        print("Captcha detected!")
        print("Please solve the captcha in the opened browser window.")
        print(f"You have {timeout // 60} minutes.")
        print("=" * 50 + "\n")
        start_time = time.time()
        last_report = 0
        while time.time() - start_time < timeout:
            # The widget stays on the page after solving, so look for the token
            # (or for the widget disappearing, e.g. after a redirect)
            if self.is_recaptcha_solved() or not self.check_recaptcha_v2():
                print("Captcha successfully solved! Continuing work...")
                return True
            # Show the remaining time roughly every 30 seconds
            elapsed = int(time.time() - start_time)
            if elapsed - last_report >= 30:
                last_report = elapsed
                print(f"Waiting for captcha solution... {timeout - elapsed} seconds remaining")
            time.sleep(1)
        print("Timeout waiting for captcha solution!")
        return False

    def restart_driver_without_headless(self):
        """Restart the driver in visible mode and reopen the current page"""
        # The new browser starts a fresh session: cookies are not carried over
        current_url = self.driver.current_url if self.driver else None
        self.close()
        self.setup_driver()  # self.headless is already False at this point
        if current_url:
            self.driver.get(current_url)
            time.sleep(2)  # give the widget time to render

    def click_checkbox(self):
        """Try to click the 'I'm not a robot' checkbox inside the anchor iframe"""
        try:
            anchor = self.driver.find_element(
                By.CSS_SELECTOR, "iframe[src*='recaptcha'][src*='anchor']"
            )
            self.driver.switch_to.frame(anchor)
            self.driver.find_element(By.CSS_SELECTOR, "#recaptcha-anchor").click()
        except WebDriverException:
            pass  # No clickable checkbox (e.g. invisible reCAPTCHA)
        finally:
            # Always return to the main document, even if the click failed
            self.driver.switch_to.default_content()

    def parse_with_recaptcha_handling(self, url, parse_function=None, max_attempts=10):
        """
        Main method for parsing with captcha handling
        :param url: URL to parse
        :param parse_function: function for data parsing
        :param max_attempts: upper bound on loop iterations, so the loop cannot spin forever
        """
        try:
            print(f"Loading page: {url}")
            self.driver.get(url)
            if self.check_recaptcha_v3():
                # v3 has no challenge to hand over: it silently scores the session
                print("reCAPTCHA v3 detected on the page")
            for _ in range(max_attempts):
                if self.check_recaptcha_v2() and not self.is_recaptcha_solved():
                    print("Captcha detected on the page!")
                    if not self.check_recaptcha_challenge():
                        # Widget is present but no challenge yet: try the checkbox
                        self.click_checkbox()
                        time.sleep(2)
                    # Still no token (challenge shown or checkbox not passed): ask a human
                    if not self.is_recaptcha_solved():
                        if not self.wait_for_user_solution():
                            print("Failed to solve captcha")
                            return None
                # No unsolved captcha left: perform parsing
                if parse_function:
                    result = parse_function(self.driver)
                    if result:
                        return result
                # Short pause before the next check
                time.sleep(2)
            return None
        except Exception as e:
            print(f"Error during parsing process: {e}")
            return None

    def close(self):
        """Close browser"""
        if self.driver:
            self.driver.quit()
            self.driver = None


# Usage example
def example_parse_function(driver):
    """Example function for data parsing"""
    try:
        # Your parsing code here
        # For example, get page title
        title = driver.title
        print(f"Current page title: {title}")
        return title
    except Exception as e:
        print(f"Error during parsing: {e}")
        return None


def main():
    """Main function"""
    # Create parser instance
    parser = RecaptchaParser(headless=True)
    try:
        # URL for parsing (example)
        url = "https://www.google.com/recaptcha/api2/demo"  # Test page with captcha
        # Start parsing with captcha handling
        result = parser.parse_with_recaptcha_handling(url, example_parse_function)
        if result:
            print(f"Parsing result: {result}")
        else:
            print("Failed to get data")
    finally:
        # Always close the browser
        parser.close()


if __name__ == "__main__":
    # Required libraries installation:
    # pip install selenium webdriver-manager
    main()
```
If there are no obvious signs of a bot, the security system working alongside reCAPTCHA needs to collect more information about the user’s behavior and actions. This may take anywhere from 2 to 5 requests to pages on the target website.
By changing IP addresses, you effectively reset this counter, and the security system has to start tracking your activity history from scratch.
The only caveat is that the IP addresses must be as trustworthy as possible. This is exactly what rotating mobile and residential proxies are known for.
With high-quality proxies that rotate on every request, bypassing reCAPTCHA becomes much easier.
Please note: the frequent rotation approach is not suitable for every situation. For example, if you are logged into an account (a user dashboard, personal area, etc.), it usually makes more sense to keep the session alive as long as possible.
Then, if a CAPTCHA eventually appears, you can wait longer, solve the CAPTCHA manually, or change the proxy. If you do change the proxy, it is best to change it together with the account and browser profile it is paired with, so that each identity keeps a consistent IP.
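Both strategies can be sketched with a small helper, assuming you already have a list of proxy URLs from your provider (the addresses below are placeholders). Anonymous requests get a new proxy on every call, while a logged-in account stays pinned to one proxy until you explicitly rotate it. The returned dictionary uses the `{"http": ..., "https": ...}` format that the `requests` library accepts in its `proxies` argument; the `ProxyPool` class itself is hypothetical.

```python
import itertools
from typing import Dict, List


class ProxyPool:
    """Hypothetical helper: per-request rotation plus sticky per-account proxies."""

    def __init__(self, proxies: List[str]):
        if not proxies:
            raise ValueError("proxy list is empty")
        self._proxies = list(proxies)
        self._cycle = itertools.cycle(self._proxies)
        self._sticky: Dict[str, str] = {}  # account id -> pinned proxy URL

    @staticmethod
    def _as_requests_dict(proxy_url: str) -> Dict[str, str]:
        # Same proxy for both schemes, as expected by requests' `proxies=` argument
        return {"http": proxy_url, "https": proxy_url}

    def next_proxy(self) -> Dict[str, str]:
        """Anonymous scraping: a different proxy on every call (round-robin)."""
        return self._as_requests_dict(next(self._cycle))

    def session_proxy(self, account_id: str) -> Dict[str, str]:
        """Logged-in scraping: the same proxy for an account until rotate_session()."""
        if account_id not in self._sticky:
            self._sticky[account_id] = next(self._cycle)
        return self._as_requests_dict(self._sticky[account_id])

    def rotate_session(self, account_id: str) -> Dict[str, str]:
        """Call when a CAPTCHA keeps reappearing: pin the account to another proxy."""
        current = self._sticky.get(account_id)
        candidate = next(self._cycle)
        # Skip the current proxy if another one is available
        if candidate == current and len(self._proxies) > 1:
            candidate = next(self._cycle)
        self._sticky[account_id] = candidate
        return self._as_requests_dict(candidate)


pool = ProxyPool([
    "http://user:pass@proxy1.example.com:8000",  # placeholder addresses
    "http://user:pass@proxy2.example.com:8000",
])
print(pool.next_proxy()["https"])  # first call uses proxy1
print(pool.next_proxy()["https"])  # second call uses proxy2
print(pool.session_proxy("account_01")["https"])  # account pinned to proxy1
print(pool.session_proxy("account_01")["https"])  # still proxy1
```

Calling `rotate_session()` only when a CAPTCHA keeps coming back keeps a logged-in session on one IP for as long as possible, while `next_proxy()` covers the rotate-on-every-request case.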
A little earlier, we showed code that helps detect a reCAPTCHA window. Instead of solving it manually, you can delegate the task to specialized services such as CapSolver, Anti-Captcha, 2Captcha, and others.
These services may rely on human labor, trained neural networks, pre-warmed headless browsers (sometimes entire AI-managed browser farms), or a combination of approaches depending on the CAPTCHA type and complexity.
If a site is protected by reCAPTCHA v3, you need to reach a certain score threshold. For this, only a farm of trusted browser instances – like the one used by CapSolver – may be suitable.
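To see why the score matters, consider roughly what the protected site does on its side. The page sends the v3 token to its backend, which calls Google's `siteverify` endpoint; the JSON response contains fields such as `success`, `score`, `action`, and `hostname`, and the site compares the score with a threshold its owner picks (Google's documentation suggests 0.5 as a default starting point). The sketch below only models that decision on an already-parsed response; the function name and sample values are illustrative.

```python
from typing import Any, Dict


def passes_recaptcha_v3(verify_response: Dict[str, Any],
                        expected_action: str,
                        threshold: float = 0.5) -> bool:
    """Model of a site's server-side decision for a reCAPTCHA v3 token.

    verify_response is the parsed JSON returned by Google's siteverify endpoint.
    """
    if not verify_response.get("success", False):
        return False  # token invalid, expired, or already used
    if verify_response.get("action") != expected_action:
        return False  # token was generated for a different action on the page
    # 1.0 = very likely a human, 0.0 = very likely a bot
    return float(verify_response.get("score", 0.0)) >= threshold


# Illustrative responses (values are made up)
print(passes_recaptcha_v3({"success": True, "score": 0.9, "action": "login"}, "login"))  # True
print(passes_recaptcha_v3({"success": True, "score": 0.3, "action": "login"}, "login"))  # False
```

A site that lowers `threshold` lets lower-scoring, less trusted sessions through, which is why some v3-protected sites are noticeably easier to pass than others.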
Examples of code for submitting CAPTCHA tasks and parsing responses are best taken from the documentation of the relevant services. Your script’s job is to detect the CAPTCHA challenge in time and extract the site key.
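Most solving services follow the same asynchronous pattern: you submit a task (CAPTCHA type, site key, page URL) and then poll for the result until a token is ready. Because request formats differ between providers, the sketch below keeps the HTTP part abstract: `submit_task` and `fetch_result` are hypothetical callables you would implement from your provider's documentation, and `fetch_result` is assumed to return `None` while the task is still pending.

```python
import time
from typing import Callable, Optional


def solve_with_service(submit_task: Callable[[str, str], str],
                       fetch_result: Callable[[str], Optional[str]],
                       site_key: str,
                       page_url: str,
                       poll_interval: float = 5.0,
                       timeout: float = 120.0) -> str:
    """Submit a CAPTCHA task and poll until the service returns a token.

    submit_task(site_key, page_url) -> task_id        (hypothetical provider call)
    fetch_result(task_id) -> token, or None if pending (hypothetical provider call)
    """
    task_id = submit_task(site_key, page_url)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        token = fetch_result(task_id)
        if token:
            return token
        # Avoid hammering the provider's API while the task is pending
        time.sleep(poll_interval)
    raise TimeoutError(f"CAPTCHA task {task_id} was not solved within {timeout} s")
```

For reCAPTCHA v2, the returned token is what the page would normally place in the hidden `g-recaptcha-response` field before the form is submitted.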
An example of what a sitekey may look like in the HTML of a page protected by reCAPTCHA:
<div class="g-recaptcha" data-sitekey="6LdKlZEpAAAARRRQjzC2v_d36tWxCl6dWsozdSy9"></div>
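Extracting the key does not require a browser once you have the page HTML. A minimal sketch using Python's built-in `html.parser`: it reads the `data-sitekey` attribute from a `div` with the `g-recaptcha` class and, since hCaptcha places the same attribute on a `div` with the `h-captcha` class, reports which provider it found. It also picks up a v3 key passed as `api.js?render=<key>`. Keys that JavaScript injects after page load will not be visible in static HTML; the class and function names are illustrative.

```python
from html.parser import HTMLParser
from typing import List, Tuple
from urllib.parse import parse_qs, urlparse

# Widget CSS class -> CAPTCHA provider
WIDGET_CLASSES = {"g-recaptcha": "recaptcha", "h-captcha": "hcaptcha"}


class SiteKeyParser(HTMLParser):
    """Collects (provider, site_key) pairs from static HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.found: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("data-sitekey"):
            classes = (attrs.get("class") or "").split()
            for css_class, provider in WIDGET_CLASSES.items():
                if css_class in classes:
                    self.found.append((provider, attrs["data-sitekey"]))
        elif tag == "script" and "recaptcha" in (attrs.get("src") or ""):
            # reCAPTCHA v3 passes the site key in api.js?render=<key>
            render = parse_qs(urlparse(attrs["src"]).query).get("render", [""])[0]
            if render and render not in ("explicit", "onload"):
                self.found.append(("recaptcha-v3", render))


def extract_site_keys(html: str) -> List[Tuple[str, str]]:
    parser = SiteKeyParser()
    parser.feed(html)
    return parser.found


html = '<div class="g-recaptcha" data-sitekey="6LdKlZEpAAAARRRQjzC2v_d36tWxCl6dWsozdSy9"></div>'
print(extract_site_keys(html))
# [('recaptcha', '6LdKlZEpAAAARRRQjzC2v_d36tWxCl6dWsozdSy9')]
```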
In general, all of the approaches above can be combined into one integrated system:
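A minimal way to wire them together is an escalation chain: start with the cheapest option and move to a more expensive one only when a CAPTCHA blocks the current one. In the sketch below, each stage is a hypothetical callable that fetches the page in its own way (current proxy, fresh proxy, solving service, human operator) and returns the HTML, or `None` if it hit a CAPTCHA it could not get past; detection inside each stage could reuse checks like the Selenium code above. The stage bodies here are stubs.

```python
from typing import Callable, List, Optional, Tuple

# A stage fetches the URL in its own way and returns the page HTML,
# or None if it ran into a CAPTCHA it could not get past.
Stage = Tuple[str, Callable[[str], Optional[str]]]


def fetch_with_escalation(url: str, stages: List[Stage]) -> Optional[Tuple[str, str]]:
    """Try stages from cheapest to most expensive; return (stage name, html)."""
    for name, fetch in stages:
        try:
            html = fetch(url)
        except Exception as exc:  # one broken stage should not stop the chain
            print(f"[{name}] failed: {exc}")
            continue
        if html is not None:
            return name, html
        print(f"[{name}] blocked by CAPTCHA, escalating")
    return None  # every stage was blocked or failed


# Hypothetical stages with stubbed behavior, ordered by cost
stages: List[Stage] = [
    ("same proxy", lambda url: None),  # CAPTCHA appeared
    ("rotated proxy", lambda url: None),  # CAPTCHA again
    ("solving service", lambda url: "<html>ok</html>"),
    ("manual operator", lambda url: "<html>ok</html>"),
]
print(fetch_with_escalation("https://example.com", stages))
```

Ordering stages by cost means paid solving and manual work are only spent on pages that proxies and a well-configured browser could not handle on their own.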
hCaptcha (human CAPTCHA) is one of the few high-quality alternatives to Google reCAPTCHA. Unlike Google, hCaptcha’s owners do not analyze user behavior to promote related products; instead, the developer, Intuition Machines, focuses on privacy.
Like reCAPTCHA, hCaptcha offers several modes:
One notable detail: website owners who use hCaptcha can earn additional revenue when users solve CAPTCHA challenges. Intuition Machines sells labeled data for AI training and shares part of the revenue with websites.
Migrating from reCAPTCHA is possible with just a few lines of code changed – usually it is enough to update the script URL and certain form fields.
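Because hCaptcha is designed as a near drop-in replacement, the markup changes are mostly one-to-one: the reCAPTCHA script `https://www.google.com/recaptcha/api.js` becomes `https://js.hcaptcha.com/1/api.js`, and the `g-recaptcha` class becomes `h-captcha`. The site key must also be replaced with one issued by hCaptcha. The sketch below applies these swaps with plain string replacement; the mapping is illustrative rather than exhaustive, and for scraping the same table doubles as a list of markers to look for.

```python
# Illustrative (not exhaustive) reCAPTCHA -> hCaptcha markup equivalents
RECAPTCHA_TO_HCAPTCHA = {
    "https://www.google.com/recaptcha/api.js": "https://js.hcaptcha.com/1/api.js",
    'class="g-recaptcha"': 'class="h-captcha"',
}


def migrate_markup(html: str, old_site_key: str, new_site_key: str) -> str:
    """Swap reCAPTCHA markup for hCaptcha equivalents and replace the site key."""
    for old, new in RECAPTCHA_TO_HCAPTCHA.items():
        html = html.replace(old, new)
    # Site keys are provider-specific, so the old key must be replaced too
    return html.replace(f'data-sitekey="{old_site_key}"', f'data-sitekey="{new_site_key}"')


before = (
    '<script src="https://www.google.com/recaptcha/api.js" async defer></script>\n'
    '<div class="g-recaptcha" data-sitekey="OLD_KEY"></div>'
)
print(migrate_markup(before, old_site_key="OLD_KEY", new_site_key="NEW_KEY"))
```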
The methods for bypassing hCaptcha are largely similar to those described for reCAPTCHA:
The only unique part is the indicators that can be used to detect hCaptcha:
<div class="h-captcha" data-sitekey="..."></div>
<script src="https://js.hcaptcha.com/1/api.js..."></script>
Other CAPTCHA alternatives, such as Cloudflare Turnstile, Yandex SmartCaptcha, MTCaptcha, Puzzle CAPTCHA, FunCaptcha, and others, use similar mechanics: almost all of them rely on JavaScript and background activity tracking. That is why working with them requires full-featured browsers, warmed-up profiles, and simulation of human behavior. If a CAPTCHA still appears, it must be solved either manually or through specialized paid services.
The only fast and relatively inexpensive way to deal with reCAPTCHA, hCaptcha, and similar systems is to use high-quality rotating proxies that allow you to change IP addresses on virtually every new request. This is exactly what our platform, Froxy, offers. We have over 10 million IPs in the pool, with flexible rotation and targeting settings tailored to the client’s needs, as well as residential and mobile proxies with traffic-based pricing.