reCAPTCHA & hCaptcha: Bypassing During Scraping

Written by Team Froxy | Mar 17, 2026 6:59:59 AM

CAPTCHA (an abbreviation for Completely Automated Public Turing test to tell Computers and Humans Apart) is a set of small tests or checks used to distinguish a human from a bot. Technically, any type of test can eventually be bypassed with automated systems, but CAPTCHA is designed in such a way that using computers to solve it becomes unreasonably expensive and complex.

Humans, on the other hand, can usually pass these checks quickly and relatively easily, for example by selecting certain types of images or recognizing distorted characters.

CAPTCHA has been, still is, and will likely remain for a long time an effective mechanism for protecting websites and online services from bots, scrapers, and other unwanted connections.

Below, we’ll explain what types of CAPTCHA exist and how to handle reCAPTCHA and hCaptcha during web scraping. These are the most popular protection services integrated into forms and pages on modern websites.

Types of CAPTCHA

To give you a better understanding of CAPTCHA-based protection mechanisms, let’s first look at the main types:

Classic text CAPTCHA is usually a distorted set of letters and numbers shown in an image that the user must recognize. With enough training data and pattern exposure, neural networks can already solve this kind of task.
Text-to-number / simple math challenges are a subtype of text CAPTCHA where large numbers are written out in words. The user reads the words and enters the number as digits. In some cases, the visitor may also be asked to solve a very simple math problem.
reCAPTCHA v2 is outdated but still widely used. It starts with the “I’m not a robot” checkbox, but if the algorithm does not accept the check, it asks the user to select images matching certain criteria.
reCAPTCHA v3 is the so-called “invisible CAPTCHA.” It works in the background, evaluates user behavior, and is almost unnoticeable to humans. The website owner receives a final score for the visitor and can configure more complex response logic. For example, they may block the visitor immediately or ask them to solve a visual CAPTCHA based on reCAPTCHA v2. In practice, however, this can be any other “human verification” mechanism.
Image selection is a mechanism familiar to many users: the system shows different image fragments and asks the user to select the ones matching a given feature. For example: select all images with cars or traffic lights.
Image rotation is a separate mechanic where the user uses rotation buttons to match the direction of certain objects. To make it harder, the objects may be three-dimensional.
Puzzle solving requires moving part of an image into the correct position using a slider or with a mouse/finger (on touchscreens).
Arranging symbols in order means the user must recognize the icons or symbols shown in an image and then click them in the order indicated in a separate sample.
Audio CAPTCHA asks the user to listen to letters or numbers in partially distorted audio and then enter them into a form.
Invisible input fields (traps / honeypots) work by embedding an additional input field in the HTML code that looks very similar to the real one but is hidden with CSS or JavaScript. A real user does not see it and therefore does not fill it in. But if a bot fills it in, the submission is automatically blocked. This method may also include additional complications, such as random tokens for form identification, pre-filled values, and so on.
Behavioral analysis is now used by many websites that load large amounts of JavaScript code. Many scripts can track user actions: mouse movement, clicks on buttons and other layout elements, typing patterns in form fields, delays between scrolls, pauses between page transitions, and so on. If the behavior pattern does not resemble natural human actions, the connection is blocked.
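
To make the honeypot idea from the list above more concrete, here is a minimal sketch of how a scraper might spot trap fields before filling in a form. It only inspects inline attributes; the list of hiding hints and the names HoneypotFinder and find_honeypots are assumptions made for illustration, and fields hidden through external stylesheets or scripts would need a real browser to detect.

```python
from html.parser import HTMLParser

# Inline-style fragments that commonly hide a trap field (assumed, non-exhaustive)
HIDING_HINTS = ("display:none", "visibility:hidden", "opacity:0", "left:-9999px")


class HoneypotFinder(HTMLParser):
    """Collects names of <input> fields that look hidden from human visitors."""

    def __init__(self):
        super().__init__()
        self.suspicious = []

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = {k: (v or "") for k, v in attrs}
        # type="hidden" fields are normal (CSRF tokens etc.), so skip them
        if attr.get("type", "text").lower() == "hidden":
            return
        style = attr.get("style", "").replace(" ", "").lower()
        off_keyboard = attr.get("tabindex") == "-1"
        if off_keyboard or any(hint in style for hint in HIDING_HINTS):
            self.suspicious.append(attr.get("name", ""))


def find_honeypots(html):
    """Return the names of input fields a human visitor would probably never see."""
    finder = HoneypotFinder()
    finder.feed(html)
    return finder.suspicious


form = """
<form>
  <input type="text" name="email">
  <input type="text" name="website" style="display: none" tabindex="-1">
  <input type="hidden" name="csrf" value="abc">
</form>
"""
print(find_honeypots(form))  # ['website']
```

A scraper would leave any field reported here untouched and fill in only the remaining inputs.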

There is an old saying: no matter how smart you think you are, someone will always be one step ahead. That’s why every protection mechanism can be bypassed with specialized technical solutions – any CAPTCHA can be skipped.

What is reCAPTCHA?

reCAPTCHA is one of the most popular cloud services that provides a comprehensive CAPTCHA implementation. It is currently owned and developed by Google.

The original version of the script was first introduced in 2007 at Carnegie Mellon University. In 2009, Google acquired the project and integrated it into its own services such as Blogger, YouTube, Gmail, and others. The re- prefix in the name is not accidental – it hints at reuse, because the text fragments users were asked to recognize were used to help train OCR systems, which in turn digitized books.

Any website can use the reCAPTCHA service by adding a special script to its pages or forms. However, it is worth remembering that Google charges projects that make more than 10,000 CAPTCHA requests per month.

The first version of reCAPTCHA was retired in 2018, while version 2 is still in use. An invisible variant of version 2 appeared in 2017, and the score-based reCAPTCHA v3 followed in 2018; it scores visitors on a scale from 0 to 1, where 0 means “definitely a bot” and 1 means “definitely a human.”
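
For scrapers, it helps to picture what the site owner does with that score. The sketch below shows one plausible server-side policy; the function name decide_action and both thresholds (0.3 and 0.7) are illustrative assumptions, since every site chooses its own cut-offs.

```python
def decide_action(score, block_below=0.3, challenge_below=0.7):
    """Map a reCAPTCHA v3 score (0.0 to 1.0) to a site-side response."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    if score < block_below:
        return "block"       # very likely a bot
    if score < challenge_below:
        return "challenge"   # uncertain: fall back to a visual CAPTCHA
    return "allow"           # very likely a human


for s in (0.1, 0.5, 0.9):
    print(s, decide_action(s))
```

Because the thresholds are set by the site, the same browser profile may pass silently on one website and be challenged on another.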

Thanks to reCAPTCHA, Google can track user behavior on websites that do not belong to the company. This has led to many debates and assumptions. But one fact remains: reCAPTCHA is the most widely used anti-bot protection system in the world.

How to bypass reCAPTCHA during web scraping

Since both reCAPTCHA v2 and v3 are currently in use, different approaches and technical solutions may be applied.

Below, we will try to outline the most effective methods for bypassing reCAPTCHA.

1. Using a headless browser

Both reCAPTCHA v2 and reCAPTCHA v3 work with JavaScript. If you connect to the target website without a full-featured browser, you will almost certainly be blocked right away. The exception is when only specific pages or forms are protected by CAPTCHA.

Since Google analyzes a large amount of client-side data, you need to account for many details and headless browser settings:

Hiding traces of headless mode. For example, stealth plugins such as puppeteer-extra-plugin-stealth, playwright-extra stealth, and similar tools may help. Alternatively, you can try to hide these signals manually through proper launch settings and special flags.
Simulating a believable browser fingerprint. This includes many technical parameters: cookies, installed fonts, user-agent, screen resolution, WebGL, canvas fingerprint, GPU/CPU parameters, and so on.
Simulating real user behavior. This requires properly controlling delays between URL visits, moving the cursor, scrolling pages naturally, and similar actions.
Strict control over session duration and request frequency. Even if your virtual user behaves perfectly, Google may still request a CAPTCHA if the number of requests from a single IP is too high.

If you use a “clean” browser profile, reCAPTCHA v3 or Enterprise will usually trigger almost immediately. Bypassing reCAPTCHA with a properly configured browser is only possible on a small percentage of websites – mainly those that still use version 2, as well as sites where owners have intentionally lowered the sensitivity threshold of third-generation reCAPTCHA.
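
As an illustration of the "simulating real user behavior" point above, the following sketch generates randomized pauses and a smoothed cursor path. It is plain Python with no browser attached; the helper names human_delay and mouse_path and all default values are assumptions, and whether such movement is enough to satisfy a particular site is not guaranteed.

```python
import random


def human_delay(base=1.5, spread=0.6, rng=random):
    """Return a randomized pause in seconds, never shorter than 0.2 s."""
    return max(0.2, rng.gauss(base, spread))


def mouse_path(start, end, steps=25, wobble=3.0, rng=random):
    """Return points from start to end with ease-in-out pacing and small jitter."""
    (x0, y0), (x1, y1) = start, end
    points = []
    for i in range(steps + 1):
        t = i / steps
        eased = t * t * (3 - 2 * t)  # smoothstep: slow start, fast middle, slow end
        # No jitter on the first and last point so the path hits its targets exactly
        jitter = 0.0 if i in (0, steps) else wobble
        x = x0 + (x1 - x0) * eased + rng.uniform(-jitter, jitter)
        y = y0 + (y1 - y0) * eased + rng.uniform(-jitter, jitter)
        points.append((round(x), round(y)))
    return points


path = mouse_path((10, 10), (400, 260))
print(path[0], path[-1], len(path))  # (10, 10) (400, 260) 26
print(f"pause for {human_delay():.2f} s before the next action")
```

The resulting points can be fed one by one into the cursor API of your automation tool, with a short pause between moves.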

2. Solving reCAPTCHA manually

The logic is very simple: you start the scraper. If a CAPTCHA form appears on the page during execution, you pause the script and ask an operator to solve the CAPTCHA.

Once the CAPTCHA is solved, the data collection process continues.

However, for this to work, your script must be able to detect when a CAPTCHA window appears. Example Python code:

import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager

class RecaptchaParser:
    def __init__(self, headless=True):
        self.driver = None
        self.headless = headless
        self.setup_driver()
    
    def setup_driver(self):
        """Chrome driver setup"""
        chrome_options = Options()
        
        if self.headless:
            chrome_options.add_argument("--headless=new")
        
        # Additional options for stability
        chrome_options.add_argument("--no-sandbox")
        chrome_options.add_argument("--disable-dev-shm-usage")
        chrome_options.add_argument("--disable-blink-features=AutomationControlled")
        chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
        chrome_options.add_experimental_option('useAutomationExtension', False)
        
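        # Masking as real user (example value: use a complete UA string that
        # matches the Chrome version actually installed)
        chrome_options.add_argument(
            "user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
            "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
        )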
        
        service = Service(ChromeDriverManager().install())
        self.driver = webdriver.Chrome(service=service, options=chrome_options)
        
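        # Additional masking: register the override through CDP so it is injected
        # into every new document (a plain execute_script call would only patch
        # the page that is open at this moment)
        self.driver.execute_cdp_cmd(
            "Page.addScriptToEvaluateOnNewDocument",
            {"source": "Object.defineProperty(navigator, 'webdriver', {get: () => undefined})"},
        )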
    
    def check_recaptcha_v2(self):
        """Check for reCAPTCHA v2 appearance"""
        try:
            # Main reCAPTCHA v2 indicators
            recaptcha_indicators = [
                (By.CSS_SELECTOR, "iframe[src*='recaptcha']"),
                (By.CSS_SELECTOR, "div.g-recaptcha"),
                (By.CSS_SELECTOR, "iframe[title*='recaptcha']"),
                (By.CSS_SELECTOR, "div.recaptcha-checkbox-border"),
                (By.XPATH, "//div[contains(@class, 'recaptcha')]"),
            ]
            
            for selector_type, selector in recaptcha_indicators:
                try:
                    elements = self.driver.find_elements(selector_type, selector)
                    if elements:
                        return True
                except Exception:
                    continue
            
            return False
        except Exception as e:
            print(f"Error checking reCAPTCHA v2: {e}")
            return False
    
    def check_recaptcha_v3(self):
        """Check for reCAPTCHA v3 presence"""
        try:
            # v3 is normally loaded as api.js?render=<site key>, whereas v2 uses
            # no render parameter or render=explicit / render=onload
            scripts = self.driver.find_elements(By.TAG_NAME, "script")
            for script in scripts:
                script_content = script.get_attribute('src') or script.get_attribute('innerHTML') or ''
                content = script_content.lower()
                if 'recaptcha' in content and 'api.js' in content and 'render=' in content:
                    # Exclude the two render values that belong to v2
                    if 'render=explicit' not in content and 'render=onload' not in content:
                        return True
            return False
        except Exception as e:
            print(f"Error checking reCAPTCHA v3: {e}")
            return False
    
    def check_recaptcha_challenge(self):
        """Check for captcha challenge window appearance"""
        try:
            # Check for various captcha interface elements
            challenge_indicators = [
                (By.CSS_SELECTOR, "iframe[src*='bframe']"),  # Challenge frame
                (By.CSS_SELECTOR, "div.rc-imageselect-desc"),  # Challenge description
                (By.CSS_SELECTOR, "div.rc-imageselect-target"),  # Image selection area
                (By.XPATH, "//div[contains(text(), 'Select all images')]"),
                (By.XPATH, "//div[contains(text(), 'Please select all')]"),
            ]
            
            for selector_type, selector in challenge_indicators:
                try:
                    element = self.driver.find_element(selector_type, selector)
                    if element.is_displayed():
                        return True
                except Exception:
                    continue
            
            return False
        except Exception as e:
            print(f"Error checking captcha challenge window: {e}")
            return False
    
    def wait_for_user_solution(self, timeout=300):
        """Wait for user to solve captcha"""
        print("\n" + "="*50)
        print("Captcha detected!")
        print("Please solve the captcha in the opened browser window.")
        print(f"You have {timeout//60} minutes.")
        print("="*50 + "\n")
        
        # Exit headless mode for captcha solving
        if self.headless:
            print("Switching to normal mode for captcha solving...")
            self.headless = False
            self.restart_driver_without_headless()
        
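        start_time = time.time()
        last_report = 0
        while time.time() - start_time < timeout:
            # The widget stays in the DOM after it is solved, so look for a
            # non-empty response token rather than waiting for it to disappear
            token = self.driver.execute_script(
                "var el = document.querySelector(\"[name='g-recaptcha-response']\");"
                "return el ? el.value : '';"
            )
            if token or not self.check_recaptcha_v2():
                print("Captcha successfully solved! Continuing work...")
                return True
            
            # Show remaining time roughly every 30 seconds
            elapsed = int(time.time() - start_time)
            if elapsed - last_report >= 30:
                last_report = elapsed
                remaining = timeout - elapsed
                print(f"Waiting for captcha solution... {remaining} seconds remaining")
            
            time.sleep(1)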
        
        print("Timeout waiting for captcha solution!")
        return False
    
    def restart_driver_without_headless(self):
        """Restart driver without headless mode"""
        current_url = self.driver.current_url if self.driver else None
        self.close()
        
        chrome_options = Options()
        chrome_options.add_argument("--no-sandbox")
        chrome_options.add_argument("--disable-dev-shm-usage")
        chrome_options.add_argument("--disable-blink-features=AutomationControlled")
        
        service = Service(ChromeDriverManager().install())
        self.driver = webdriver.Chrome(service=service, options=chrome_options)
        
        if current_url:
            self.driver.get(current_url)
    
    def parse_with_recaptcha_handling(self, url, parse_function=None):
        """
        Main method for parsing with captcha handling
        
        :param url: URL to parse
        :param parse_function: function for data parsing
        """
        try:
            print(f"Loading page: {url}")
            self.driver.get(url)
            
            # Main parsing loop
            while True:
                # Check for captcha presence
                if self.check_recaptcha_v2() or self.check_recaptcha_v3():
                    print("Captcha detected on the page!")
                    
                    if self.check_recaptcha_challenge():
                        # Captcha is already active
                        if not self.wait_for_user_solution():
                            print("Failed to solve captcha")
                            return None
                    else:
                        # Captcha is present but challenge window hasn't appeared yet
                        print("Waiting for captcha challenge window to appear...")
                        time.sleep(2)
                        
                        # Try to activate captcha if needed
                        try:
                            recaptcha_iframe = self.driver.find_element(
                                By.CSS_SELECTOR, 
                                "iframe[src*='recaptcha'], iframe[title*='recaptcha']"
                            )
                            self.driver.switch_to.frame(recaptcha_iframe)
                            
                            # Click captcha checkbox
                            checkbox = self.driver.find_element(
                                By.CSS_SELECTOR, 
                                ".recaptcha-checkbox-border, #recaptcha-anchor"
                            )
                            checkbox.click()
                            
                            self.driver.switch_to.default_content()
                        except Exception:
                            pass
                        
                        # Check if challenge window appeared
                        if self.check_recaptcha_challenge():
                            if not self.wait_for_user_solution():
                                print("Failed to solve captcha")
                                return None
                
                # If no captcha, perform parsing
                if parse_function:
                    result = parse_function(self.driver)
                    if result:
                        return result
                
                # Short pause before next check
                time.sleep(2)
                
        except Exception as e:
            print(f"Error during parsing process: {e}")
            return None
    
    def close(self):
        """Close browser"""
        if self.driver:
            self.driver.quit()


# Usage example
def example_parse_function(driver):
    """Example function for data parsing"""
    try:
        # Your parsing code here
        # For example, get page title
        title = driver.title
        print(f"Current page title: {title}")
        return title
    except Exception as e:
        print(f"Error during parsing: {e}")
        return None


def main():
    """Main function"""
    # Create parser instance
    parser = RecaptchaParser(headless=True)
    
    try:
        # URL for parsing (example)
        url = "https://www.google.com/recaptcha/api2/demo"  # Test page with captcha
        
        # Start parsing with captcha handling
        result = parser.parse_with_recaptcha_handling(url, example_parse_function)
        
        if result:
            print(f"Parsing result: {result}")
        else:
            print("Failed to get data")
            
    finally:
        # Always close the browser
        parser.close()


if __name__ == "__main__":
    # Required libraries installation:
    # pip install selenium webdriver-manager
    
    main()

3. Working through rotating residential or mobile proxies

If there are no obvious signs of a bot, the security system working alongside reCAPTCHA needs to collect more information about the user’s behavior and actions. This may take anywhere from 2 to 5 requests to pages on the target website.

By changing IP addresses, you effectively reset this counter, and the security system has to start tracking your activity history from scratch.

The only caveat is that the IP addresses must be as trustworthy as possible. This is exactly what rotating mobile and residential proxies are known for.

With high-quality proxies that rotate on every request, bypassing reCAPTCHA becomes much easier.

Please note: the frequent rotation approach is not suitable for every situation. For example, if you are logged into an account (personal cabinet, dashboard, etc.), it usually makes more sense to keep the session alive as long as possible.

Then, if a CAPTCHA eventually appears, you can either wait longer, solve the CAPTCHA manually, or change the proxy. However, this is best done together with the corresponding account and browser profile.
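
The two modes described above, a new IP for every anonymous request and a fixed IP for a logged-in session, can be sketched as a small helper. The ProxyRotator class and the proxy URLs are hypothetical placeholders, not a real provider's API; substitute the endpoints your proxy service gives you.

```python
import itertools


class ProxyRotator:
    """Round-robin rotation with optional sticky proxies for logged-in sessions."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._cycle = itertools.cycle(proxies)
        self._sticky = {}

    def next_proxy(self):
        """New proxy for every anonymous request."""
        return next(self._cycle)

    def proxy_for_session(self, session_id):
        """Same proxy for the whole lifetime of an account session."""
        if session_id not in self._sticky:
            self._sticky[session_id] = next(self._cycle)
        return self._sticky[session_id]

    def replace_session_proxy(self, session_id):
        """Move a session to a fresh proxy, e.g. after a CAPTCHA appears."""
        self._sticky[session_id] = next(self._cycle)
        return self._sticky[session_id]


# Placeholder endpoints: substitute the gateway addresses from your provider
rotator = ProxyRotator([
    "http://user:pass@proxy-1.example.com:8000",
    "http://user:pass@proxy-2.example.com:8000",
    "http://user:pass@proxy-3.example.com:8000",
])
print(rotator.next_proxy())                     # anonymous request
print(rotator.proxy_for_session("account-42"))  # pinned to one proxy
print(rotator.proxy_for_session("account-42"))  # same proxy again
```

If a session is moved to a new proxy with replace_session_proxy, it is usually sensible to switch the matching browser profile and cookies at the same time.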

4. Using CAPTCHA-solving services

A little earlier, we showed code that helps detect a reCAPTCHA window. Instead of solving it manually, you can delegate the task to specialized services such as CapSolver, Anti-Captcha, 2Captcha, and others.

These services may rely on human labor, trained neural networks, pre-warmed headless browsers (sometimes entire AI-managed browser farms), or a combination of approaches depending on the CAPTCHA type and complexity.

If a site is protected by reCAPTCHA v3, you need to reach a certain score threshold. For this, only a farm of trusted browser instances – like the one used by CapSolver – may be suitable.

Examples of code for submitting CAPTCHA tasks and parsing responses are best taken from the documentation of the relevant services. Your script’s job is to detect the CAPTCHA challenge in time and extract the site key.

An example of what a sitekey may look like in the HTML of a page protected by reCAPTCHA:

<div class="g-recaptcha" data-sitekey="6LdKlZEpAAAARRRQjzC2v_d36tWxCl6dWsozdSy9"></div>
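
A minimal sketch of pulling that value out of the page source with a regular expression is shown below. The function name extract_sitekeys is an assumption; in a Selenium-based scraper the HTML would typically come from driver.page_source.

```python
import re

# Matches data-sitekey="..." or data-sitekey='...' anywhere in the markup
SITEKEY_RE = re.compile(r"""data-sitekey\s*=\s*["']([^"']+)["']""", re.IGNORECASE)


def extract_sitekeys(html):
    """Return every data-sitekey value found in the page source, in order."""
    return SITEKEY_RE.findall(html)


html = '<div class="g-recaptcha" data-sitekey="6LdKlZEpAAAARRRQjzC2v_d36tWxCl6dWsozdSy9"></div>'
print(extract_sitekeys(html))  # ['6LdKlZEpAAAARRRQjzC2v_d36tWxCl6dWsozdSy9']
```

Pages that load the widget dynamically may not expose the attribute in the initial HTML, so it is safer to read the source after the page has finished rendering.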

In general, all of the approaches above can be combined into one integrated system:

Your scraper runs on a headless browser.
It monitors browser profile quality and keeps script behavior natural.
If the scraping volume is small, CAPTCHA solving can be handled locally by an operator.
If there are too many CAPTCHA challenges, then it makes sense to use paid external services.

What is hCaptcha?

hCaptcha (human CAPTCHA) is one of the few high-quality alternatives to Google reCAPTCHA. Unlike Google, hCaptcha’s owners do not analyze user behavior to promote related products. On the contrary, the developer Intuition Machines focuses on privacy.

Like reCAPTCHA, hCaptcha offers several modes:

Visible CAPTCHA (classic) – checkbox + image challenges.
Invisible (passive) – almost nothing is visible except a small badge in the corner of the page. The analysis script runs in the background.
Frictionless – a fully invisible CAPTCHA, using only behavioral analytics plus AI-based risk scoring.
Enterprise – custom modes + API. A full automation stack that is useful for large clients and corporations.

One notable detail: website owners who use hCaptcha can earn additional revenue when users solve CAPTCHA challenges. Intuition Machines sells labeled data for AI training and shares part of the revenue with websites.

Migrating from reCAPTCHA is possible with just a few lines of code changed – usually it is enough to update the script URL and certain form fields.

How to Bypass hCaptcha During Web Scraping

The methods for bypassing hCaptcha are largely similar to those described for reCAPTCHA:

A headless browser with a natural-looking browser profile and realistic behavior.
Solving CAPTCHA manually or with the help of external services.
Working through high-quality rotating proxies with “human-like” IP addresses (mobile or residential/home connections).

The only unique part is the indicators that can be used to detect hCaptcha:

<div class="h-captcha" data-sitekey="...">
<script src="https://js.hcaptcha.com/1/api.js...">
An iframe with hcaptcha.com in the src
Network requests to the hcaptcha.com domain.
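
The first three indicators can be checked directly against the page source. The sketch below is one possible way to do that with regular expressions; the names HCAPTCHA_PATTERNS and detect_hcaptcha are assumptions, and the fourth indicator (network requests) is left out because it requires access to the browser's network log.

```python
import re

HCAPTCHA_PATTERNS = {
    # <div class="h-captcha" ...>
    "widget_div": re.compile(r"""class\s*=\s*["'][^"']*\bh-captcha\b""", re.I),
    # <script src="https://js.hcaptcha.com/1/api.js...">
    "api_script": re.compile(
        r"""<script[^>]+src\s*=\s*["'][^"']*js\.hcaptcha\.com/1/api\.js""", re.I
    ),
    # <iframe src="...hcaptcha.com...">
    "iframe": re.compile(r"""<iframe[^>]+src\s*=\s*["'][^"']*hcaptcha\.com""", re.I),
}


def detect_hcaptcha(html):
    """Return the names of the hCaptcha indicators present in the page source."""
    return [name for name, pattern in HCAPTCHA_PATTERNS.items() if pattern.search(html)]


page = """
<script src="https://js.hcaptcha.com/1/api.js" async defer></script>
<div class="h-captcha" data-sitekey="..."></div>
"""
print(detect_hcaptcha(page))  # ['widget_div', 'api_script']
```

An empty list means none of the static indicators were found, which does not rule out a widget injected later by JavaScript.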

Conclusion

So, even if we look at other CAPTCHA alternatives such as Cloudflare Turnstile, Yandex SmartCaptcha, MTCaptcha, Puzzle CAPTCHA, FunCaptcha, and others, they use similar mechanics – almost all of them rely on JavaScript and background activity tracking. That is why working with them requires full-featured browsers, warmed-up profiles, and simulation of human behavior. If a CAPTCHA still appears, it must be solved either manually or through specialized paid services.

The only fast and relatively inexpensive way to deal with reCAPTCHA, hCaptcha, and similar systems is to use high-quality rotating proxies that allow you to change IP addresses on virtually every new request. This is exactly what our platform, Froxy, offers. We have over 10 million IPs in the pool, with flexible rotation and targeting settings tailored to the client’s needs, as well as residential and mobile proxies with traffic-based pricing.