CSRF (Cross-Site Request Forgery) is a type of attack on websites in which an attacker exploits a user's existing authorization data to send requests on that user's behalf. The victim doesn't intentionally perform any actions, while the target website believes it is interacting with a real user who previously logged into their account (that state is stored in the browser, specifically in cookies).
Accordingly, a CSRF error does not appear out of nowhere. It is usually associated with filling out forms or sending direct HTTP requests to the server that can modify something in the databases. If you have written your own parsing script (web scraper) and encountered a CSRF token error, this article is for you.
Below, we’ll explain what a CSRF error is, why and in which situations it occurs, and how to avoid it when building parsers.
The attack mechanism has been known for a long time, since the 1990s. Put very simply, CSRF (short for Cross-Site Request Forgery) comes down to this: an attacker uses the authorization data for the target site that is already stored in the victim's browser to send their own requests.
The compromise usually happens when the victim visits a third-party site controlled by the attacker. A script executed there uses the browser's stored cookies to send a forged request to the target site.
Protection against this type of attack has also long been established - it involves the use of special CSRF tokens that are tied to user sessions (sessions are stored on the server, not in the browser) or even generated for each new request/action of the user. If a valid token is not included in the request to the server (for POST, PUT, DELETE methods, or when submitting forms on pages), the server will return a CSRF error.
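The server-side idea can be sketched roughly like this. This is a minimal illustration using only the Python standard library; the function names and the in-memory session store are hypothetical and not tied to any real framework:

```python
import hmac
import secrets

# Hypothetical in-memory session store: session_id -> CSRF token
_sessions = {}

def issue_csrf_token(session_id):
    """Generate a fresh random token and bind it to the session."""
    token = secrets.token_urlsafe(32)
    _sessions[session_id] = token
    return token

def validate_csrf_token(session_id, submitted):
    """State-changing requests are rejected unless the token matches the session's."""
    expected = _sessions.get(session_id)
    if expected is None or submitted is None:
        return False
    # Constant-time comparison avoids timing side channels
    return hmac.compare_digest(expected, submitted)

token = issue_csrf_token("session-123")
print(validate_csrf_token("session-123", token))     # True: legitimate request
print(validate_csrf_token("session-123", "forged"))  # False: server returns a CSRF error
```

A forged cross-site request fails precisely because the attacker's page can read the victim's cookies indirectly (the browser attaches them) but cannot read the token embedded in the target site's pages.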
Another protection mechanism is checking the Referer header (HTTP_REFERER) to verify that the request originates from the site's own pages rather than from a third-party source.
At this point, it’s already becoming clear how a parser can run into CSRF errors. This situation may occur in the following cases:
Such situations are quite rare, but the developer should be ready to face the challenge.
The main problem with errors related to cross-site request forgery is that target websites often don’t return error messages that clearly indicate the absence of a CSRF token. Implementing cookie and CSRF support for every site indiscriminately is also unreasonable - in 90% of cases it will be unnecessary.
So, how should you detect CSRF errors?
There is no dedicated HTTP status code for CSRF. Servers therefore typically respond with standard codes: 403 (Forbidden), 419 (a non-standard code used, for example, by Laravel when a session or page expires, including when a CSRF token is missing or invalid), or 400 (Bad Request).
Less frequently, a direct indication can be found in the error text itself. It makes sense to search for keywords like “token”, “CSRF”, “invalid”.
In some cases, the server may return empty responses, effectively “ignoring” the parser’s requests.
None of these options can give you a 100% guarantee that you are facing a block due to a missing CSRF token. The only exception is a direct reference to CSRF in the error message (if you see that, consider yourself lucky).
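Putting these heuristics together, a simple and deliberately conservative classifier might look like the sketch below. The status codes and keywords are the ones discussed above; the function reports only a suspicion, not a certainty:

```python
CSRF_STATUS_CODES = {400, 403, 419}
CSRF_KEYWORDS = ("csrf", "token", "invalid")

def looks_like_csrf_error(status_code, body):
    """Heuristic: flag responses that *might* be CSRF rejections."""
    if status_code not in CSRF_STATUS_CODES and body.strip():
        return False
    text = body.lower()
    # A direct mention of CSRF is the strongest signal
    if "csrf" in text:
        return True
    # Otherwise rely on a suspicious status code plus weaker keywords,
    # or on an empty body (the server "ignoring" the request)
    return status_code in CSRF_STATUS_CODES and (
        not body.strip() or any(word in text for word in CSRF_KEYWORDS)
    )

print(looks_like_csrf_error(403, "CSRF token mismatch"))  # True
print(looks_like_csrf_error(200, "<html>ok</html>"))      # False
```

In practice you would log every flagged response and verify it manually the first few times, since the same codes are returned for plenty of unrelated blocks.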
Advanced protection systems (WAFs) can behave in even less predictable ways.
As a “reference sample,” you should use how the server behaves and responds when working through a regular browser. That is, you need to open the target page manually in your real browser, with all its settings, cookies and other digital fingerprint attributes.
Once you’ve confirmed that the site works “as it should,” you can access the page or API from your parser and compare the resulting response.
If the server returns an error or doesn’t return anything at all, there’s a high chance you haven’t correctly emulated the behavior of a real client (including a possible rejection due to a missing CSRF token).
At this point, you need to go back to the real browser and carefully examine the data exchange process with the server. Most often, this involves digging into the analysis of POST requests (Developer Tools / DevTools → Network tab).
What exactly you should pay attention to:
To eliminate or at least minimize the risk of scraping errors related to CSRF tokens, we suggest the following approaches.
This approach is based on the following principles:
In other words, if you have POST requests, your parser cannot access pages directly without first opening a session and obtaining cookies.
Don’t forget: tokens, like sessions, can have a limited lifetime. The actual validity period can only be determined experimentally. The most common range is about 10-30 minutes, but this is hardly a gold standard: each site may have its own timing configuration, up to binding a new CSRF token to every request (user action).
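One way to handle this is a small cache that treats the token as stale after a configurable TTL and refetches it on demand. This is an illustrative sketch; the class name and the TTL value are placeholders you would tune per site:

```python
import time

class CsrfTokenCache:
    """Cache a CSRF token and treat it as stale after a fixed TTL.

    The TTL has to be discovered experimentally for each target site;
    10-30 minutes is a common but by no means universal range.
    """

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._token = None
        self._fetched_at = 0.0

    def get(self, fetch_token):
        """Return the cached token, or call fetch_token() to refresh it."""
        now = time.monotonic()
        if self._token is None or now - self._fetched_at >= self.ttl:
            self._token = fetch_token()
            self._fetched_at = now
        return self._token

calls = []
cache = CsrfTokenCache(ttl_seconds=0.05)
cache.get(lambda: calls.append(1) or "tok-1")  # first call fetches
cache.get(lambda: calls.append(1) or "tok-2")  # still fresh: cached, no fetch
time.sleep(0.06)
cache.get(lambda: calls.append(1) or "tok-3")  # TTL expired: refetch
print(len(calls))  # 2
```

In a real scraper, `fetch_token` would be the function that loads a page and extracts a fresh token, so expired tokens are replaced transparently instead of causing a CSRF error mid-run.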
You should account for all the common ways a CSRF token can be passed (in meta tags, JSON/XML, HTTP headers, forms, as JavaScript variables etc.) and be able to send the obtained token back in the exact format the browser is expected to use.
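A token extractor that checks several common locations might be sketched as follows. It is regex-based for brevity (in production an HTML parser such as BeautifulSoup is more robust), and the attribute names shown (`csrf-token`, `_token`, `csrf_token`, `X-CSRF-Token`) are common conventions, not guarantees:

```python
import json
import re

def extract_csrf_token(html, json_body="", headers=None):
    """Try the common CSRF token locations in order; return the first hit or None."""
    # 1. Meta tag: <meta name="csrf-token" content="...">
    m = re.search(r'<meta\s+name="csrf-token"\s+content="([^"]+)"', html)
    if m:
        return m.group(1)
    # 2. Hidden form field: <input type="hidden" name="_token" value="...">
    m = re.search(r'<input[^>]+name="_token"[^>]+value="([^"]+)"', html)
    if m:
        return m.group(1)
    # 3. JavaScript variable: window.csrfToken = "..."
    m = re.search(r'csrfToken\s*=\s*"([^"]+)"', html)
    if m:
        return m.group(1)
    # 4. JSON API response: {"csrf_token": "..."}
    if json_body:
        try:
            data = json.loads(json_body)
            if "csrf_token" in data:
                return data["csrf_token"]
        except json.JSONDecodeError:
            pass
    # 5. Response header, e.g. X-CSRF-Token
    if headers:
        return headers.get("X-CSRF-Token")
    return None

print(extract_csrf_token('<meta name="csrf-token" content="abc123">'))  # abc123
```

The symmetric half of the task, sending the token back in the format the site expects (header vs. form field vs. JSON key), still has to be determined from DevTools for each target.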
If you want to detect CSRF-related errors and issues in your parser’s token handling more quickly, it’s a good idea to write code that logs all parser actions and server responses. This will make it much easier to pinpoint the exact step at which the problem occurs.
In particularly complex cases, you may be getting errors and bans not because of CSRF tokens, but due to the site’s protection systems triggering. The errors returned by the server are often uninformative and, even worse, look the same in both scenarios.
Therefore, instead of spending time and effort trying to emulate every possible browser behavior, it’s often enough to run parsing through a real browser. Problems with cookies, as well as most other protection systems, will disappear on their own. If that still doesn’t help, you’ll need to dive deeper into emulating user behavior and take care to hide traces of headless-browser usage.
Note that when working with headless and anti-detect browsers, proxies should be rotated differently - at the browser instance level, not at the parser level.
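The principle can be illustrated with a simple round-robin assigner: each browser instance gets one proxy for its whole lifetime, instead of proxies being swapped between individual requests. The proxy addresses are placeholders, and the returned dict stands in for a real browser handle (with Playwright this would roughly correspond to `browser_type.launch(proxy={"server": proxy})`):

```python
from itertools import cycle

PROXIES = [
    "http://proxy-1.example:8080",
    "http://proxy-2.example:8080",
    "http://proxy-3.example:8080",
]
_proxy_pool = cycle(PROXIES)

def launch_browser_with_proxy():
    """Pick the next proxy from the pool and bind it to one browser instance."""
    proxy = next(_proxy_pool)
    # In real code: launch the (headless) browser here with this proxy
    return {"proxy": proxy}

instances = [launch_browser_with_proxy() for _ in range(4)]
print([b["proxy"] for b in instances])
# The 4th instance wraps around to proxy-1 again
```

Because the session, cookies, and CSRF token all live inside one browser instance, keeping the proxy fixed per instance ensures the site sees a consistent client, while different instances still come from different IPs.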
Popular browser-automation tools: Playwright, Selenium, Puppeteer, Nodriver.
Here’s an example of a simple parser that looks for a CSRF token in a meta tag and uses it within a session (BeautifulSoup is responsible for parsing the HTML):
import requests
from bs4 import BeautifulSoup

BASE_URL = "https://example.com"
FORM_URL = f"{BASE_URL}/profile"
SUBMIT_URL = f"{BASE_URL}/api/update"

session = requests.Session()

# 1. Load the page to obtain the token and cookies
response = session.get(FORM_URL)
response.raise_for_status()

# 2. Extract the CSRF token from <meta name="csrf-token">
soup = BeautifulSoup(response.text, "html.parser")
csrf_token = soup.find("meta", {"name": "csrf-token"})["content"]
print("CSRF token:", csrf_token)
print("Cookies after GET:", session.cookies.get_dict())

# 3. Send a POST request with this token in the header
headers = {
    "X-CSRF-Token": csrf_token,
    "Referer": FORM_URL,          # often required
    "Origin": BASE_URL,           # sometimes required
    "User-Agent": "Mozilla/5.0",  # browser imitation
}
payload = {"name": "John", "city": "Berlin"}
post_response = session.post(
    SUBMIT_URL,
    json=payload,
    headers=headers,
)

print("POST status:", post_response.status_code)
print("Server response:", post_response.text[:300])
Here is a checklist for CSRF-safe scraping:
Also check out the best practices for web scraping without getting blocked.