CSRF (Cross-Site Request Forgery) is a type of attack on websites in which an attacker exploits a user's existing authorization data to send requests on that user's behalf. The victim doesn't intentionally perform any actions, while the target website believes it is interacting with a real user who previously logged into their account (that state is stored in the browser, specifically in cookies).
Accordingly, a CSRF error does not appear out of nowhere. It is usually associated with filling out forms or sending direct HTTP requests to the server that can modify something in the databases. If you have written your own parsing script (web scraper) and encountered a CSRF token error, this article is for you.
Below, we’ll explain what a CSRF error is, why and in which situations it occurs, and how to avoid it when building parsers.
The attack mechanism has been known for a long time, since the 1990s. Put very simply, CSRF (short for Cross-Site Request Forgery) comes down to this: an attacker uses the authorization data for the target site that is already stored in the victim's browser to send their own requests.
The compromise usually happens when the victim visits a third-party site controlled by the attacker. A script executed there uses the browser's stored cookies to send a forged request to the target site.
Protection against this type of attack has also long been established - it involves the use of special CSRF tokens that are tied to user sessions (sessions are stored on the server, not in the browser) or even generated for each new request/action of the user. If a valid token is not included in the request to the server (for POST, PUT, DELETE methods, or when submitting forms on pages), the server will return a CSRF error.
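The server-side idea can be sketched roughly like this. This is a minimal illustration using only the Python standard library; the function names and the in-memory session store are hypothetical and not tied to any real framework:

```python
import hmac
import secrets

# Hypothetical in-memory session store: session_id -> CSRF token
_sessions = {}

def issue_csrf_token(session_id):
    """Generate a fresh random token and bind it to the session."""
    token = secrets.token_urlsafe(32)
    _sessions[session_id] = token
    return token

def validate_csrf_token(session_id, submitted):
    """State-changing requests are rejected unless the token matches the session's."""
    expected = _sessions.get(session_id)
    if expected is None or submitted is None:
        return False
    # Constant-time comparison avoids timing side channels
    return hmac.compare_digest(expected, submitted)

token = issue_csrf_token("session-123")
print(validate_csrf_token("session-123", token))     # True: legitimate request
print(validate_csrf_token("session-123", "forged"))  # False: server returns a CSRF error
```

A forged cross-site request fails precisely because the attacker's page can read the victim's cookies indirectly (the browser attaches them) but cannot read the token embedded in the target site's pages.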
Another protection mechanism is checking the Referer header (HTTP_REFERER) to verify that the request originates from the site's own pages rather than from a third-party source.
At this point, it’s already becoming clear how a parser can run into CSRF errors. This situation may occur in the following cases:
Such situations are quite rare, but the developer should be ready to face the challenge.
The main problem with errors related to cross-site request forgery is that target websites often don’t return error messages that clearly indicate the absence of a CSRF token. Implementing cookie and CSRF support for every site indiscriminately is also unreasonable - in 90% of cases it will be unnecessary.
So, how should you detect CSRF errors?
There is no dedicated HTTP status code for CSRF. Servers therefore typically respond with standard codes: 403 (Forbidden), 419 (a non-standard code used, for example, by Laravel when a session or page expires, including when a CSRF token is missing or invalid), or 400 (Bad Request).
Less frequently, a direct indication can be found in the error text itself. It makes sense to search for keywords like “token”, “CSRF”, “invalid”.
In some cases, the server may return empty responses, effectively “ignoring” the parser’s requests.
None of these options can give you a 100% guarantee that you are facing a block due to a missing CSRF token. The only exception is a direct reference to CSRF in the error message (if you see that, consider yourself lucky).
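Putting these heuristics together, a simple and deliberately conservative classifier might look like the sketch below. The status codes and keywords are the ones discussed above; the function reports only a suspicion, not a certainty:

```python
CSRF_STATUS_CODES = {400, 403, 419}
CSRF_KEYWORDS = ("csrf", "token", "invalid")

def looks_like_csrf_error(status_code, body):
    """Heuristic: flag responses that *might* be CSRF rejections."""
    if status_code not in CSRF_STATUS_CODES and body.strip():
        return False
    text = body.lower()
    # A direct mention of CSRF is the strongest signal
    if "csrf" in text:
        return True
    # Otherwise rely on a suspicious status code plus weaker keywords,
    # or on an empty body (the server "ignoring" the request)
    return status_code in CSRF_STATUS_CODES and (
        not body.strip() or any(word in text for word in CSRF_KEYWORDS)
    )

print(looks_like_csrf_error(403, "CSRF token mismatch"))  # True
print(looks_like_csrf_error(200, "<html>ok</html>"))      # False
```

In practice you would log every flagged response and verify it manually the first few times, since the same codes are returned for plenty of unrelated blocks.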
Advanced protection systems (WAFs) can behave in even less predictable ways.
As a “reference sample,” you should use how the server behaves and responds when working through a regular browser. That is, you need to open the target page manually in your real browser, with all its settings, cookies and other digital fingerprint attributes.
Once you’ve confirmed that the site works “as it should,” you can access the page or API from your parser and compare the resulting response.
If the server returns an error or doesn’t return anything at all, there’s a high chance you haven’t correctly emulated the behavior of a real client (including a possible rejection due to a missing CSRF token).
At this point, you need to go back to the real browser and carefully examine the data exchange process with the server. Most often, this involves digging into the analysis of POST requests (Developer Tools / DevTools → Network tab).
What exactly you should pay attention to:
To eliminate or at least minimize the risk of scraping errors related to CSRF tokens, we suggest the following approaches.
This approach is based on the following principles:
In other words, if you have POST requests, your parser cannot access pages directly without first opening a session and obtaining cookies.
Don’t forget: tokens, like sessions, can have a limited lifetime. The actual validity period can only be determined experimentally. The most common range is about 10-30 minutes, but this is hardly a gold standard: each site may have its own timing configuration, up to binding a new CSRF token to every request (user action).
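One way to handle this is a small cache that treats the token as stale after a configurable TTL and refetches it on demand. This is an illustrative sketch; the class name and the TTL value are placeholders you would tune per site:

```python
import time

class CsrfTokenCache:
    """Cache a CSRF token and treat it as stale after a fixed TTL.

    The TTL has to be discovered experimentally for each target site;
    10-30 minutes is a common but by no means universal range.
    """

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._token = None
        self._fetched_at = 0.0

    def get(self, fetch_token):
        """Return the cached token, or call fetch_token() to refresh it."""
        now = time.monotonic()
        if self._token is None or now - self._fetched_at >= self.ttl:
            self._token = fetch_token()
            self._fetched_at = now
        return self._token

calls = []
cache = CsrfTokenCache(ttl_seconds=0.05)
cache.get(lambda: calls.append(1) or "tok-1")  # first call fetches
cache.get(lambda: calls.append(1) or "tok-2")  # still fresh: cached, no fetch
time.sleep(0.06)
cache.get(lambda: calls.append(1) or "tok-3")  # TTL expired: refetch
print(len(calls))  # 2
```

In a real scraper, `fetch_token` would be the function that loads a page and extracts a fresh token, so expired tokens are replaced transparently instead of causing a CSRF error mid-run.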
You should account for all the common ways a CSRF token can be passed (in meta tags, JSON/XML, HTTP headers, forms, as JavaScript variables etc.) and be able to send the obtained token back in the exact format the browser is expected to use.
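A token extractor that checks several common locations might be sketched as follows. It is regex-based for brevity (in production an HTML parser such as BeautifulSoup is more robust), and the attribute names shown (`csrf-token`, `_token`, `csrf_token`, `X-CSRF-Token`) are common conventions, not guarantees:

```python
import json
import re

def extract_csrf_token(html, json_body="", headers=None):
    """Try the common CSRF token locations in order; return the first hit or None."""
    # 1. Meta tag: <meta name="csrf-token" content="...">
    m = re.search(r'<meta\s+name="csrf-token"\s+content="([^"]+)"', html)
    if m:
        return m.group(1)
    # 2. Hidden form field: <input type="hidden" name="_token" value="...">
    m = re.search(r'<input[^>]+name="_token"[^>]+value="([^"]+)"', html)
    if m:
        return m.group(1)
    # 3. JavaScript variable: window.csrfToken = "..."
    m = re.search(r'csrfToken\s*=\s*"([^"]+)"', html)
    if m:
        return m.group(1)
    # 4. JSON API response: {"csrf_token": "..."}
    if json_body:
        try:
            data = json.loads(json_body)
            if "csrf_token" in data:
                return data["csrf_token"]
        except json.JSONDecodeError:
            pass
    # 5. Response header, e.g. X-CSRF-Token
    if headers:
        return headers.get("X-CSRF-Token")
    return None

print(extract_csrf_token('<meta name="csrf-token" content="abc123">'))  # abc123
```

The symmetric half of the task, sending the token back in the format the site expects (header vs. form field vs. JSON key), still has to be determined from DevTools for each target.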
If you want to detect CSRF-related errors and issues in your parser’s token handling more quickly, it’s a good idea to write code that logs all parser actions and server responses. This will make it much easier to pinpoint the exact step at which the problem occurs.
In particularly complex cases, you may be getting errors and bans not because of CSRF tokens, but due to the site’s protection systems triggering. The errors returned by the server are often uninformative and, even worse, look the same in both scenarios.
Therefore, instead of spending time and effort trying to emulate every possible browser behavior, it’s often enough to run parsing through a real browser. Problems with cookies, as well as most other protection systems, will disappear on their own. If that still doesn’t help, you’ll need to dive deeper into emulating user behavior and take care to hide traces of headless-browser usage.
Note that when working with headless and anti-detect browsers, proxies should be rotated differently - at the browser instance level, not at the parser level.
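The principle can be illustrated with a simple round-robin assigner: each browser instance gets one proxy for its whole lifetime, instead of proxies being swapped between individual requests. The proxy addresses are placeholders, and the returned dict stands in for a real browser handle (with Playwright this would roughly correspond to `browser_type.launch(proxy={"server": proxy})`):

```python
from itertools import cycle

PROXIES = [
    "http://proxy-1.example:8080",
    "http://proxy-2.example:8080",
    "http://proxy-3.example:8080",
]
_proxy_pool = cycle(PROXIES)

def launch_browser_with_proxy():
    """Pick the next proxy from the pool and bind it to one browser instance."""
    proxy = next(_proxy_pool)
    # In real code: launch the (headless) browser here with this proxy
    return {"proxy": proxy}

instances = [launch_browser_with_proxy() for _ in range(4)]
print([b["proxy"] for b in instances])
# The 4th instance wraps around to proxy-1 again
```

Because the session, cookies, and CSRF token all live inside one browser instance, keeping the proxy fixed per instance ensures the site sees a consistent client, while different instances still come from different IPs.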
Popular browser-automation tools: Playwright, Selenium, Puppeteer, Nodriver.
Here’s an example of a simple parser that looks for a CSRF token in a meta tag and uses it within a session (BeautifulSoup is responsible for parsing the HTML):
import requests
from bs4 import BeautifulSoup

BASE_URL = "https://example.com"
FORM_URL = f"{BASE_URL}/profile"
SUBMIT_URL = f"{BASE_URL}/api/update"

session = requests.Session()

# 1. Load the page to obtain the token and cookies
response = session.get(FORM_URL)
response.raise_for_status()

# 2. Extract the CSRF token from <meta name="csrf-token">
soup = BeautifulSoup(response.text, "html.parser")
csrf_token = soup.find("meta", {"name": "csrf-token"})["content"]
print("CSRF token:", csrf_token)
print("Cookies after GET:", session.cookies.get_dict())

# 3. Send a POST request with this token in the header
headers = {
    "X-CSRF-Token": csrf_token,
    "Referer": FORM_URL,          # often required
    "Origin": BASE_URL,           # sometimes required
    "User-Agent": "Mozilla/5.0",  # browser imitation
}
payload = {"name": "John", "city": "Berlin"}
post_response = session.post(
    SUBMIT_URL,
    json=payload,
    headers=headers,
)

print("POST status:", post_response.status_code)
print("Server response:", post_response.text[:300])
Here is a checklist for CSRF-safe scraping:
Also check out the best practices for web scraping without getting blocked.