So, what are user agents? When your parser crawls pages, it has to identify itself to the website. The parameter that carries the client application's name is called the user-agent. Various design decisions and more can depend on it, but many large websites primarily use the values of the user-agent line to organize protection against malicious activity, including blocking parsers.
Below, we'll explain what the popular user agents are and how to keep a parser from being blocked by anti-fraud systems.
Understanding User Agents
A user agent is a text identifier that software sends in HTTP requests when connecting to websites or web services.
Put simply, it is the conventional name of a browser or another program such as a search bot or spider.
Below are a few example user-agent strings. You may notice that multiple parameters are transmitted within a single user-agent line: the platform and operating system, the rendering engine, and the browser name with its version.
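The strings in this sketch are illustrative only; the version numbers are examples rather than the current releases.

```python
# Illustrative user-agent strings (version numbers are examples, not the latest builds).
EXAMPLE_USER_AGENTS = {
    # Chrome on Windows: the legacy "Mozilla/5.0" prefix, the OS details in parentheses,
    # the AppleWebKit engine, the "KHTML, like Gecko" compatibility token, then Chrome
    # and a trailing Safari token - several product names crammed into one line.
    "chrome_windows": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
    ),
    # Firefox on Linux: platform details, the Gecko engine and the browser with its version.
    "firefox_linux": "Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0",
    # Google's search crawler identifies itself openly and even links to an info page.
    "googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
}
```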
Attentive readers will notice that user-agent strings from different browsers look very similar, yet they still differ, and sites determine the browser type from exactly these differences.
The reasons for this "similarity" are described in detail on Wikipedia: at the height of the competition between browser vendors, each tried to impersonate its rivals' browsers to stay compatible. As a result, a modern user-agent string can contain up to five product names at a time.
Logically, the most common user agent is the one used on the largest number of devices. Currently, that is the stable Google Chrome build running on Android: according to Statcounter, Android accounts for over 43% of all internet users.
When it comes to desktops, the undisputed leader is Windows paired with Chrome; together they account for over 27% of all devices connected to the internet.
The long-standing leader among browsers, and thus the most popular user-agent, is Google Chrome: it accounts for over 65% of all internet requests.
So, the most reliable options to specify in HTTP requests when fine-tuning a parser are the current stable Chrome user-agents for Android and for Windows.
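As a minimal sketch (assuming the requests library is installed), this is how such a user-agent can be attached to a request; the target URL is a placeholder and the version number is illustrative, so keep it close to the current stable Chrome release:

```python
import requests

# A current-style Chrome-on-Windows user-agent; update the version number regularly.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
    )
}

# https://example.com is a placeholder target, not a real parsing endpoint.
response = requests.get("https://example.com", headers=HEADERS, timeout=15)
print(response.status_code)
```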
It is worth stressing that the user-agent is one of the most important parameters of a digital fingerprint.
Mind that spoofing the User-Agent alone may not be enough for stable parsing without blocks: websites can gather much more information about the client browser or application (details on best parsing practices here). Headless browsers and quality proxies, for example, may come in handy.
Analyzing user-agent strings allows a website to adapt pages to the visitor's device and browser, collect audience statistics, and filter out suspicious or parasitic traffic.
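As a rough illustration of the site's side of this, a handful of regular expressions is enough to pull basic device and bot signals out of the header; the sketch below is deliberately simplified and uses only the Python standard library:

```python
import re

def classify(user_agent: str) -> dict:
    """Extract coarse signals a site might log or feed into a first-pass bot filter."""
    return {
        "is_bot": bool(re.search(r"bot|spider|crawl", user_agent, re.I)),
        "os": next(
            (name for name, pattern in [
                ("Windows", r"Windows NT"),
                ("Android", r"Android"),
                ("macOS", r"Mac OS X"),
                ("Linux", r"Linux"),
            ] if re.search(pattern, user_agent)),
            "unknown",
        ),
        "browser": next(
            (name for name, pattern in [
                ("Chrome", r"Chrome/"),
                ("Firefox", r"Firefox/"),
                ("Safari", r"Version/.+Safari/"),
            ] if re.search(pattern, user_agent)),
            "unknown",
        ),
    }

mobile_ua = ("Mozilla/5.0 (Linux; Android 14; Pixel 8) AppleWebKit/537.36 "
             "(KHTML, like Gecko) Chrome/124.0.0.0 Mobile Safari/537.36")
print(classify(mobile_ua))  # {'is_bot': False, 'os': 'Android', 'browser': 'Chrome'}
```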
And yes, you can be blocked simply for not sending a user-agent at all: requests that arrive without any self-identification are the first signal of parasitic traffic.
The real question is how exactly you identify yourself:
Tip 1. Use a user-agent that matches your real device.
The point is that the real operating system and hardware platform can be checked in various ways. Advanced anti-fraud systems compare the lists of pre-installed fonts: Linux and macOS ship different default sets, and a free Ubuntu distribution will almost certainly not contain proprietary fonts covered by Microsoft's copyright.
Another interesting technique is HTML5 canvas fingerprinting: certain elements are drawn on the page using the browser's built-in features, and rendering differs across operating systems. By sampling pixel colors from specific areas of the page, a site can check whether the visitor lied in the user-agent.
Similar scripts can be used to check the platform based on other technologies: WebGL, WebRTC etc.
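To see how such a mismatch surfaces in practice, here is a minimal sketch assuming Playwright is installed (pip install playwright, then playwright install chromium). It spoofs a macOS user-agent in headless Chromium and reads back what the JavaScript environment actually reports, including a simple canvas hash:

```python
import hashlib
from playwright.sync_api import sync_playwright

# Illustrative macOS Chrome user-agent; the version number is an example.
SPOOFED_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page(user_agent=SPOOFED_UA)
    page.goto("about:blank")

    # navigator.platform still reflects the real OS unless it is patched separately.
    platform = page.evaluate("navigator.platform")

    # Canvas fingerprint: draw a short string and hash the rendered result.
    canvas_hash = hashlib.sha256(
        page.evaluate(
            """() => {
                const c = document.createElement('canvas');
                const ctx = c.getContext('2d');
                ctx.font = '16px Arial';
                ctx.fillText('fingerprint test', 2, 20);
                return c.toDataURL();
            }"""
        ).encode()
    ).hexdigest()

    print("User-Agent claims macOS, navigator.platform says:", platform)
    print("Canvas hash (differs across OS/GPU/font stacks):", canvas_hash[:16], "...")
    browser.close()
```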
Tip 2. Use up-to-date stable browser versions.
This is also an interesting point: it's best to specify a browser version inside the user-agent that lags no more than 2 versions behind the current stable branch.
The thing is, many developers drop outdated browser versions from the list of supported clients. By continuing to use an old browser, you risk running into compatibility problems and missing support for current web standards.
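A quick self-check along these lines can be run on your own user-agent strings before a crawl; in the sketch below, CURRENT_STABLE_MAJOR is a placeholder you would update or fetch yourself, not a live value:

```python
import re

CURRENT_STABLE_MAJOR = 124  # assumption: replace with the actual current Chrome major version
MAX_LAG = 2                 # the "no more than 2 versions behind" rule from the tip above

def chrome_major(user_agent: str) -> int | None:
    """Extract the Chrome major version from a user-agent string, if present."""
    match = re.search(r"Chrome/(\d+)\.", user_agent)
    return int(match.group(1)) if match else None

def looks_outdated(user_agent: str) -> bool:
    major = chrome_major(user_agent)
    return major is not None and CURRENT_STABLE_MAJOR - major > MAX_LAG

old_ua = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
          "(KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36")
print(looks_outdated(old_ua))  # True: Chrome 109 lags far behind the assumed stable 124
```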
Development teams at the largest websites and services can use this as a signal to flag suspicious traffic.
Plus, many anti-detection browsers run on old Chromium versions. Isn't that a reason to take a closer look at the connected clients?
Tip 3. Rotate IP addresses and fingerprints.
To more accurately identify problematic requests, anti-fraud systems need time. For example, they can measure delays between requests from one IP address, compare identifiers stored in cookies and analyze user-agent strings.
By changing digital fingerprints along with IP addresses, you connect to the target site each time as if from a new device. At least, that's how it looks from the perspective of anti-fraud systems.
Protective mechanisms simply don't have time to react.
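A minimal rotation sketch with the requests library might look like this; the proxy endpoints and user-agent strings below are placeholders, not working credentials:

```python
import itertools
import requests

PROXIES = [
    "http://user:pass@proxy-1.example.com:8080",  # hypothetical rotating endpoints
    "http://user:pass@proxy-2.example.com:8080",
]
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Linux; Android 14; Pixel 8) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Mobile Safari/537.36",
]

# Cycle through (proxy, user-agent) pairs so every request looks like a different device.
rotation = itertools.cycle(zip(PROXIES, USER_AGENTS))

def fetch(url: str) -> requests.Response:
    proxy, ua = next(rotation)
    return requests.get(
        url,
        headers={"User-Agent": ua},
        proxies={"http": proxy, "https": proxy},
        timeout=15,
    )
```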
Analyzing the user-agent is the basis of many protective systems and mechanisms on large websites. The user-agent alone may not be enough, though, so the parameter is studied in conjunction with other data: cookies, HTML5 canvas, WebGL, etc.
Specifying a parsing user agent is essential, but it must be done correctly to avoid detection by anti-fraud systems.
Rotating proxies can help with bypassing blocks and sending a large number of requests to the same website.
You can buy quality mobile and residential proxies from us. Froxy offers over 8 million IPs with city-level targeting. You only pay for traffic, while the number of parallel connections may vary (up to 1000 ports per user).