Screen Scraping: What It Is and How It Works

Written by Team Froxy | Feb 22, 2024 10:00:00 AM

Neural networks and artificial intelligence are a hot topic nowadays. Many experts claim that this technology is nothing more than a toy and that practical AI applications are unprofitable. We categorically disagree. Below, we dispel the myths and talk about entirely real and financially justified applications of artificial intelligence technologies, specifically computer vision combined with screen parsing (screenshots, images etc.).

Businesses often use automated data collection to solve their tasks: to monitor competitors, analyze strategies and assortments, verify the work of counterparties etc. In most cases, this means analyzing textual information on web pages. However, information is not always stored as text. It may also be held in images, video and audio recordings, or more complex professional document formats (3D models, CAD system exports etc.). What about them?

What can be "seen" can already be accurately parsed today. All the intricacies and nuances of this process are discussed below.

Screen Scraping Meaning

So, what are screen scrapers and how do they work? Screen scraping (screen parsing) is the process of extracting data from images, screenshots and user interfaces. This method is used when simpler and cheaper approaches, such as parsing text or HTML structure, are impossible.

Here is detailed material on what parsing is, its advantages and disadvantages.

It often happens that visual data also contains text and labels that a person with good eyesight can read easily. But for parsing programs that cannot access the page source code, as in the case of a mobile application screen, the task becomes daunting.

In this very case, computer vision and image recognition technologies come to the rescue. They can replace human eyes and "see" what specific information is displayed on the screen (for example, in a screenshot or simply in a photo).

What Is Screen Scraping Used For?

In most cases, scraping serves business needs. Screen scraping is no exception. Here are the most common scenarios for using screen parsers:

Market and competitor research. This refers to the comparison of personal accounts, UI approaches and strategies (usability), typical inscriptions, usage rules, etc. Part of the work can be done manually. However, if there are many competitors, the volume of processed information increases significantly. Automation systems and screen recognition are essential.
Price monitoring. Screen scrapers help when a standard parser cannot access the source code of a web page or data within a mobile application (online store). Additionally, parallel comparison of offers in the web version of the store and in the mobile application can be conducted (major players use different pricing approaches on mobile and desktop devices).
Review monitoring and reputation building (PR). Screen scrapers handle even the most complex situations for collecting real reviews on various platforms: social networks, messengers, niche websites, bulletin boards etc. It's worth remembering that nowadays graphics (emojis, emoticons, stickers etc.) are used to express emotions and reactions along with text.
Ad checking. Businesses may publish their ads in various places. However, trusting partners blindly may result in rapid failure. It makes sense to at least spot-check the fulfillment of contract conditions. Again, textual parsers can only check the ad code, not the way it actually looks on the website or in the mobile application. For screen scrapers, however, that’s not a problem.
UX (User Experience) analysis. How can you tell if a customer likes the interface of your personal account or not? How convenient is it? Are there any errors? Some people use special surveys, some involve metrics, but the most reliable way to check any interface is to make screenshots. A parser can automate the processing of screens and categorize them.

There may be more complex situations where screen scraping is applied.

A Few Words About Security and Confidentiality

Note! Many software developers, especially those of banking services, government digital services etc. strive to protect information within their products. However, due to the fact that screen scrapers can "see" what is happening on the screen almost like regular users, they can disclose confidential information and sensitive data, intentionally or unintentionally.

For this very reason, one must be extremely cautious. The risk of hacking and financial loss significantly increases. It is important to know and take this fact into account.

A screen scraper turns into a powerful weapon in the hands of malicious users.

How Do Screen Scrapers Work?

When it comes to analyzing plain text or web pages (which are also basically textual documents), the operation of a parser is clear and understandable. The program receives the body of the document, searches for the necessary markup elements or matches with a specified pattern within the text, and then extracts the data inside the element. That's it; the data can be entered into a table and exported in the required format.
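The text-level workflow described above can be illustrated with a minimal sketch that uses only the Python standard library. The sample HTML, the class names `name` and `price`, and the helper names are assumptions made up for this illustration; a real parser would fetch the document over the network and would usually rely on a dedicated parsing library.

```python
import csv
import io
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element that carries a given class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0        # > 0 while we are inside a matching element
        self.results = []     # one text entry per matching element
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1
        elif self.target_class in classes:
            self.depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.results.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self.depth:
            self._buffer.append(data)


def extract(html, css_class):
    """Return the text of all elements with the given class, in document order."""
    parser = ClassTextExtractor(css_class)
    parser.feed(html)
    return parser.results


def to_csv(rows, header):
    """Export rows as CSV text (the 'required format' in this sketch)."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical document body that the parser has already received.
SAMPLE_HTML = """
<ul>
  <li><span class="name">Chair</span> <span class="price">49.90</span></li>
  <li><span class="name">Scarf</span> <span class="price">12.50</span></li>
</ul>
"""

names = extract(SAMPLE_HTML, "name")
prices = extract(SAMPLE_HTML, "price")
table = to_csv(zip(names, prices), ["name", "price"])
print(table)
```

The point of the sketch is that everything happens on markup and text: the parser never needs to know what the page looks like.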

A small exception is dynamic content, where the final HTML document is generated by JavaScript. In this case, so-called "headless browsers" are used. They receive the URL to visit, load the page, execute all scripts and only then return the final HTML document to the parser. The work is still done at the text level.

But how can you screen scrape correctly?

Usually, people only see the final result of the work of most programs. All user interaction takes place in the so-called graphical user interface (GUI).

To understand how to screen scrape, keep in mind that programs always work with data, not interfaces. Since parsers are also programs, they need to be provided with information in a format that they can "digest".

When using screen scrapers, the readable format would be an image, such as a screenshot of a mobile application screen, or a PDF file, which can be created in a headless browser via API.
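As a rough illustration of how such an image or PDF might be produced, the sketch below builds a command line for a Chromium-based browser in headless mode. It uses the command-line switches `--headless`, `--window-size`, `--screenshot` and `--print-to-pdf` rather than a programmatic API, which is a simplification chosen for this example. The binary name `chromium`, the function names and the output paths are assumptions; the executable is named differently on many systems.

```python
import subprocess

# Maps the output type to the corresponding headless-mode switch.
OUTPUT_FLAGS = {"screenshot": "--screenshot", "pdf": "--print-to-pdf"}


def render_command(url, output_path, mode="screenshot",
                   browser="chromium", window_size=(1280, 720)):
    """Build the argument list that renders `url` into an image or a PDF."""
    if mode not in OUTPUT_FLAGS:
        raise ValueError(f"unsupported mode: {mode!r}")
    width, height = window_size
    return [
        browser,                       # assumed binary name, adjust per system
        "--headless",
        f"--window-size={width},{height}",
        f"{OUTPUT_FLAGS[mode]}={output_path}",
        url,
    ]


def render(url, output_path, **options):
    """Run the browser; requires a Chromium-based browser to be installed."""
    subprocess.run(render_command(url, output_path, **options),
                   check=True, timeout=60)


# Only the command is built here, so the example runs without a browser.
print(render_command("https://example.com", "page.png"))
print(render_command("https://example.com", "page.pdf", mode="pdf"))
```

Calling `render(...)` instead of `render_command(...)` would actually launch the browser and write the file, which then becomes the input for the recognition step.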

Next, optical character recognition (OCR) or computer vision (using specialized neural networks) comes into play.

Classic OCR programs work by comparing what they see with reference samples. They have a database of images of all known letters, numbers and other characters or symbols. The original image is sliced into suitable pieces (squares or rectangles with a contrasting element in the center), and then each piece is scaled to a conventional "standard" size and compared with every reference image in the database. As a result, the program selects the database character whose reference image matches the piece most closely.
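The slicing, scaling and comparison steps can be shown on a toy scale. The sketch below is a deliberately simplified illustration, not how production OCR engines are built: the "image" is a list of strings where `#` is a dark pixel, the reference database holds only three hand-drawn 3x5 digits, and all names are made up for this example.

```python
# Reference database: one 3x5 bitmap per known character (toy data).
REFERENCE = {
    "0": ("###", "#.#", "#.#", "#.#", "###"),
    "1": (".#.", "##.", ".#.", ".#.", "###"),
    "7": ("###", "..#", ".#.", ".#.", ".#."),
}
STD_W, STD_H = 3, 5  # the "standard" size every piece is scaled to


def split_glyphs(image):
    """Slice the image into pieces wherever a fully blank column occurs."""
    width = len(image[0])
    glyphs, start = [], None
    for x in range(width + 1):
        blank = x == width or all(row[x] == "." for row in image)
        if not blank and start is None:
            start = x
        elif blank and start is not None:
            glyphs.append(tuple(row[start:x] for row in image))
            start = None
    return glyphs


def normalize(glyph, width=STD_W, height=STD_H):
    """Scale a piece to the standard size with nearest-neighbour sampling."""
    src_h, src_w = len(glyph), len(glyph[0])
    return tuple(
        "".join(glyph[y * src_h // height][x * src_w // width]
                for x in range(width))
        for y in range(height)
    )


def similarity(a, b):
    """Count the pixels that are identical in two same-sized bitmaps."""
    return sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def recognize_glyph(glyph):
    """Pick the reference character that matches the piece most closely."""
    candidate = normalize(glyph)
    return max(REFERENCE, key=lambda ch: similarity(candidate, REFERENCE[ch]))


def recognize(image):
    return "".join(recognize_glyph(g) for g in split_glyphs(image))


# "107" with one noisy pixel in the middle of the zero.
IMAGE = [
    ".#..###.###",
    "##..#.#...#",
    ".#..###..#.",
    ".#..#.#..#.",
    "###.###..#.",
]
print(recognize(IMAGE))
```

Even with the noisy pixel, the zero still shares 14 of 15 pixels with its reference image, so it wins the comparison. Real OCR software adds preprocessing (binarization, deskewing, line detection) that this sketch leaves out.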

Computer vision is slightly more complex. The image of the characters is processed by a special neural network. Typically, one neural network is configured to work with one language. Certain neurons are trained to detect only a specific character. As a result, the network outputs the signal of the neuron that has recognized its letter more confidently than the others.

The advantage of a neural network is that it doesn't output disjointed letters and symbols from different alphabets. It immediately forms words. This effect is achieved through additional layers of the neural network, which are trained to look for words rather than characters. Even if one of the characters is incorrectly recognized in the first layer, it will be "corrected" in subsequent layers.
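The two ideas above, picking the strongest character signal and then correcting at the word level, can be imitated without any neural network. The sketch below is only an analogy: the per-character scores are invented numbers standing in for neuron outputs, and the word-level "layers" are replaced by a dictionary lookup using the standard library's `difflib.get_close_matches`. The vocabulary and function names are assumptions for this example.

```python
import difflib


def pick_characters(score_rows):
    """For each character position, keep the label with the highest score."""
    return "".join(max(scores, key=scores.get) for scores in score_rows)


def correct_word(raw, vocabulary, cutoff=0.6):
    """Replace a raw reading with the closest known word, if one is close enough."""
    matches = difflib.get_close_matches(raw, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else raw


# Invented scores for a five-character word; the third position is ambiguous
# and the wrong label ("l") narrowly beats the right one ("i").
SCORES = [
    {"p": 0.91, "b": 0.06},
    {"r": 0.88, "n": 0.09},
    {"l": 0.48, "i": 0.45},
    {"c": 0.93, "e": 0.04},
    {"e": 0.90, "c": 0.07},
]
VOCABULARY = ["price", "total", "cart"]

raw = pick_characters(SCORES)
word = correct_word(raw, VOCABULARY)
print(raw, "->", word)
```

A trained network learns this kind of correction from data instead of consulting a fixed word list, which is why it copes with words that no hand-written dictionary would contain.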

As a result, the screen scraping scheme works as follows:

The parser program takes a screenshot of a user's screen.
The image is passed to the text recognition module (based on OCR or based on a neural network).
The module returns the text it was able to detect on the screen.
Further parser logic starts operating (data is entered into a table or other actions are performed - it all depends on the goals and tasks of the parser).
Done.
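The scheme above can be sketched as a small pipeline in which each stage is a replaceable function. The stubs below stand in for real components and return made-up data. With the appropriate packages installed, the capture step could be something like Pillow's `ImageGrab.grab` and the recognition step something like `pytesseract.image_to_string`, but that wiring is an assumption of this example rather than a requirement of the approach.

```python
def scrape_screen(capture, recognize, handle):
    """Run one pass of the screen scraping scheme."""
    image = capture()          # step 1: take a screenshot
    text = recognize(image)    # steps 2-3: send it for recognition, get text back
    return handle(text)        # step 4: further parser logic


def fake_capture():
    """Stub: pretend these bytes are a screenshot."""
    return b"<image bytes>"


def fake_recognize(image):
    """Stub: pretend the recognition module found this text on the screen."""
    return "Chair 49.90\nScarf 12.50\n"


def to_rows(text):
    """Parser logic for this example: split each line into name and price."""
    return [line.rsplit(" ", 1) for line in text.splitlines() if line.strip()]


rows = scrape_screen(fake_capture, fake_recognize, to_rows)
print(rows)
```

Keeping the stages separate makes it easy to swap an OCR engine for a neural network service, or a desktop screenshot for a frame from a mobile emulator, without touching the rest of the parser.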

Instead of neural networks used for text recognition, solutions capable of recognizing and describing objects in the image can be applied. In this case, the recognition module will return to the parser not only the text found on the screen but also object descriptions (like "brown chair" or "scarf of the N brand" etc.).

The Difference Between Screen and Web Scraping

We have already partially answered this question:

Web page parsing primarily works with the text of HTML documents.
Screen scraping deals with images and screenshots. Text is primarily sought within images, but certain neural networks make it possible to extract other data as well, for example textual descriptions of objects.

The main distinction lies in the presence of an optical character recognition (OCR) module or a computer vision module (a trained neural network).

This module gives rise to specific advantages and disadvantages of screen scrapers.

Advantages of screen scraping (compared to web parsing):

Screen scrapers can retrieve data from places where a regular text parser would never gain access, such as inside mobile applications.
The scraper is not vulnerable to special traps and hidden markers in the code that are designed to detect bots and automation scripts. It essentially "sees" what a regular user would see.
The scraper can simultaneously address other specific tasks like the analysis of user experience, detection and description of visual objects etc.
Since web page visualization and conversion into images are usually implemented through headless browsers, proxy expenses will be minimal (headless browsers can simulate user behavior, hence the risk of triggering bot protection systems is minimal).

Here's a guide on how to parse without encountering any blocks.

Disadvantages of screen scraping (compared to web parsing):

Web page rendering takes place in a regular or headless browser, which consumes significant resources compared to a classical text parser. Each new thread requires a new browser instance, so resource consumption grows rapidly.

If screenshots are taken from mobile applications, special permissions at the level of the mobile operating system will be required to create them. Naturally, each running app instance also consumes computational resources. Special virtualization technologies will be needed to run multiple threads at a time.
Any text recognition software consumes a lot of PC resources. The more content there is on the screen, the longer the recognition process takes.
Neural networks implementing computer vision algorithms are the most resource-intensive. Special configurations with high-performance graphics processors will be required for their operation.
If the recognition process is based on ready-made remote services and APIs (for example, on ChatGPT), payment for requests will be required. These are typically purchased in packages or through a subscription.
Creating a parser capable of "reading" screens will be significantly more complex than creating a simple text parser.
Unlike HTML documents, data from images has no markup. Therefore, extracting only the necessary text segments from a recognized image will be quite challenging.
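To illustrate the last point: once the recognition module returns plain text, there are no tags or classes to search by, so the parser has to rely on patterns in the text itself. The sketch below pulls prices out of a made-up OCR result with a regular expression. The sample text, the supported currency symbols and the function name are assumptions for this example, and the pattern is intentionally simple (it does not handle thousands separators, for instance).

```python
import re

# A currency symbol, an optional space, then digits with optional decimals.
PRICE_PATTERN = re.compile(r"(?P<currency>[$€£])\s?(?P<amount>\d+(?:[.,]\d{2})?)")


def extract_prices(ocr_text):
    """Return (currency, amount) pairs found in unstructured recognized text."""
    prices = []
    for match in PRICE_PATTERN.finditer(ocr_text):
        amount = float(match.group("amount").replace(",", "."))
        prices.append((match.group("currency"), amount))
    return prices


# Hypothetical text returned by a recognition module for a product screen.
OCR_TEXT = """Wooden chair
In stock  $49.90
Free delivery over $ 100
Wool scarf €12,50  4.7 (213 reviews)
"""

found = extract_prices(OCR_TEXT)
print(found)
```

Note that the rating "4.7" is skipped only because it has no currency symbol next to it. On real screens such heuristics need tuning for every layout, which is exactly the difficulty described above.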

Conclusion and Recommendations

Now that you know what screen scraping is and what it involves, you can set up and run the process on your own, taking the nuances described above into account.

No matter how advanced your parser is, even if it can recognize screens and emulate user behavior, you will still need high-quality proxies.

Only with proxies can you bypass protection based on the number of simultaneous connections and thereby speed up the collection of a large data volume.

The best mobile and residential proxies can be purchased from us. Froxy is the #1 service in terms of quality and stability. We offer excellent coverage - 8.5 million IPs, 200+ countries, precise targeting (up to the city and ISP). Up to 1000 parallel ports per account. Fast rotation. You pay only for traffic, not for the number of proxies.
