The static web is gradually becoming a thing of the past. Along with plain HTML, the familiar “monolith” architecture is also slowly disappearing: the approach where the HTML for dynamic sites was generated entirely on the server and then sent to clients’ browsers fully rendered. Many large websites and web services are now built almost entirely with JavaScript. If you look at their HTML source, you may be surprised: apart from the <HEAD> and <BODY> tags, there is almost nothing there. The content consists mainly of links to numerous scripts. These scripts are loaded and executed directly in the browser, dynamically generating the necessary HTML code as required.
This significantly complicates web scraping. You can no longer simply rely on the Python Requests and BeautifulSoup libraries to analyze the content and extract the required data. You first need to execute all the scripts and render the page, and only then look at the resulting HTML. To do this, you need either a dedicated JavaScript engine or a full-fledged browser (browsers already ship with such engines: V8 in Chromium-based browsers and SpiderMonkey in Firefox).
This guide is for those who want to learn how to scrape websites with Python when the pages have dynamic content. It will explain what tools and libraries are needed for the job, whether effective techniques and approaches exist, and what errors and pitfalls you might encounter along the way.
Understanding Dynamic Content
The modern web has changed notably. It all started with HTML, which provided nothing more than markup: a way to make documents and web pages look presentable rather than like plain text. Browsers received the HTML content, rendered it, and displayed it to users.
With the emergence of hypertext preprocessors (built on full-fledged programming languages like PHP), it became possible to assemble web pages from various blocks and components, while the content itself was stored in specialized databases. Such websites came to be known as “dynamic,” and their functionality is typically managed by CMS platforms like WordPress, Joomla, and Drupal.
This, however, was also insufficient. It turned out that not just websites but the content itself could be dynamic. JavaScript gained remarkable popularity because it allowed real-time updates to page content directly in the browser, often in response to user interactions. The well-known AJAX technique made it possible to “refresh” and re-render only a specific section or element of a page without reloading the whole thing. Eventually, developers realized that the traditional monolithic approach, with server-side preprocessing of every page, was becoming too resource-intensive. Despite scaling efforts, servers were struggling to keep up with the growing number of users. Enter Google, Amazon, and Facebook.
Static site generators, powered by modern JavaScript frameworks like React, Angular, and Vue, became a clever solution. But don’t let the word “static” deceive you: these websites are not static. The difference is in where and when the HTML is generated: instead of assembling the final HTML on the server every time a user requests a page, it is assembled dynamically in the browser.
Since JavaScript files are just static assets from the server’s perspective, they can be distributed across different servers, for example via a decentralized CDN. This noticeably reduces the load on the hosting infrastructure and opens up convenient, decentralized content delivery.
Thus, when a user requests a particular page, they are directed to the nearest CDN node, the required JS scripts are loaded from there, and the page is built dynamically in the browser.
Clever, right? It is, until you try to scrape data from such “dynamic web pages.” To extract the content, you have to load all the JS scripts and execute them in a special runtime environment (a JavaScript engine, similar in concept to the Java Virtual Machine). Only then is the actual HTML generated. This is why scraping dynamic websites often means running the site through a real browser. That’s precisely how the need for headless browsers and web drivers arose: tools that automate and interact with real browsers (including libraries such as Selenium, Playwright, and Puppeteer).
Differences Between Static and Dynamic Web Pages
In general, static web pages are pure HTML files. They can be saved as regular text documents with the .html extension, which means they will open in any web browser.
Here’s an example of a simple static page:
<!DOCTYPE html>
<html>
<head>
  <title>This is the page title that appears in search results and when hovering over the browser tab</title>
  <meta charset="utf-8">
</head>
<body>
  <h1>This is the main heading shown in the page body</h1>
  <p>
    This is just a paragraph — it might contain some descriptive text about something useful…
  </p>
  <h2>Subheading — let’s call it "Proxy Types":</h2>
  <ul id="thebestlist">
    <li>Residential proxies</li>
    <li>Mobile proxies</li>
    <li>General proxies</li>
    <li>Datacenter proxies</li>
    <li>Private proxies</li>
  </ul>
</body>
</html>
You can try it out yourself: create a plain text file on your computer, paste in the code above, and change the file extension from .txt to .html. Double-click the file — it should open in your browser and display the page.
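Since this is pure HTML, no browser or JavaScript engine is needed to extract data from it. Here is a minimal sketch using BeautifulSoup (assuming you saved the example above as example.html; the file name is an arbitrary placeholder):

from bs4 import BeautifulSoup  # pip install beautifulsoup4

# Read the static HTML file created above (the file name is just an example)
with open("example.html", "r", encoding="utf-8") as f:
    soup = BeautifulSoup(f.read(), "html.parser")

print(soup.title.text)  # the <title> text
for li in soup.select("ul#thebestlist li"):  # items of the "Proxy Types" list
    print("-", li.get_text(strip=True))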
Pages built using PHP (or other hypertext preprocessors) are first executed on the server in a special runtime environment and only then sent to the browser. From the browser’s point of view, the result is the same static content: it receives ready-made HTML.
Dynamic web pages look a bit different. The clearest example of a dynamic web page is Google search results. The only things you’ll see in the raw HTML are basic elements like <title>, <head>, and <body>, and even those are mainly used to load external JS files.
Trying to find meaningful content or layout there will get you nowhere. Moreover, the code is deliberately minified (spaces, line breaks, and other whitespace characters are removed to reduce its size).
Many other large websites and web services are organized similarly, so the same applies when you scrape TikTok, collect data from IMDb, or work with other such platforms.
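A quick way to check whether a page is dynamic in this sense is to download its raw HTML with Requests and look for content you actually see in the browser. A minimal sketch (the URL is the test page used in the examples later in this guide, and the "product-item" marker is just one string worth checking):

import requests

# Fetch the raw HTML exactly as the server sends it (no JavaScript executed)
url = "https://www.scrapingcourse.com/javascript-rendering"
raw_html = requests.get(url, timeout=10).text

# If the product markup is missing from the raw response,
# the content is rendered client-side and a real browser will be needed
if "product-item" in raw_html:
    print("Content is present in the raw HTML - static scraping may be enough")
else:
    print("Content is rendered by JavaScript - use a headless browser")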
The most common form of dynamic content, however, found on the vast majority of websites and services, is blocks or elements that load in response to user actions. These may include:
- Infinite scrolling feeds of products or comments that load as the user scrolls down the page or clicks a special button (e.g., “Show more”);
- Lazy-loaded images and videos, used to speed up the display of the main content (elements outside the current viewport load in the background or as the user scrolls);
- Customizable filters, usually found in product catalogs, but sometimes appearing as complex widgets with image/media galleries.
Problems of Parsing Dynamic Content
At this point, the difficulties of scraping dynamic content should already be clear. But let’s spell out the details, just in case:
- There’s no clear HTML structure. The data is loaded via JavaScript. It’s cut into pieces and stored across different scripts scattered throughout the page. The final code is assembled based on specific logic, including user actions and behavior. Triggers and conditions can be anything: screen resolution, browser version, device OS, etc. Each time these triggers fire, the resulting HTML gets rebuilt from scratch.
- You need a browser to get the final HTML. Yes, Chrome now has an official automation API (the Chrome DevTools Protocol, CDP), but even that requires an abstraction layer to make the command syntax more readable and logical. For this very reason, the most convenient way to interact with headless browsers and anti-detect environments is still via web drivers. Selenium and Playwright are the most popular tools for Python. In Java, there’s a dedicated headless browser implementation called HtmlUnit (essentially a JavaScript engine with a ready-to-use API).
- JavaScript running inside the browser unlocks powerful bot-protection capabilities, so scrapers and headless browsers can be detected and blocked fairly easily. For example, request headers (User-Agent, Referer, etc.) can be checked; pre-installed font lists, CPU/GPU details, cookies, localStorage, cache, and even Canvas fingerprints can be analyzed; and user behavior is monitored for “human-like” activity (mouse movement, scroll patterns, click behavior, and so on).
- Obfuscation and data encryption. Data may be loaded in encrypted form and decrypted only on the client side. Additionally, dynamically generated authentication tokens (such as CSRF tokens) may be used.
- Final HTML is well-protected from scraping. Since the content is generated "on the fly," scripts may use random combinations of characters and numbers to obfuscate class names, IDs, extra parameters, and attributes. As a result, every new page has a unique structure. So you can't just identify a specific selector once and expect your scraper to work on every similar page. For a deeper understanding, look at the HTML structure of Google's People Also Ask section as an example — it’s a classic case.
Choosing the Right Tools for Web Scraping Dynamic Content
Since dynamic web content needs to be rendered before you can scrape it, choosing the right tools is crucial, especially when working with Python. Here’s a breakdown of the key tools and libraries that may come in handy:
- Selenium is one of the most well-established web drivers for browser automation, developed and maintained by an independent team. It integrates easily with anti-detect browsers and covers many scenarios, up to orchestrating cloud instances. Selenium supports multiple languages and platforms, including Python, and it installs and connects the required browsers and drivers automatically.
- Playwright is another web driver that supports various programming languages, browsers, and platforms. Maintained and supported by Microsoft, it is no less popular and functional than Selenium. Its key strength is the unified syntax, which is the same for all connected browsers and programming languages, making it very convenient and understandable. Read more about using Playwright in web scraping.
- BeautifulSoup is a library for HTML parsing. It breaks down raw page content and lets you extract the required elements. BS4 handles the extraction side of scraping. Read more about using BeautifulSoup, with examples.
- Anti-detect browsers are great for convenient work with many digital fingerprints (an “isolated browser and profile farm”).
- Rotating proxies protect your data and emulate user locations. They are easy to connect to niche software and simple to set up (via API or in your personal account, where you configure the rotation logic, IP address type, location, etc.).
This is a standard set for web scraping dynamic content. If you’d like, you can also look into additional libraries for parsing in Python.
You can add the rest as needed: tools for implementing random delays and asynchronous requests, scripts for advanced proxy rotation, libraries for convenient data formatting, etc.
Headless browsers are intentionally excluded from the list since they are usually bundled with web drivers nowadays.
Techniques for Scraping Dynamic Websites in Python
The general algorithm for web scraping dynamic content looks as follows:
- You install and configure a web driver and a set of headless browsers. Browser configuration typically means creating a unique digital fingerprint: a set of cookies, browsing history, installed extensions, and an activated stealth mode. Stealth mode is needed to conceal the fact that a headless browser is being used.
- Then you need to connect the web driver library to your Python script. Using specific commands (according to the API of the selected driver), your script sends a list of target pages with dynamic content to the headless browser.
- The browser opens and processes them, just as it would for a regular user. The scraping script can include additional actions such as scrolling through dynamic web pages or simulating button clicks (see the scrolling sketch after this list).
- The HTML of the resulting web page is passed to a parser library for content extraction. This step follows the standard scraping procedure, as if the script were working with static HTML from the start. Some web drivers have built-in parsing tools, but the choice here is up to personal preference – whichever data extraction method works best for you.
- The extracted data is saved in the desired format. For example, it can be sent via an API, written to a file, or stored in a database.
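To illustrate the browser-interaction part of the third step, here is a minimal sketch of scrolling a page with Selenium (installed in the Selenium section below) so that lazily loaded content appears in the DOM. The URL, scroll count, and delays are arbitrary placeholders:

import time
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://www.scrapingcourse.com/javascript-rendering")  # placeholder: use a page with lazy-loaded content

# Scroll to the bottom several times so lazily loaded elements get added to the DOM
for _ in range(5):  # the number of iterations is an arbitrary assumption
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)  # give the newly loaded content time to render

html = driver.page_source  # the fully rendered HTML, ready to be passed to a parser
driver.quit()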
Let’s dive deeper into the technical details of scraping dynamic web pages.
Using Network Requests (API Scraping)
Data exchange with target websites via an API deserves separate attention. Some major platforms know that the information on their pages feeds various business processes. To reduce the load on their main website (the web version of the interface), they provide a separate, programmatic interface for data collection: an API (Application Programming Interface).
APIs can be found, for example, on platforms like Amazon (paid, with a small volume of free requests), IMDb (paid, GraphQL format), and TikTok (free, including for business use).
Accordingly, you can write your own Python script to generate requests to the desired API resource and parse the received responses into components. Usually, the data is supplied in a structured format, so parsing it into separate parts won’t be a problem. No headless browsers or web drivers are needed for this. Simple libraries for working with HTTP requests, like Python Requests, are enough (it even has built-in support for parsing responses in JSON format).
The main problem is that each site usually has its own API, its own set of commands, and response formats. Plus, to connect, you must register a developer account and obtain an API key (which serves as an identifier). Most accounts have specific technical limits. It’s worth noting that there are no severe sanctions if you exceed these limits. At most, you’ll receive an error instead of a response. Once the limit is reset, you can resume the data exchange.
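A minimal sketch of this approach with Python Requests (the endpoint, parameters, and API key below are hypothetical placeholders, not any specific platform’s real API):

import requests

# Hypothetical endpoint and key - replace with the target platform's real API
API_URL = "https://api.example.com/v1/products"
API_KEY = "your_api_key_here"

response = requests.get(
    API_URL,
    params={"query": "proxies", "page": 1},
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=15,
)
response.raise_for_status()  # fail loudly on HTTP errors (e.g., rate limits)

items = response.json()      # structured JSON instead of raw HTML
for item in items.get("results", []):
    print(item.get("title"), item.get("price"))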
Many large services, such as Google Search and YouTube video hosting, either don’t provide an API for this kind of data or restrict it heavily. However, their data is extremely important and can be used in complex analytics.
Instead, you can use intermediary services that implement the missing APIs. These, however, are usually paid.
Here’s how working with such a service typically looks (a minimal code sketch follows the list):
- You send a request not directly to the target site, but to the service that implements the API interface.
- The service receives your request and independently scrapes the data from the target site (or pulls it from its own database, similar to caching).
- You get a response with ready-made structured data. So, again, there’s no need to install headless browsers or web drivers to render dynamic content.
- All that’s left is to process the received information or leave it “as is.”
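In code, this workflow usually comes down to a single HTTP request to the intermediary service. A generic sketch (the endpoint, parameters, and token are hypothetical, so check the specific service’s documentation for the real ones):

import requests

# Hypothetical intermediary-API endpoint and token, for illustration only
SCRAPER_API = "https://scraper.example.com/api/tasks"
TOKEN = "your_service_token"

task = {
    "url": "https://www.google.com/search?q=rotating+proxies",  # target page
    "format": "json",  # desired output format
    "geo": "US",       # desired proxy location
}

response = requests.post(SCRAPER_API, json=task,
                         headers={"Authorization": f"Bearer {TOKEN}"}, timeout=60)
response.raise_for_status()

data = response.json()  # ready-made structured data, no browser involved
print(data)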
Froxy Web Scraper is an example of such a service. It works with many major platforms, such as Google, Yahoo, Bing, and Amazon. For non-standard tasks, you can simply receive the resulting HTML code for a list of target pages. For each scraping task, you can set the frequency and targeting parameters (proxy location, down to the city and mobile carrier, and address type: mobile or residential). Once scraping is complete, the service can call the webhooks you specify, so your script can download the results and continue processing them. Data is returned in JSON or CSV format.
An example of this workflow is available in the material about scraping Google SERPs.
Using Selenium for Browser Automation
Install the library using the command:
pip install selenium
PIP is the standard package manager for Python. Note that for the command to run successfully on Windows, the path to the executable pip.exe must be added to the system environment variables.
The WebDriver and a compatible headless browser package will be installed automatically. The process may take some time, so be sure to wait until the installation is successfully completed.
Here is an example of a simple script that will address a special test website (scrapingcourse.com/javascript-rendering) and collect a list of products:
import time
import csv
from selenium import webdriver
from selenium.webdriver.common.by import By

# Selenium settings
driver = webdriver.Chrome()  # Launch the browser
driver.get("https://www.scrapingcourse.com/javascript-rendering")  # Open the target website
time.sleep(5)  # Wait for the results to load

# Get the search results
results = driver.find_elements(By.CSS_SELECTOR, "div.product-item")  # Containers with products
data = []  # List for data storage

for result in results:
    try:
        title = result.find_element(By.CSS_SELECTOR, "span.product-name").text  # Title
        link = result.find_element(By.TAG_NAME, "a").get_attribute("href")  # Link
        price = result.find_element(By.CSS_SELECTOR, "span.product-price").text  # Price
        data.append([title, link, price])  # Add to the list
    except Exception as e:
        print(f"Error while parsing: {e}")

# Close the browser
driver.quit()

# Save to CSV
csv_filename = "products_results.csv"
with open(csv_filename, "w", newline="", encoding="utf-8") as file:
    writer = csv.writer(file)
    writer.writerow(["Title", "Link", "Price"])
    writer.writerows(data)

# Print data to the console
for item in data:
    print(item)
Create a text file named web_scraping_dynamic_content_with_selenium.py (if your editor creates it as .txt, change the extension to .py). Open it in any text editor, paste in the code above, and save the changes.
Now all that’s left is to run the script using the command in the console:
python web_scraping_dynamic_content_with_selenium.py
If you haven’t navigated to the directory where the script is located, you’ll need to specify the full path to the file, for example:
python C:\my-scraper\web_scraping_dynamic_content_with_selenium.py
After the script launches the browser and then closes it (the window stays open for about 5 seconds), a list of products with prices and links should be printed to the console. Additionally, a CSV file with the same content will appear in the script’s folder, formatted as a ready-to-use table.
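One note on the waiting step: the fixed time.sleep(5) is the simplest option, but Selenium also provides explicit waits that continue as soon as the elements actually appear. A sketch of the same step rewritten with WebDriverWait, assuming the driver object from the script above:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

# Wait up to 10 seconds for the product containers instead of sleeping a fixed 5 seconds
results = WebDriverWait(driver, 10).until(
    EC.presence_of_all_elements_located((By.CSS_SELECTOR, "div.product-item"))
)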
Using Playwright for Scraping Dynamic Content
Library installation is completed with a few commands. The first installs the library itself along with its dependencies, while the second installs all the necessary browsers:
pip install playwright
playwright install
If you need a specific browser, specify it as an additional argument. For example:
playwright install chromium
Here is an example of the simplest script, which works exactly like the Selenium script. That is, it accesses a test dynamic website (scrapingcourse.com/javascript-rendering) and collects a list of products from it:
import csv
from playwright.sync_api import sync_playwright

# Logic for parsing products
with sync_playwright() as playwright:
    browser = playwright.chromium.launch(headless=False)  # Launch browser with GUI
    page = browser.new_page()

    # Open the target site
    page.goto("https://www.scrapingcourse.com/javascript-rendering")

    # Wait for the results to load: a specific selector is used instead of a time-based wait
    # (the default timeout is 30 seconds in case the selector is not found)
    page.wait_for_selector("div.product-item")

    # Collect the results
    results = page.locator("div.product-item").all()
    data = []

    for result in results:
        try:
            title = result.locator("span.product-name").inner_text()  # Title
            link = result.locator("a").get_attribute("href")  # Link
            price = result.locator("span.product-price").inner_text()  # Price
            data.append([title, link, price])
        except Exception as e:
            print(f"Error while parsing: {e}")

    browser.close()  # Close the browser

# Save to CSV
csv_filename = "results_playwright.csv"
with open(csv_filename, "w", newline="", encoding="utf-8") as file:
    writer = csv.writer(file)
    writer.writerow(["Title", "Link", "Price"])
    writer.writerows(data)

# Print data to the console
for item in data:
    print(item)
Create a text file named web_scraping_dynamic_content_with_playwright.py, paste in the code above, and save it (change the extension from .txt to .py if needed).
Run the script in the console:
python web_scraping_dynamic_content_with_playwright.py
The output will be similar to the first script. In the script's directory, a file named results_playwright.csv should appear (containing a table of product titles, links, and prices).
What Else You May Need for Scraping Dynamic Content
First, you need to thoroughly explore and understand how syntax analyzers work. This is the cornerstone of quality scraping — they’re essential for navigating and interacting with the DOM structure (i.e., the HTML markup layout).
Without understanding what to look for and where, you won’t be able to "explain" anything to your program.
Different web drivers use different syntaxes to search and extract data from elements.
For example, Playwright works with locators (its internal mechanism) and supports both XPath and CSS selector syntax. If that’s not enough, you can install additional parsing libraries such as BeautifulSoup (which comes with its own syntax and DOM traversal logic).
Selenium takes a slightly different approach but is broadly similar to Playwright. Its internal "By" tool can locate elements by name, ID, tag, class, CSS selector, and XPath. Again, you can pass the resulting HTML to another parser if needed.
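To make the difference concrete, here is the same lookup expressed three ways. This is a sketch only: the driver and page objects are assumed to come from the Selenium and Playwright examples above.

# Selenium: the internal "By" tool, here with a CSS selector
from selenium.webdriver.common.by import By
names = [el.text for el in driver.find_elements(By.CSS_SELECTOR, "span.product-name")]

# Playwright: locators, which accept CSS or XPath expressions
names = page.locator("span.product-name").all_inner_texts()

# Handing the rendered HTML to BeautifulSoup for its own traversal logic
from bs4 import BeautifulSoup
soup = BeautifulSoup(page.content(), "html.parser")  # page.content() returns the final HTML
names = [tag.get_text() for tag in soup.select("span.product-name")]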
Second, large web services with dynamic content have learned to detect headless browsers through digital fingerprints. To avoid being blocked or shown CAPTCHAs, you need to hide these traces and work through proxies (each new proxy IP address appears as a new client, which security systems analyze and possibly block individually).
Proxies are usually connected to headless browsers at the level of an individual instance. So, instead of inventing clunky workarounds for rotating proxies and launching new browsers (your device’s memory isn’t unlimited, after all), it makes more sense to set up rotating proxies from the start.
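As an illustration, here is how a proxy is typically attached at the level of an individual browser instance in both tools. The proxy address and credentials below are placeholders; substitute your own rotating-proxy endpoint.

# Selenium: the proxy is passed via Chrome options for this particular instance
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument("--proxy-server=http://proxy.example.com:8080")  # placeholder endpoint
driver = webdriver.Chrome(options=options)

# Playwright: the proxy is passed when launching the browser
from playwright.sync_api import sync_playwright
with sync_playwright() as p:
    browser = p.chromium.launch(proxy={
        "server": "http://proxy.example.com:8080",  # placeholder endpoint
        "username": "user",                         # placeholder credentials
        "password": "pass",
    })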
Conclusion and Recommendations
Modern websites have become much more complicated. These are no longer just static HTML pages but full-fledged web applications. Working with dynamic content has become significantly more challenging because you can’t simply grab the DOM structure and extract the required data from the HTML. Instead, you need to load the site or its dynamic pages in a real browser (this can be done using headless or anti-detect browsers) before you even start scraping.
Even though you’re working with content through browsers, you still have to consider many technical nuances, including digital fingerprint emulation, user behavior simulation, and working through proxies.
You can find high-quality proxies with us. Froxy offers over 10 million IPs (residential, mobile, and datacenter proxies targeting city and mobile carrier level).
You can always switch to API-based parsing if you don’t want to overcomplicate your Python scraper. Froxy can help you with that, too. You can submit a data collection task for specific websites, and we’ll handle everything for you, delivering ready-to-use, structured data in JSON or CSV format. No need to configure proxies or headless browsers. Read more in the Froxy Scraper service description.