Any online platform with a social component is of interest to businesses. This is mainly because of the audience, as these platforms allow for the promotion of goods and services. However, social networks like TikTok make it possible to achieve other goals, such as gathering statistics, monitoring competitors, identifying current trends, etc.
As you might guess, this material is about creating your own TikTok scraper.
TikTok is a unique phenomenon. The social network was launched in China (where the app is called "Douyin" in the domestic market) and focused on creating short videos. TikTok quickly evolved from an entertainment platform into a search engine where you can find short visual instructions and tutorials on any topic. It also serves as a place to explore news, expert opinions, etc.
Today, TikTok ranks among the top five most popular websites, with a monthly audience exceeding 1.58 billion users (as of 2024). Even more interestingly, its audience is highly engaged: according to statistics, users spend about an hour a day watching videos, and Generation Z in particular often prefers TikTok over Google for search queries.
Isn't that a compelling reason to pay close attention to the data stored on TikTok?
Why Scrape TikTok?
Everyone has their own reasons to scrape TikTok:
- Data analysts can discover new trends, analyze engagement and behavioral patterns, and evaluate the effectiveness of different content types.
- SMM specialists can track follower growth for clients’ accounts and their competitors, monitor video view counts to assess content performance, and study audience engagement and other metrics to develop more effective promotion strategies.
- Digital marketers can measure the effectiveness of ad campaigns, identify preferences and triggers for target audiences, and optimize advertising budgets based on detailed data insights.
- Content creators and bloggers can track video views and competitor performance, identify trending topics for future content, develop more effective strategies to boost engagement and grow their audience, and automate interactions with subscribers to save time.
- Businesses and brands can monitor mentions and brand recognition, track negative feedback (complaints or misinformation) and public sentiment, and find influencers to serve as brand ambassadors, enhancing recognition and creating a positive brand image.
The primary goal of any scraper is obviously automation enhancement. While the same tasks can be performed manually, it would take significantly more time. Automated data collection through scraping allows for faster and more agile responses. Well-written scripts minimize errors and provide data in the desired format.
Key Points from TikTok’s Terms Regarding Data Scraping and Data Use
When creating a TikTok scraper, it’s always important to consider the platform’s usage policies and restrictions (TikTok rules) as well as the robots.txt file (current TikTok version).
The robots.txt file specifies allowed and prohibited sections for crawling/indexing.
According to it, crawling is disallowed for the following TikTok sections: /inapp, /auth, /embed, /link, */directory/ (where files are usually stored), /search/video?, /search/user?q=, and /shop/view/product/ (pages of individual products in users’ stores).
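For automated pipelines, Python’s standard-library urllib.robotparser can check a URL against such rules. Here is a minimal sketch that uses a local snippet mirroring a few of the Disallow entries listed above; fetch https://www.tiktok.com/robots.txt yourself for the live, current version:

```python
from urllib import robotparser

# Local snippet mirroring a few of the Disallow rules quoted above;
# read the live https://www.tiktok.com/robots.txt for the current version
ROBOTS_SNIPPET = """\
User-agent: *
Disallow: /inapp
Disallow: /auth
Disallow: /embed
Disallow: /link
Disallow: /shop/view/product/
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_SNIPPET.splitlines())

# Profile pages are not covered by the rules above
print(parser.can_fetch("*", "https://www.tiktok.com/@khaby.lame"))          # True
# Individual shop product pages are explicitly disallowed
print(parser.can_fetch("*", "https://www.tiktok.com/shop/view/product/1"))  # False
```

Running such a check before each request is a cheap way to keep a crawler inside the allowed sections.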
TikTok has recently implemented the open API for research, but it’s currently available only to companies in the European Economic Area. The API allows retrieving data like:
- User profiles, followers, liked/pinned videos, and reposts;
- Comments, captions, subtitles, likes, and view counts for specific videos;
- TikTok Shop data (ratings, reviews, prices, and sales metrics).
View the TikTok API documentation here.
When using the API (TikTok Research Tools), rate limits apply to individual client requests, and running scrapers in parallel is prohibited. Companies using the API are obligated to comply with data privacy regulations such as GDPR and CCPA, with all the legal consequences that entails.
Applications for research accounts may take up to four weeks to process.
If the suggested API’s restrictions are unsuitable, creating your own TikTok scraper is an option. However, it’s essential to be aware of TikTok’s official rules, which prohibit automation scripts, excessive server load that disrupts service quality, collecting personal data without consent, and fake accounts with false data.
Fortunately, TikTok content is accessible without authentication, making it challenging to identify and hold users accountable for automated scraping.
Enough theory - let’s move on to the practical part!
Technical Prerequisites for TikTok Scraper
How do you scrape TikTok user accounts? Technically, it's straightforward. Most scrapers are structured in a similar way: one part of the program navigates pages and retrieves their HTML code, while the other processes that code, breaks it into components, and extracts the required data. The data is then exported to a table, a file, a database, or another program.
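That pipeline can be sketched in a few lines. In this illustration, a hard-coded HTML fragment (shaped like TikTok's real markup, but heavily simplified) stands in for the page-fetching stage:

```python
from bs4 import BeautifulSoup  # requires: pip install beautifulsoup4
import json

# Stage 1 normally navigates the page and retrieves its HTML;
# here a simplified hard-coded fragment stands in for it
html = ('<div><h1 data-e2e="user-title">khaby.lame</h1>'
        '<strong data-e2e="followers-count">162.4M</strong></div>')

# Stage 2: break the code into components and extract the required data
soup = BeautifulSoup(html, "html.parser")
record = {
    "username": soup.select_one('[data-e2e="user-title"]').text,
    "followers": soup.select_one('[data-e2e="followers-count"]').text,
}

# Stage 3: export the data (here, as JSON printed to stdout)
print(json.dumps(record))  # {"username": "khaby.lame", "followers": "162.4M"}
```

The rest of this article fills in the hard part: getting real TikTok pages to hand over that HTML in the first place.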
However, there are specific requirements unique to certain websites or web services. For a TikTok scraper, the following points are worth mentioning:
- The primary content on TikTok is video. If you want to analyze the content itself, you will need computer vision technologies or the APIs of specialized neural networks that can return descriptions based on links.
- If you're only interested in data for analysis, you'll need to rely on indirect indicators such as video duration, view counts, likes, comments, etc.
- TikTok is a dynamic website: instead of ready-made HTML markup, the browser receives a set of JavaScript scripts. This makes classic HTML-parsing libraries ineffective on their own; each page must be processed through a browser or a JavaScript rendering engine.
- TikTok actively combats parasitic traffic (automation is explicitly prohibited in its terms of use), so you must plan mechanisms to "humanize" the scraper in advance. Simply setting a user agent is not enough; you also need to take care of other digital-fingerprint parameters and consider using proxies.
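As a small illustration of the "humanization" point, here is a sketch of a helper that rotates user agents and jitters request delays. The UA strings and the function name are our own illustrative choices; in a real fingerprint, the platform, screen size, and headers must all stay consistent with the chosen UA:

```python
import random

# Illustrative user-agent strings; in practice the rest of the fingerprint
# (platform, screen size, headers) must stay consistent with the chosen UA
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
]

def request_plan():
    """Pick a user agent and a human-like randomized delay for the next request."""
    return {
        "user_agent": random.choice(USER_AGENTS),
        # Jittered, not fixed: evenly spaced requests are an easy bot signal
        "delay_seconds": round(random.uniform(2.0, 6.0), 1),
    }

plan = request_plan()
print(plan)
```

The scripts later in this article apply the same idea with random sleeps between page loads.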
Building the Basic TikTok Scraper
Let's start by setting up the development environment and installing the required libraries. We chose Python as our programming language because it allows for simpler and faster development of complex parsers. Here's a list of the best Python libraries for parsing, just in case.
Setting Up the Python Environment
First, you need to install Python itself. To do this, download the distribution package from the official Python website. The current version of Python is from the third branch (as of the time of writing, it's v3.13).
In many popular Linux distributions, Python might already be installed in the system. If it’s not, you can use the built-in package manager to install it.
When installing on Windows, don’t forget to add the main command paths to the system's environment variables.
The pip package manager is installed along with Python; enable the corresponding option in the installer if needed.
Additionally, it's good practice to create a Python virtual environment so that project files and dependencies don't spread into system-wide directories.
Enter the command:
python -m venv tiktok-scraper
Once the command completes, a directory for your future program will be created. On Windows, this is C:\Users\USER\tiktok-scraper.
Now activate the virtual environment (enter the command in the console):
tiktok-scraper\Scripts\activate.bat
The command for PowerShell:
tiktok-scraper\Scripts\activate.ps1
If PowerShell complains about script execution permissions, run:
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
Then, repeat the procedure of environment activation.
In Linux systems, the activation command will differ:
source tiktok-scraper/bin/activate
Now all installed libraries and scripts will be stored in the virtual environment's directory.
Install the libraries (pip will also pull in the required dependencies). Note that asyncio ships with Python's standard library, so it doesn't need to be installed separately:
pip install nodriver beautifulsoup4
We use the following libraries:
- Asyncio – the standard-library tool for working with asynchronous calls;
- Nodriver – a driver that connects to a headless browser and can conceal the signs of automated operation; it is the successor of the Undetected-Chromedriver library. See the official GitHub project page.
- Beautifulsoup4 – an HTML-parsing library. It lets you select the required web page elements with simple, understandable commands. See the detailed manual on parsing with Beautiful Soup.
Checking That the TikTok Scraper Works
Let's create our first Python script and run it. The script will explicitly open a browser window (not in headless mode), navigate to TikTok's homepage, wait for 10 seconds, and then close.
To do this, navigate to your virtual environment's directory (C:\Users\YOUR_USER\tiktok-scraper) and create a new text file named test.py (if your editor saves it as test.txt, change the extension to .py).
Open the file in any text editor or IDE and add the following content:
# Import the project libraries (BeautifulSoup is not needed yet)
import asyncio
import nodriver as headlesschrome

async def main():
    try:
        print("Launching the browser...")
        # To see the browser in action, disable headless mode
        browser = await headlesschrome.start(headless=False)
        # Navigate to the target page
        print("Navigating to the TikTok homepage...")
        page = await browser.get("https://www.tiktok.com/")
        print("Page loaded. If you can see it in the browser window, the test succeeded.")
        print("Waiting 10 seconds before closing...")
        # This is where the asyncio library comes in
        await asyncio.sleep(10)
    except Exception as err:
        # Handle exceptions just in case; if there is an error, print it to the console
        print(f"Error detected: {err}")
    finally:
        # End of script
        print("Testing complete.")

if __name__ == "__main__":
    headlesschrome.loop().run_until_complete(main())
Save the file and run it. Since the file is in the virtual environment's directory, you can run it by double-clicking or by right-clicking the file and selecting "Open With" → "Python."
At this point, you might start feeling like a hacker!
Now, the next step is to teach our TikTok scraper how to extract the desired HTML code from the pages.
Parsing the TikTok Profile
Let’s use the account of the most popular TikToker, Khaby Lame, as an example. The URL structure for accessing a specific profile is as follows: https://www.tiktok.com/@khaby.lame.
This means the username is added to the main domain, and the "@" symbol is mandatory before the username.
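This structure is easy to encode in a small helper (the function name below is our own, not a TikTok convention):

```python
def profile_url(username: str) -> str:
    # The "@" is mandatory before the username, so add it if it's missing
    return f"https://www.tiktok.com/@{username.lstrip('@')}"

print(profile_url("khaby.lame"))   # https://www.tiktok.com/@khaby.lame
print(profile_url("@khaby.lame"))  # same result for an already-prefixed name
```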
Even if you’re working without logging in, you can still see:
- The user’s avatar.
- Their username and display name.
- The number of accounts they’re following.
- The number of followers they have.
- The number of likes on their profile.
- The profile description (some use this space for a bio, thoughts, or useful links).
Now, let’s analyze the HTML code. Open your preferred browser, navigate to a profile page, and inspect the layout: hover over a specific element, such as the username, right-click, and choose “Inspect” from the context menu. This opens the developer tools panel with the selected element focused in the code.
The required code part will look as follows:
<h1 data-e2e="user-title" class="….">khaby.lame</h1>
To find it quickly in the overall document structure, you can rely on the data-e2e="user-title" attribute. Note that the CSS class name is intentionally obfuscated to complicate parsing efforts.
You can inspect the other profile elements and their HTML code in the same way. To save you time, here are the attributes of the other profile elements:
- data-e2e="user-subtitle": Display name.
- data-e2e="user-title": Username.
- data-e2e="followers-count": Number of followers.
- data-e2e="following-count": Number of accounts followed.
- data-e2e="likes-count": Number of likes.
- data-e2e="user-bio": Profile bio.
Note: TikTok frequently updates its code to block parsers. If the script stops working, you will need to update it with new attribute values.
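Since every field follows the same “select the element, then read its text if present” pattern, it may be worth wrapping it in a small helper, shown here against a hard-coded fragment (safe_text is our own name, not part of BeautifulSoup):

```python
from bs4 import BeautifulSoup

def safe_text(soup, selector):
    # Return the stripped text of the first match, or None if the selector finds nothing
    element = soup.select_one(selector)
    return element.text.strip() if element else None

soup = BeautifulSoup('<h1 data-e2e="user-title"> khaby.lame </h1>', "html.parser")
print(safe_text(soup, 'h1[data-e2e="user-title"]'))    # khaby.lame
print(safe_text(soup, 'h2[data-e2e="user-subtitle"]')) # None
```

Keeping all selectors behind one helper also makes it easier to update them when TikTok changes its markup.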
Create a new project file and fill it with code:
# Import the required libraries, this time including BeautifulSoup
import asyncio
import nodriver as hdlschrome
from bs4 import BeautifulSoup
import json

# Asynchronous function that scrapes the TikTok profile data
async def scrape_tiktok_profile(username):
    try:
        # Log the start of the parsing process
        print(f"Starting to parse the TikTok profile: @{username}")
        # Launch the browser
        browser = await hdlschrome.start(headless=False)
        print("Browser launched successfully")
        # Open the profile URL, substituting the username variable into it
        page = await browser.get(f"https://www.tiktok.com/@{username}")
        print("TikTok profile page loaded successfully")
        await asyncio.sleep(30)  # Wait 30 seconds; adjust if needed. A large timeout can be justified for the slowest connections.
        print("Waited 30 seconds for the content to load completely")
        # Store the resulting HTML code in the html_content variable for further parsing
        html_content = await page.evaluate('document.documentElement.outerHTML')
        print(f"HTML content extracted: {len(html_content)} characters")
        # Hand the HTML code over to the BeautifulSoup parsing library
        soup = BeautifulSoup(html_content, 'html.parser')
        print("HTML parsed by BeautifulSoup")
        # Define the data structure describing the TikTok user profile
        profile_info = {
            # The select_one method returns only the first element matching the given attributes
            'username': soup.select_one('h1[data-e2e="user-title"]').text.strip() if soup.select_one('h1[data-e2e="user-title"]') else None,
            'display_name': soup.select_one('h2[data-e2e="user-subtitle"]').text.strip() if soup.select_one('h2[data-e2e="user-subtitle"]') else None,
            'follower_count': soup.select_one('strong[data-e2e="followers-count"]').text.strip() if soup.select_one('strong[data-e2e="followers-count"]') else None,
            'following_count': soup.select_one('strong[data-e2e="following-count"]').text.strip() if soup.select_one('strong[data-e2e="following-count"]') else None,
            'like_count': soup.select_one('strong[data-e2e="likes-count"]').text.strip() if soup.select_one('strong[data-e2e="likes-count"]') else None,
            'bio': soup.select_one('h2[data-e2e="user-bio"]').text.strip() if soup.select_one('h2[data-e2e="user-bio"]') else None
        }
        print("Profile information extracted successfully")
        # Return the profile data
        return profile_info
    # Handle exceptions and errors
    except Exception as err:
        print(f"Error during parsing: {str(err)}")
        return None
    # Stop the browser to release its resources
    finally:
        if 'browser' in locals():
            browser.stop()
            print("Browser closed")

async def main():
    # Redefine the nickname of the user whose data you'd like to parse here
    username = "khaby.lame"
    # Run the parsing function asynchronously, passing it the username variable
    profile_info = await scrape_tiktok_profile(username)
    # If the data is not empty, print the parsing result to the console
    if profile_info:
        print("\nProfile information:")
        # Iterate over the dictionary keys, replacing underscores with spaces for nicer output
        for key, value in profile_info.items():
            print(f"{key.replace('_', ' ').title()}: {value}")
        # Save the data to a JSON file
        with open(f"{username}_profile.json", 'w', encoding='utf-8') as f:
            json.dump(profile_info, f, ensure_ascii=False, indent=4)
        print(f"Profile info saved to the file {username}_profile.json")
    else:
        print("Couldn't parse the profile info.")

if __name__ == "__main__":
    hdlschrome.loop().run_until_complete(main())
If you want the console window to stay open after execution, run the program from the command line. First, change into the project directory:
cd C:\Users\YOUR_USER\tiktok-scraper
Then launch the script with the python command followed by the file name.
We have intentionally added a program block for exporting data to a JSON file. If desired, you can organize data export in a tabular format (CSV) or to an SQLite database.
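For example, a CSV export of the same profile dictionary could look like this (sample values stand in for live scrape results):

```python
import csv

# Sample data standing in for a real scrape result
profile_info = {"username": "khaby.lame", "follower_count": "162.4M", "like_count": "2.5B"}

# Write a single-row CSV with the dictionary keys as the header
with open("profile.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=profile_info.keys())
    writer.writeheader()
    writer.writerow(profile_info)

with open("profile.csv", encoding="utf-8") as f:
    print(f.read().strip())
```

The same DictWriter call works row by row if you scrape many profiles in one run.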
Parsing TikTok Video Pages
Here is a sample of the video page URL:
https://www.tiktok.com/@khabylame.62/video/7024119047490964737
The address structure clearly references the user and includes the specific video identifier (formatted as /video/<identifier>).
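That structure can be validated and unpacked with a regular expression (a helper of our own design, not part of any TikTok tooling):

```python
import re

# Matches https://www.tiktok.com/@<username>/video/<numeric id>
VIDEO_URL_RE = re.compile(r"https://www\.tiktok\.com/@([^/]+)/video/(\d+)")

def parse_video_url(url):
    # Returns (username, video_id), or None if the URL doesn't match the pattern
    match = VIDEO_URL_RE.match(url)
    return match.groups() if match else None

print(parse_video_url("https://www.tiktok.com/@khabylame.62/video/7024119047490964737"))
# ('khabylame.62', '7024119047490964737')
```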
On the video page itself, you can find:
- The video author (username and display name).
- A list of hashtags.
- The title of the music track.
- The publication date.
- Quantitative metrics (views, comments, number of bookmarks, shares, etc.).
Following the approach used for the TikTok profile parser, examine the HTML code of the video page in the browser's developer console and identify the attributes corresponding to each required element.
We won’t bore you with screenshots. Here's what we managed to identify:
- Music Title: Located in the h4 header with the data-e2e="browse-music" attribute, then down through the child tags a → div.
- Publication Date: Found in the span element with the data-e2e="browser-nickname" attribute, in its last child span tag.
- Number of Likes: data-e2e="like-count" attribute.
- Number of Comments: data-e2e="comment-count" attribute.
- Number of Shares: Element with the data-e2e="share-count" attribute.
- Number of Bookmarks: data-e2e="undefined-count" attribute.
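To see how the two trickier selectors behave, here they are applied to a simplified fragment shaped like the markup described above (the real page is far larger, and its class names differ):

```python
from bs4 import BeautifulSoup

# Simplified markup mimicking the structure described above
html = """
<h4 data-e2e="browse-music"><a href="#"><div>original sound - khaby.lame</div></a></h4>
<span data-e2e="browser-nickname">
  <span>Khabane lame</span><span> · </span><span>2021-10-26</span>
</span>
"""
soup = BeautifulSoup(html, "html.parser")

# Music title: the h4 header, then down through the a -> div tags
print(soup.select_one('h4[data-e2e="browse-music"] a div').text)
# Publication date: the last child span under the nickname element
print(soup.select_one('span[data-e2e="browser-nickname"] span:last-child').text.strip())
```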
If these elements are added to the code of the previous script and the video URL is passed instead of the user ID, the program will scrape the required information.
Here’s what we’ve got:
# Import the required libraries, including BeautifulSoup
import asyncio
import nodriver as hdlschrome
from bs4 import BeautifulSoup
import json

# Asynchronous function that processes the variable containing the video ID
async def scrape_tiktok_video(video_id):
    try:
        # Log that parsing has been initiated
        print(f"Initiating parsing of the TikTok video: {video_id}")
        # Launch a separate browser instance
        browser = await hdlschrome.start(headless=False)
        print("Browser launched successfully")
        # Open the target URL, substituting the video ID into it
        # Note that the nickname is already hard-coded in the URL!
        page = await browser.get(f"https://www.tiktok.com/@khabylame.62/video/{video_id}")
        print("TikTok video page loaded successfully")
        await asyncio.sleep(30)  # Allow 30 seconds for full content loading (increase or decrease if needed)
        print("Waited 30 seconds for the content to load completely")
        # Store the resulting HTML code in the html_content variable for further parsing
        html_content = await page.evaluate('document.documentElement.outerHTML')
        print(f"HTML code extracted: {len(html_content)} characters")
        # Hand the HTML code over to the BeautifulSoup library
        soup = BeautifulSoup(html_content, 'html.parser')
        print("HTML code parsed by BeautifulSoup")
        # Define the data structure, in this case describing the TikTok video
        video_info = {
            # The select_one method returns only the first element matching the given attributes
            'music': soup.select_one('h4[data-e2e="browse-music"] a div').text.strip() if soup.select_one('h4[data-e2e="browse-music"] a div') else None,
            'likes': soup.select_one('[data-e2e="like-count"]').text.strip() if soup.select_one('[data-e2e="like-count"]') else None,
            'comments': soup.select_one('[data-e2e="comment-count"]').text.strip() if soup.select_one('[data-e2e="comment-count"]') else None,
            'shares': soup.select_one('[data-e2e="share-count"]').text.strip() if soup.select_one('[data-e2e="share-count"]') else None,
            'bookmarks': soup.select_one('[data-e2e="undefined-count"]').text.strip() if soup.select_one('[data-e2e="undefined-count"]') else None,
            'date': soup.select_one('span[data-e2e="browser-nickname"] span:last-child').text.strip() if soup.select_one('span[data-e2e="browser-nickname"] span:last-child') else None
        }
        print("Video data extracted successfully")
        # Return the video data
        return video_info
    # Handle exceptions and errors
    except Exception as err:
        print(f"Error detected during the parsing process: {str(err)}")
        return None
    # Stop the browser to release its resources
    finally:
        if 'browser' in locals():
            browser.stop()
            print("Browser closed")

async def main():
    # Redefine the video identifier here; don't forget to change the user's nickname in the URL as well
    video_id = "7024119047490964737"
    # Run the parsing function asynchronously, passing it the video ID
    video_info = await scrape_tiktok_video(video_id)
    # If the data is not empty, print the parsing result to the console
    if video_info:
        print("\nVideo info:")
        # Iterate over the dictionary keys, replacing underscores with spaces for nicer output
        for key, value in video_info.items():
            print(f"{key.replace('_', ' ').title()}: {value}")
        # Store the data in a JSON file
        with open(f"{video_id}_video.json", 'w', encoding='utf-8') as f:
            json.dump(video_info, f, ensure_ascii=False, indent=4)
        print(f"Video data stored in the file {video_id}_video.json")
    else:
        print("Video data could not be parsed.")

if __name__ == "__main__":
    hdlschrome.loop().run_until_complete(main())
Save this code to a file and run it from the console.
A JSON file with the extracted data should appear in the project folder.
Parsing All Videos in the TikTok Profile
Let's increase the level of complexity to the maximum!
We'll write a parser that accepts a TikTok profile username and parses all the videos found in it. Moreover, all the videos will be downloaded to the PC, and the data about them will be exported to a JSON file. If the target website employs protection mechanisms, such as displaying a CAPTCHA, the parser will pause and prompt the user to solve the CAPTCHA.
We'll introduce random delays between requests to make the program's behavior more "human-like."
To download videos, we'll use the yt-dlp library. For browser automation, we'll use Playwright as the driver. Here's an insightful comparison of Playwright with Puppeteer to consider.
So, let's start by installing the missing libraries:
pip install playwright yt_dlp
You must also install the browser binaries for Playwright. This is done with the command:
playwright install
The script itself is provided below. All the remarks about its use are available in the comments (after the # symbol).
# Import the libraries
from playwright.async_api import async_playwright
import asyncio, random, json, logging, time, os, yt_dlp

# Set up logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

# Flag that enables/disables downloading (saving) the videos
DOWNLOAD_VIDEOS = False

# Random pause between asynchronous requests
async def random_sleep(min_seconds, max_seconds):
    await asyncio.sleep(random.uniform(min_seconds, max_seconds))

# Scroll the profile page, pausing for a random interval
async def scroll_page(page):
    await page.evaluate("""
        window.scrollBy(0, window.innerHeight);
    """)
    await random_sleep(3, 5)

# Detect and handle the captcha dialog
async def handle_captcha(page):
    try:
        # If the captcha is not detected, revisit the dialog-detection logic: replace div[role="dialog"] with the current locator
        captcha_dialog = page.locator('div[role="dialog"]')
        # If a captcha is found, wait until the dialog disappears (it is hidden once solved)
        is_captcha_present = await captcha_dialog.count() > 0 and await captcha_dialog.is_visible()
        if is_captcha_present:
            logging.info("Captcha detected. Solve it manually and close any other popup windows on the site.")
            # Wait for the user to solve the captcha
            await page.wait_for_selector('div[role="dialog"]', state='detached', timeout=300000)  # Hard 5-minute timeout; the parameter is in milliseconds
            logging.info("Captcha solved, continuing parsing...")
            await asyncio.sleep(3)  # One more short delay after the captcha is solved, just in case (in seconds here)
    # Error handling
    except Exception as e:
        logging.error(f"Error while handling the captcha: {str(e)}")
async def extract_video_info(page, video_url):
    # Wait only until the DOM is ready - the domcontentloaded parameter
    await page.goto(video_url, wait_until="domcontentloaded")
    # Random delay, from 2 to 4 seconds
    await random_sleep(2, 4)
    # Handle the captcha, if any
    await handle_captcha(page)
    # Collect the video-related data: the logic for parsing likes, comments, reposts, etc. is defined here
    # This part is JavaScript, executed in the page context
    video_info = await page.evaluate("""
        () => {
            const getTextContent = (selectors) => {
                for (let selector of selectors) {
                    const elements = document.querySelectorAll(selector);
                    for (let element of elements) {
                        const text = element.textContent.trim();
                        if (text) return text;
                    }
                }
                return 'N/A';
            };
            const getTags = () => {
                const tagElements = document.querySelectorAll('a[data-e2e="search-common-link"]');
                return Array.from(tagElements).map(el => el.textContent.trim());
            };
            return {
                likes: getTextContent(['[data-e2e="like-count"]', '[data-e2e="browse-like-count"]']),
                comments: getTextContent(['[data-e2e="comment-count"]', '[data-e2e="browse-comment-count"]']),
                shares: getTextContent(['[data-e2e="share-count"]']),
                bookmarks: getTextContent(['[data-e2e="undefined-count"]']),
                musicTitle: getTextContent(['.css-pvx3oa-DivMusicText']),
                date: getTextContent(['span[data-e2e="browser-nickname"] span:last-child']),
                tags: getTags()
            };
        }
    """)
    video_info['url'] = video_url
    # Log the result
    logging.info(f"Video processed {video_url}: {video_info}")
    return video_info
# This block is responsible for downloading (saving) the video; it is enabled by the flag above
def download_tiktok_video(video_url, save_path):
    # Configure the download library and the video quality
    ydl_opts = {
        'outtmpl': os.path.join(save_path, '%(id)s.%(ext)s'),
        'format': 'best',
    }
    try:
        with yt_dlp.YoutubeDL(ydl_opts) as ydl:
            info = ydl.extract_info(video_url, download=True)
            filename = ydl.prepare_filename(info)
            # Log the action
            logging.info(f"Video downloaded successfully: {filename}")
            return filename
    # Error handling
    except Exception as e:
        logging.error(f"Video download error: {str(e)}")
        return None
# TikTok profile parsing block
async def scrape_tiktok_profile(username, videos=None):
    # Launch a browser instance via Playwright
    async with async_playwright() as p:
        # Make the browser window visible so the user can interact with it (e.g., to solve captchas)
        browser = await p.chromium.launch(headless=False)
        # Set up the window and user agent parameters
        context = await browser.new_context(
            viewport={'width': 1280, 'height': 720},
            user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',
        )
        page = await context.new_page()
        # Build the profile URL - to change it, replace the username variable (set below in main)
        url = f"https://www.tiktok.com/@{username}"
        await page.goto(url, wait_until="domcontentloaded")
        # Handle the captcha
        await handle_captcha(page)
        # If the video list is still empty
        if not videos:
            videos = []
        last_video_count = 0
        no_new_videos_count = 0
        start_time = time.time()
        timeout = 300  # Just in case, set a 5-minute timeout (300 seconds)
        while True:
            # Scroll the page and handle the captcha
            await scroll_page(page)
            await handle_captcha(page)
            # Find the video blocks using the data-e2e="user-post-item" attribute. If the markup has changed, update it to the current value
            video_elements = await page.query_selector_all('div[data-e2e="user-post-item"]')
            # Collect the list of links to individual videos
            for element in video_elements:
                # Extract the links here (href)
                video_url = await element.evaluate('(el) => el.querySelector("a").href')
                if any(video['url'] == video_url for video in videos):
                    continue
                videos.append({'url': video_url})
            # Log the progress
            logging.info(f"Unique videos found: {len(videos)}")
            # If the counter of new videos is no longer increasing...
            if len(videos) == last_video_count:
                no_new_videos_count += 1
            else:
                no_new_videos_count = 0
            last_video_count = len(videos)
            if no_new_videos_count >= 3 or time.time() - start_time > timeout:
                # ...the search process is complete
                break
        # Log the total
        logging.info(f"Number of videos found: {len(videos)}")
        for i, video in enumerate(videos):
            if 'likes' not in video:
                video_info = await extract_video_info(page, video['url'])
                videos[i].update(video_info)
                logging.info(f"Processed video {i+1}/{len(videos)}: {video['url']}")
                # If the download option is enabled, save the video locally
                if DOWNLOAD_VIDEOS:
                    save_path = os.path.join(os.getcwd(), username)
                    filename = download_tiktok_video(video['url'], save_path)
                    if filename:
                        videos[i]['local_filename'] = filename
                # Save progress every 10 videos
                if (i + 1) % 10 == 0:
                    with open(f"{username}_progress.json", "w") as f:
                        json.dump(videos, f, indent=2)
                    logging.info(f"Progress saved, videos processed: {i+1}/{len(videos)}...")
                # Random delay, from 3 to 5 seconds
                await random_sleep(3, 5)
        # Close the browser
        await browser.close()
        return videos
# Our main function
async def main():
    # Define the profile whose videos we'll collect
    username = "khabylame.62"
    # Try to resume from previously saved progress
    try:
        with open(f"{username}_progress.json", "r") as f:
            videos = json.load(f)
        logging.info(f"Progress loaded: {len([v for v in videos if 'likes' in v])}/{len(videos)} videos already processed.")
    except FileNotFoundError:
        videos = []
    if DOWNLOAD_VIDEOS:
        os.makedirs(username, exist_ok=True)
        os.chdir(username)
    videos = await scrape_tiktok_profile(username, videos)
    logging.info(f"Videos parsed: {len(videos)}")
    # Save the JSON data; the file name will include the profile name
    with open(f"{username}_playwright_video_stats.json", "w", encoding='utf-8') as f:
        json.dump(videos, f, indent=2, ensure_ascii=False)
    print(f"Data saved to the file {username}_playwright_video_stats.json")

if __name__ == "__main__":
    asyncio.run(main())
Save the code into a file with the corresponding name and run it in the console.
If you want the TikTok scraper to save videos, change the DOWNLOAD_VIDEOS flag from False to True.
Make sure to monitor the console output to address captchas promptly. If a different pop-up appears instead of a captcha, such as a request to continue browsing without logging in, simply close it, and the process will continue.
Conclusion and Recommendations
Even complex dynamic websites like TikTok can be parsed with minimal coding, thanks to pre-built libraries and headless browsers. Specifically, we demonstrated a fairly advanced TikTok scraper that takes a profile link as input, scans all videos of the selected user, and then processes each video to collect key metrics such as reposts, likes, comments, etc. It is even possible to download these videos.
If desired, the same script can be easily adapted to parse TikTok search results. Initially, it can generate a list of discovered videos and then scan the pages of specific videos.
You will quickly notice that TikTok actively defends against automated traffic by presenting a large number of captchas. For large-scale parsing, the costs associated with captcha recognition can rise significantly. It is much more convenient and efficient to make headless browser instances work through proxies. This approach can further speed up the process, as each parsing thread will connect via its own proxy.
You can find high-quality residential, mobile, and datacenter proxies with automatic rotation from us! For convenient service testing, there is an affordable trial package. Froxy offers over 10 million IPs with targeting up to the city level.