YouTube’s streaming platform serves an audience of roughly 2.5 billion users every month, according to independent statistics for 2025. Naturally, this makes the platform highly attractive for businesses, not only as a channel for promoting products and services, but also as a source of feedback and a way to identify current trends. Many public figures and brands actively monitor their online reputation. In this context, YouTube comments are an excellent source of opinions and reviews.
This leads businesses to a logical technical question: how can YouTube comments be collected and analyzed? In this article, we will mainly focus on the first part, scraping YouTube comments, while also briefly touching on the topic of analysis.
There are three main options: use the official YouTube Data API, which is free but has a daily quota; build your own YouTube comment extractor, possibly with the help of existing libraries; or use third-party cloud scraping tools with their own APIs, which usually remove quota limits but are paid services.
Let’s go through each option.
If you register an account in Google Cloud, you can connect to the YouTube Data API v3. Each new project gets 10,000 quota units per day. The quota resets every day at 00:00 Pacific Time. The current quota usage and limits can be viewed in your Google Cloud dashboard.
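To plan long-running collection jobs around the daily reset, it can help to compute how much time is left until midnight Pacific Time. The snippet below is a minimal sketch, not part of any Google library: the function name seconds_until_quota_reset is hypothetical, and it assumes the reset follows the America/Los_Angeles time zone and that the datetime passed in is timezone-aware.

```python
from datetime import datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo

# Assumption: "Pacific Time" means the America/Los_Angeles time zone.
PACIFIC = ZoneInfo("America/Los_Angeles")


def seconds_until_quota_reset(now: datetime) -> float:
    """Return the seconds left until the next midnight Pacific Time.

    `now` is expected to be a timezone-aware datetime.
    """
    now_pt = now.astimezone(PACIFIC)
    next_midnight = datetime.combine(
        now_pt.date() + timedelta(days=1), time(0, 0), tzinfo=PACIFIC
    )
    # Subtract in UTC so that daylight saving transitions are counted correctly.
    delta = next_midnight.astimezone(timezone.utc) - now_pt.astimezone(timezone.utc)
    return delta.total_seconds()


if __name__ == "__main__":
    remaining = seconds_until_quota_reset(datetime.now(timezone.utc))
    print(f"Quota resets in about {remaining / 3600:.1f} hours")
```

A scheduler could sleep for this many seconds once a key runs out of units instead of retrying and collecting errors.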
Keep in mind that different request types consume different amounts of quota. For example, a simple request to retrieve a list of comments for a YouTube channel or a specific video costs 1 quota unit, while the response itself may contain anywhere from 1 to 100 comments. In other words, with 10,000 units, you can theoretically retrieve up to 1,000,000 comments per day. By contrast, publishing, updating, or deleting a comment, or changing its moderation status, costs 50 units per request.
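The arithmetic behind these figures can be checked with a few lines of Python. This is only an illustrative sketch: the constants are taken from the numbers quoted above, and the function names are made up for this example.

```python
DAILY_QUOTA = 10_000          # default units per project per day
READ_COST = 1                 # units per request that lists comments
WRITE_COST = 50               # units per publish/update/delete/moderation request
MAX_COMMENTS_PER_PAGE = 100   # upper bound of comments in one response


def max_comments_per_day(quota: int = DAILY_QUOTA) -> int:
    """Theoretical ceiling, assuming every page comes back full."""
    return (quota // READ_COST) * MAX_COMMENTS_PER_PAGE


def max_write_operations_per_day(quota: int = DAILY_QUOTA) -> int:
    """How many 50-unit write requests fit into the daily quota."""
    return quota // WRITE_COST


def requests_needed(comment_count: int) -> int:
    """Minimum number of list requests for a given number of comments."""
    return -(-comment_count // MAX_COMMENTS_PER_PAGE)  # ceiling division


if __name__ == "__main__":
    print(max_comments_per_day())          # 1000000
    print(max_write_operations_per_day())  # 200
    print(requests_needed(250))            # 3
```

In practice the real yield is lower than the ceiling, because the last page of every video is rarely full.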
If the provided quota is not enough, you can request a limit increase, but this requires filling out a special form and then going through an audit process. This option is intended for business clients.
If you want to monitor quota consumption, you can configure notifications in the dashboard to alert you when certain usage thresholds are reached.
The advantages are obvious:
There are also clear downsides:
Technically, third-party tools are ready-made cloud scrapers, similar in principle to Froxy Scrapers, that can extract YouTube comments on request. They rely on their own technical solutions for interacting with YouTube.
Advantages of external APIs for YouTube scraping:
Disadvantages:
With your own code, you can build a scraper of any scale and complexity, focused on specific mentions or keywords, with or without AI, with analytics, statistics, and much more. In the end, everything depends only on your imagination and programming skills.
Advantages:
The disadvantages are serious:
For those who are not looking for the easy way, let’s go deeper into the details of manual YouTube scraping:
One helpful detail is that YouTube does not use unique CSS identifiers or heavily obfuscated styles to make DOM parsing more difficult. This means the syntactic parsing itself can be handled by Beautiful Soup in Python.
Proxies are the decisive element here. The more threads you can run in parallel, each through its own IP address, the faster you will be able to collect the data you need. Proxies also make it possible to bypass blocks.
As you may have noticed, the most efficient strategy is to use the official free API: comment lists can be retrieved almost instantly and without any headless browsers.
That leaves only one real problem: quotas. Increasing them is difficult, but you can create several Google Cloud accounts in parallel and collect data across multiple streams, with the number of streams matching the number of accounts.
And this is where high-quality proxies become useful again: each account operating from its own IP address will appear fully unique and independent in Google’s eyes.
Below, we will show a version of a script that can call the YouTube API in multiple threads and collect comments with pagination taken into account, which is important for large result sets. The collection depth can be limited directly in the code.
import csv
import queue
import threading
import time
from dataclasses import dataclass, field
from typing import List, Optional

import requests

# ============================================================
# CONFIG
# ============================================================
CONFIG = {
    # Number of worker threads
    "threads": 2,
    # Requests allowed per API key (kept just below the 10,000-unit daily quota)
    "quota_limit": 9990,
    # Maximum number of comments to collect per video (None = no limit)
    "max_comments_per_video": None,
    # Request timeout, in seconds
    "request_timeout": 30,
    # Sleep between requests, in seconds
    "sleep_between_requests": 0.1,
    # Videos file
    "videos_file": "videos.txt",
    # CSV result
    "output_csv": "youtube_comments.csv",
}
# ============================================================
# API / PROXY CONFIG
# ============================================================
# One proxy = one API key
#
# proxy:
#   None -> without proxy
#   http://user:pass@host:port
#
# ============================================================
API_ACCOUNTS = [
    {
        "api_key": "YOUTUBE_API_KEY_1",
        "proxy": "http://login:password@127.0.0.1:8080",
    },
    {
        "api_key": "YOUTUBE_API_KEY_2",
        "proxy": "http://login:password@127.0.0.1:8081",
    },
    {
        "api_key": "YOUTUBE_API_KEY_3",
        "proxy": None,
    },
]
# ============================================================
# DATA CLASSES
# ============================================================
@dataclass
class ApiAccount:
    api_key: str
    proxy: Optional[str]
    requests_used: int = 0
    lock: threading.Lock = field(default_factory=threading.Lock)

    def is_available(self, quota_limit: int) -> bool:
        return self.requests_used < quota_limit

    def add_request(self):
        # Each comment list request costs one quota unit
        with self.lock:
            self.requests_used += 1
# ============================================================
# ACCOUNT MANAGER
# ============================================================
class AccountManager:
    def __init__(self, accounts_data, quota_limit):
        self.accounts: List[ApiAccount] = [
            ApiAccount(api_key=a["api_key"], proxy=a["proxy"])
            for a in accounts_data
        ]
        self.quota_limit = quota_limit
        self.index = 0
        self.lock = threading.Lock()

    def get_account(self) -> ApiAccount:
        # Return the current account, moving on to the next one
        # once its request budget is used up
        with self.lock:
            while self.index < len(self.accounts):
                account = self.accounts[self.index]
                if account.is_available(self.quota_limit):
                    return account
                print(
                    f"[INFO] API key exhausted: "
                    f"{account.api_key[:10]}..."
                )
                self.index += 1
            raise RuntimeError("No API accounts left")
# ============================================================
# CSV WRITER
# ============================================================
class CsvWriter:
    def __init__(self, filename):
        self.filename = filename
        self.lock = threading.Lock()
        # Create the file and write the header row
        with open(self.filename, "w", newline="", encoding="utf-8-sig") as f:
            writer = csv.writer(f)
            writer.writerow(["user", "comment", "published_at"])

    def write_comment(self, user, comment, published_at):
        # The lock keeps rows from different threads from interleaving
        with self.lock:
            with open(
                self.filename, "a", newline="", encoding="utf-8-sig"
            ) as f:
                writer = csv.writer(f)
                writer.writerow([user, comment, published_at])
# ============================================================
# YOUTUBE PARSER
# ============================================================
class YouTubeCommentsParser:
    API_URL = "https://www.googleapis.com/youtube/v3/commentThreads"

    def __init__(self, account_manager, csv_writer):
        self.account_manager = account_manager
        self.csv_writer = csv_writer

    def fetch_comments(self, video_id):
        total_comments = 0
        next_page_token = None

        while True:
            account = self.account_manager.get_account()
            params = {
                "part": "snippet",
                "videoId": video_id,
                "maxResults": 100,
                "textFormat": "plainText",
                "key": account.api_key,
            }
            if next_page_token:
                params["pageToken"] = next_page_token

            proxies = None
            if account.proxy:
                proxies = {
                    "http": account.proxy,
                    "https": account.proxy,
                }

            try:
                response = requests.get(
                    self.API_URL,
                    params=params,
                    proxies=proxies,
                    timeout=CONFIG["request_timeout"]
                )
                account.add_request()

                if response.status_code != 200:
                    print(
                        f"[ERROR] Video={video_id} "
                        f"HTTP={response.status_code} "
                        f"Response={response.text}"
                    )
                    break

                data = response.json()
                items = data.get("items", [])
                if not items:
                    break

                for item in items:
                    snippet = item["snippet"]["topLevelComment"]["snippet"]
                    author = snippet.get("authorDisplayName", "")
                    text = snippet.get("textDisplay", "")
                    published_at = snippet.get("publishedAt", "")
                    self.csv_writer.write_comment(author, text, published_at)
                    total_comments += 1

                    limit = CONFIG["max_comments_per_video"]
                    if limit is not None and total_comments >= limit:
                        print(
                            f"[INFO] Limit reached "
                            f"for video {video_id}"
                        )
                        return

                next_page_token = data.get("nextPageToken")
                print(
                    f"[INFO] Video={video_id} "
                    f"Collected={total_comments} "
                    f"Requests={account.requests_used}"
                )
                if not next_page_token:
                    break
                time.sleep(CONFIG["sleep_between_requests"])
            except Exception as e:
                print(
                    f"[ERROR] Video={video_id} "
                    f"Exception={e}"
                )
                break

        print(
            f"[DONE] Video={video_id} "
            f"Total comments={total_comments}"
        )
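# ============================================================
# THREAD WORKER
# ============================================================
def worker(video_queue, parser):
    while True:
        try:
            video_id = video_queue.get_nowait()
        except queue.Empty:
            return

        try:
            parser.fetch_comments(video_id)
        except RuntimeError as e:
            # Raised when every API key is exhausted. Log it and keep
            # draining the queue so that video_queue.join() can return;
            # otherwise the main thread would wait forever.
            print(f"[ERROR] Video={video_id} skipped: {e}")
        finally:
            video_queue.task_done()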
# ============================================================
# LOAD VIDEOS
# ============================================================
def load_videos(filename):
    videos = []
    with open(filename, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            videos.append(line)
    return videos
# ============================================================
# MAIN
# ============================================================
def main():
    videos = load_videos(CONFIG["videos_file"])
    print(f"[INFO] Loaded videos: {len(videos)}")

    video_queue = queue.Queue()
    for video_id in videos:
        video_queue.put(video_id)

    account_manager = AccountManager(API_ACCOUNTS, CONFIG["quota_limit"])
    csv_writer = CsvWriter(CONFIG["output_csv"])
    parser = YouTubeCommentsParser(account_manager, csv_writer)

    threads = []
    for i in range(CONFIG["threads"]):
        t = threading.Thread(
            target=worker,
            args=(video_queue, parser),
            daemon=True
        )
        t.start()
        threads.append(t)

    # Wait until every queued video has been processed
    video_queue.join()
    for t in threads:
        t.join(timeout=1)

    print("\n[FINISHED]")
    print("API usage statistics:")
    for idx, account in enumerate(account_manager.accounts, start=1):
        print(
            f"{idx}. "
            f"Requests={account.requests_used} "
            f"Proxy={account.proxy}"
        )


if __name__ == "__main__":
    main()
Do not forget to add the list of videos to the videos.txt file. Only the video IDs are needed there, for example:
dQw4w9WgXcQ
aqz-KE-bpKQ
M7lc1UVf-VE
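If your source list contains full links rather than bare IDs, they can be normalized before being written to videos.txt. The helper below is a sketch under stated assumptions: the function name extract_video_id is hypothetical, it assumes video IDs are 11 characters from the set A-Z, a-z, 0-9, "-" and "_", and it only recognizes the watch, youtu.be, shorts, and embed URL shapes with an explicit scheme.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumption: a video ID is exactly 11 URL-safe characters.
VIDEO_ID_RE = re.compile(r"^[A-Za-z0-9_-]{11}$")


def extract_video_id(value: str) -> Optional[str]:
    """Return the video ID from a bare ID or a common YouTube URL, else None."""
    value = value.strip()
    if VIDEO_ID_RE.match(value):
        return value  # already a bare ID

    parsed = urlparse(value)
    host = (parsed.hostname or "").lower()
    candidate = None

    if host == "youtu.be":
        # https://youtu.be/<id>
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        if parsed.path == "/watch":
            # https://www.youtube.com/watch?v=<id>
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        elif parsed.path.startswith(("/shorts/", "/embed/")):
            # https://www.youtube.com/shorts/<id> or /embed/<id>
            candidate = parsed.path.split("/")[2]

    if candidate and VIDEO_ID_RE.match(candidate):
        return candidate
    return None


if __name__ == "__main__":
    for raw in (
        "dQw4w9WgXcQ",
        "https://www.youtube.com/watch?v=dQw4w9WgXcQ&t=10s",
        "https://youtu.be/aqz-KE-bpKQ",
    ):
        print(raw, "->", extract_video_id(raw))
```

Lines for which the helper returns None are best skipped or logged, since the script sends whatever is in videos.txt to the API as is.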
How the script works:
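Note that the script collects only top-level comments. If replies or nested comments are essential for your task, the official API can still provide them: the commentThreads.list method accepts part=snippet,replies, which returns a subset of the replies to each thread, and the comments.list method returns the replies to a given comment through its parentId parameter, at the cost of additional requests. With a custom scraper, the same task comes with significant complexity: headless browsers are unavoidable, and you will also have to locate the buttons that load replies and expand all comment threads before collecting the data.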
As you can guess, comments extracted by a scraper do not carry any meaning on their own. They are simply rows in a table or entries in a database.
There are different ways to analyze them. The fastest and simplest is surface-level analysis based on statistics: how many comments were posted under a video, how they were distributed over time, when the biggest spike occurred, and, ideally, whether that spike can be linked to other events, for example, if you shared the video in your blog. You can also measure their average length, identify which users comment on the selected videos most often, and much more.
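As an illustration, the statistics listed above can be computed with the standard library alone. The sketch below assumes the CSV layout produced by the script (columns user, comment, published_at) and ISO 8601 timestamps, so the first ten characters of published_at are the date; the function names are chosen for this example only.

```python
import csv
from collections import Counter
from statistics import mean


def load_rows(path):
    """Read the scraper's CSV into a list of dictionaries."""
    with open(path, newline="", encoding="utf-8-sig") as f:
        return list(csv.DictReader(f))


def summarize(rows, top_n=3):
    """Compute simple descriptive statistics for a list of comment rows."""
    if not rows:
        return {
            "total": 0,
            "per_day": {},
            "peak_day": None,
            "average_length": 0.0,
            "top_users": [],
        }

    # "2025-03-02T11:30:00Z"[:10] -> "2025-03-02"
    per_day = Counter(row["published_at"][:10] for row in rows)
    users = Counter(row["user"] for row in rows)

    return {
        "total": len(rows),
        "per_day": dict(sorted(per_day.items())),
        "peak_day": per_day.most_common(1)[0],   # (date, count)
        "average_length": mean(len(row["comment"]) for row in rows),
        "top_users": users.most_common(top_n),   # [(user, count), ...]
    }


if __name__ == "__main__":
    sample = [
        {"user": "@ann", "comment": "Great video", "published_at": "2025-03-01T10:00:00Z"},
        {"user": "@bob", "comment": "Thanks!", "published_at": "2025-03-02T09:00:00Z"},
        {"user": "@ann", "comment": "How do I install it?", "published_at": "2025-03-02T11:30:00Z"},
    ]
    print(summarize(sample))
```

For real data, pass the result of load_rows("youtube_comments.csv") to summarize. The per_day dictionary is already sorted by date, so it can be handed directly to a charting library.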
All of this can be easily calculated and displayed in the form of charts and graphs.
But there is another type of analysis as well: semantic analysis. For example, it can be used to detect negativity when the subject is a brand, or to identify recurring question themes. Perhaps some viewers could not figure out how to use your product, or something in the video explanation was unclear.
At this scale, reading comments manually is not realistic, so AI-based automation is the practical answer. You can run the comments through the APIs of specialized AI assistants to get toxicity and sentiment scores, intent categories, topic clusters, and short summaries.
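The general workflow can be sketched without committing to any particular provider. In the example below, ask_model is a hypothetical callable that you would implement on top of the AI API of your choice: it takes a prompt string and is assumed to return a JSON list of labels. The label set, the character budget, and the prompt wording are all assumptions made for illustration.

```python
import json

LABELS = ("positive", "neutral", "negative")  # illustrative label set


def batch_comments(comments, max_chars=4000):
    """Group comments into batches that stay within a rough character budget."""
    batch, size = [], 0
    for text in comments:
        text = text.strip()
        if not text:
            continue
        if batch and size + len(text) > max_chars:
            yield batch
            batch, size = [], 0
        batch.append(text)
        size += len(text)
    if batch:
        yield batch


def build_prompt(batch):
    """Build one classification prompt for a batch of comments."""
    numbered = "\n".join(f"{i}. {text}" for i, text in enumerate(batch, start=1))
    return (
        "Classify the sentiment of each YouTube comment below as one of: "
        f"{', '.join(LABELS)}.\n"
        "Reply with a JSON list of labels, one per comment, in order.\n\n"
        f"{numbered}"
    )


def classify(comments, ask_model, max_chars=4000):
    """Return (comment, label) pairs using the supplied model callable."""
    results = []
    for batch in batch_comments(comments, max_chars):
        labels = json.loads(ask_model(build_prompt(batch)))
        if len(labels) != len(batch):
            raise ValueError("model returned a different number of labels")
        results.extend(zip(batch, labels))
    return results


if __name__ == "__main__":
    # Stand-in for a real model: labels every comment as neutral.
    def fake_model(prompt):
        count = sum(1 for line in prompt.splitlines() if line[:1].isdigit())
        return json.dumps(["neutral"] * count)

    print(classify(["Great video", "This did not work for me"], fake_model))
```

Batching keeps the number of paid API calls down, while the length check guards against a model that skips or merges comments. The same structure works for toxicity or intent labels by changing the prompt.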
Everything here is highly individual and depends to a large extent on how well you formulate your prompts.
There are several ways to collect YouTube comments: use the official YouTube API, rely on third-party services, or create a full-scale parser of your own, which in practice cannot work without headless browsers and automated user actions.
If you need to process a large volume of data as quickly as possible and without unnecessary complications, the best choice is the official API. But if you want to get around limitations on the number of threads and the number of requests to YouTube, proxies are the most practical solution.
For high-quality proxies for YouTube and any other internet services, there is Froxy. It offers a huge pool of mobile, residential, and datacenter IP addresses with precise targeting and automatic rotation.