How to Scrape YouTube Comments for Insights

Three proven ways to scrape YouTube comments: the official API, paid tools, or custom Python scrapers, plus a sample script and tips on analyzing data.

Team Froxy 11 Jun 2026 7 min read

YouTube reaches roughly 2.5 billion users every month, according to independent statistics for 2025. Naturally, this makes the platform highly attractive for businesses, not only as a channel for promoting products and services, but also as a source of feedback and a way to spot current trends. Many public figures and brands actively monitor their online reputation, and YouTube comments are an excellent source of opinions and reviews for that purpose.

This leads businesses to a logical technical question: how can YouTube comments be collected and analyzed? In this article, we will mainly focus on the first part, scraping YouTube comments, while also briefly touching on the topic of analysis.

The Methods: How to Access the Data

There are three main options: use the official YouTube Data API, which is free but has a daily quota; use third-party cloud scraping tools with their own APIs, which usually remove quota limits but are paid; or build your own YouTube comment extractor, possibly with the help of existing libraries.

Let’s go through each option.

1. YouTube Data API

If you register an account in Google Cloud, you can connect to the YouTube Data API v3. Each new project gets 10,000 quota units per day. The quota resets every day at 00:00 Pacific Time. The current quota usage and limits can be viewed in your Google Cloud dashboard.

Keep in mind that different request types consume different amounts of quota. For example, a request that lists comments for a YouTube channel or a specific video costs 1 unit and returns between 1 and 100 comments. In other words, with 10,000 units, you can theoretically retrieve up to 1,000,000 comments per day. Publishing, updating, or deleting a comment, or changing its moderation status, costs 50 units per request.
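
The quota arithmetic above can be sketched in a few lines. This is only an illustration: the constants repeat the figures quoted in the text, and the function names are made up for the example.

```python
import math

DAILY_QUOTA = 10_000   # default units per project per day (figure from the text)
LIST_COST = 1          # units charged for one comment-list request
PAGE_SIZE = 100        # maximum comments returned by one request


def max_comments_per_day(quota=DAILY_QUOTA, cost=LIST_COST, page_size=PAGE_SIZE):
    """Upper bound on comments per day, assuming every page comes back full."""
    return (quota // cost) * page_size


def units_needed(total_comments, cost=LIST_COST, page_size=PAGE_SIZE):
    """Quota units required to page through a given number of comments."""
    return math.ceil(total_comments / page_size) * cost


print(max_comments_per_day())   # 1000000
print(units_needed(25_300))     # 253
print(DAILY_QUOTA // 50)        # 200 write operations per day at 50 units each
```

In practice the real yield is lower, because the last page of every video is rarely full and each video needs at least one request.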

If the provided quota is not enough, you can request a limit increase, but this requires filling out a special form and then going through an audit process. This option is intended for business clients.

If you want to monitor quota consumption, you can configure notifications in the dashboard to alert you when certain usage thresholds are reached.

The advantages are obvious:

The API is completely free.
No proxies, captcha solving, WAF bypassing, or similar workarounds are needed.
There is detailed documentation and code examples for each API method, including methods for retrieving comment lists from a YouTube channel, whether your own or someone else’s.

There are also clear downsides:

Authorization and authentication through an API key are required.
The quota limit may be too small for large businesses, especially when huge volumes of comments must be processed every day.
A basic comment-thread request returns only top-level comments; replies have to be requested separately, which costs additional quota.
Deleted comments and shadow-banned entries may be missing from the output.

2. Third-Party Paid APIs

Technically, these are ready-made cloud scrapers, similar in principle to Froxy Scrapers, that can extract YouTube comments on request. They rely on their own technical solutions for interacting with YouTube.

Advantages of external APIs for YouTube scraping:

Ready-made infrastructure. You do not need to invent or build anything yourself, and the data is usually returned in a convenient format, sometimes already structured.
No quota limits.
No need for rotating proxies or complex anti-bot bypass logic.

Disadvantages:

These services are not free. API requests are usually sold in prepaid packages.
Each service has its own syntax, with its own parameters, methods, response formats, and quirks.
You still need a processing script that sends requests, handles responses, and saves the data, whether to a database or to table files.

3. A Custom YouTube Comment Scraper

You can build a scraper of any scale and complexity, focused on specific mentions or keywords, with or without AI, with analytics, statistics, and much more. In the end, everything depends only on your imagination and programming skills.

Advantages:

No tokens and no quotas. You work with direct access to page data.
Any data collection rules you want, including the full comment tree.
Any scale and any level of complexity.

The disadvantages are serious:

Programming skills are required.
You need the appropriate infrastructure and investment: proxies, headless browsers, databases, AI agents for summary analysis, and so on.
There is always a risk of access blocks, since YouTube actively protects itself from automated traffic.
There is also a risk of markup changes. Even a small redesign of comment page structure may force you to rewrite the scraping logic.

The Technical Roadblocks: Why Scraping Fails

For those who are not looking for the easy way, let’s go deeper into the details of manual YouTube scraping:

The service is built with a huge amount of JavaScript code. This means that you cannot realistically manage without headless browsers or anti-detect browsers. You need an engine capable of handling dynamic content. No standard HTTP request library can render JavaScript. And even a single open browser tab already consumes a significant amount of RAM and computing resources.
YouTube actively protects itself from bots, so bypassing blocks requires high-quality proxies, ideally residential or mobile proxies with automatic rotation and precise targeting.
The scraping script must actively interact with the page, because comments are not displayed all at once. They are loaded in portions, after button clicks and during scrolling. Scrolling the page gives access to top-level comments, while clicking buttons reveals reply threads.
Loading replies is also non-linear. Some comments may remain hidden behind an "Other replies" button even after the first replies have already been loaded through the "N replies" button.

One helpful detail is that YouTube does not randomize its CSS identifiers or heavily obfuscate styles to make DOM parsing more difficult. This means that once the page has been rendered, the parsing itself can be handled by Beautiful Soup in Python.
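
As a rough illustration of that parsing step, the sketch below pulls comment text out of HTML that a headless browser has already rendered. It uses the standard library's html.parser only to stay dependency-free; Beautiful Soup would express the same lookup more compactly. The `content-text` id and the sample markup are assumptions made for the example, not a guaranteed description of YouTube's current DOM.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag and must not affect nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link", "wbr"}


class CommentTextParser(HTMLParser):
    """Collect the text of every element whose id equals target_id."""

    def __init__(self, target_id="content-text"):  # id is an assumption
        super().__init__()
        self.target_id = target_id
        self.depth = 0        # > 0 while inside a target element
        self.buffer = []
        self.comments = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            if tag == "br" and self.depth > 0:
                self.buffer.append("\n")   # keep line breaks inside a comment
            return
        if self.depth > 0:
            self.depth += 1                # nested tag inside a comment
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 0:                # the target element just closed
            self.comments.append("".join(self.buffer).strip())

    def handle_data(self, data):
        if self.depth > 0:
            self.buffer.append(data)


# Made-up sample standing in for rendered page source.
SAMPLE_HTML = """
<ytd-comment-thread-renderer>
  <span id="content-text">Great <b>video</b>!</span>
</ytd-comment-thread-renderer>
<ytd-comment-thread-renderer>
  <span id="content-text">Second<br>line</span>
</ytd-comment-thread-renderer>
"""

parser = CommentTextParser()
parser.feed(SAMPLE_HTML)
print(parser.comments)   # ['Great video!', 'Second\nline']
```

Keeping the target id as a parameter means a markup change only requires updating one value rather than the parsing logic.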

Proxies are the decisive element here. They make it possible to bypass blocks, and the more proxied threads you can run in parallel, the faster you will collect the data you need.

Proxies in YouTube Scraping

As you may have noticed, the most efficient strategy is to use the official free API: comment lists can be retrieved almost instantly and without any headless browsers.

That leaves only one real problem: quotas. Increasing them is difficult, but you can create several Google Cloud accounts in parallel and collect data across multiple streams, with the number of streams matching the number of accounts.

And this is where high-quality proxies become useful again: each account operating from its own IP address will appear fully unique and independent in Google’s eyes.
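
One way to picture "one stream per account" is to split the list of videos across the key + proxy pairs up front. The sketch below does this round-robin; the function name, the placeholder keys, and the proxy addresses are hypothetical.

```python
from itertools import cycle


def assign_videos(video_ids, accounts):
    """Spread video IDs across accounts round-robin, one bucket per API key."""
    buckets = {account["api_key"]: [] for account in accounts}
    # zip stops when video_ids runs out; cycle repeats the accounts endlessly.
    for video_id, account in zip(video_ids, cycle(accounts)):
        buckets[account["api_key"]].append(video_id)
    return buckets


accounts = [
    {"api_key": "KEY_1", "proxy": "http://login:password@127.0.0.1:8080"},
    {"api_key": "KEY_2", "proxy": "http://login:password@127.0.0.1:8081"},
]

plan = assign_videos(["a", "b", "c", "d", "e"], accounts)
print(plan)   # {'KEY_1': ['a', 'c', 'e'], 'KEY_2': ['b', 'd']}
```

Each bucket can then be handed to its own thread, so that a given key is only ever used together with its own proxy.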

Step-by-Step: From Python Script to Insights

Below, we will show a version of a script that can call the YouTube API in multiple threads and collect comments with pagination taken into account, which is important for large result sets. The collection depth can be limited directly in the code.

import csv
import queue
import threading
import time
from dataclasses import dataclass, field
from typing import List, Optional

import requests


# ============================================================
# CONFIG
# ============================================================

CONFIG = {
    # number of threads
    "threads": 2,

    "quota_limit": 9990,

    "max_comments_per_video": None,

    # Request timeout
    "request_timeout": 30,

    # Sleep between requests
    "sleep_between_requests": 0.1,

    # Videos file
    "videos_file": "videos.txt",

    # CSV result
    "output_csv": "youtube_comments.csv",
}


# ============================================================
# API / PROXY CONFIG
# ============================================================
# One proxy = one API key
#
# proxy:
#   None -> without proxy
#   http://user:pass@host:port
#
# ============================================================

API_ACCOUNTS = [
    {
        "api_key": "YOUTUBE_API_KEY_1",
        "proxy": "http://login:password@127.0.0.1:8080",
    },
    {
        "api_key": "YOUTUBE_API_KEY_2",
        "proxy": "http://login:password@127.0.0.1:8081",
    },
    {
        "api_key": "YOUTUBE_API_KEY_3",
        "proxy": None,
    },
]


# ============================================================
# DATA CLASSES
# ============================================================

@dataclass
class ApiAccount:
    api_key: str
    proxy: Optional[str]
    requests_used: int = 0
    lock: threading.Lock = field(default_factory=threading.Lock)

    def is_available(self, quota_limit: int) -> bool:
        return self.requests_used < quota_limit

    def add_request(self):
        with self.lock:
            self.requests_used += 1


# ============================================================
# ACCOUNT MANAGER
# ============================================================

class AccountManager:
    def __init__(self, accounts_data, quota_limit):
        self.accounts: List[ApiAccount] = [
            ApiAccount(
                api_key=a["api_key"],
                proxy=a["proxy"]
            )
            for a in accounts_data
        ]

        self.quota_limit = quota_limit
        self.index = 0
        self.lock = threading.Lock()

    def get_account(self) -> ApiAccount:
        with self.lock:
            while self.index < len(self.accounts):
                account = self.accounts[self.index]

                if account.is_available(self.quota_limit):
                    return account

                print(
                    f"[INFO] API key exhausted: "
                    f"{account.api_key[:10]}..."
                )

                self.index += 1

            raise RuntimeError("No API accounts left")


# ============================================================
# CSV WRITER
# ============================================================

class CsvWriter:
    def __init__(self, filename):
        self.filename = filename
        self.lock = threading.Lock()

        with open(self.filename, "w", newline="", encoding="utf-8-sig") as f:
            writer = csv.writer(f)
            writer.writerow([
                "user",
                "comment",
                "published_at"
            ])

    def write_comment(self, user, comment, published_at):
        with self.lock:
            with open(
                self.filename,
                "a",
                newline="",
                encoding="utf-8-sig"
            ) as f:
                writer = csv.writer(f)

                writer.writerow([
                    user,
                    comment,
                    published_at
                ])


# ============================================================
# YOUTUBE PARSER
# ============================================================

class YouTubeCommentsParser:
    API_URL = (
        "https://www.googleapis.com/youtube/v3/commentThreads"
    )

    def __init__(self, account_manager, csv_writer):
        self.account_manager = account_manager
        self.csv_writer = csv_writer

    def fetch_comments(self, video_id):
        total_comments = 0
        next_page_token = None

        while True:
            account = self.account_manager.get_account()

            params = {
                "part": "snippet",
                "videoId": video_id,
                "maxResults": 100,
                "textFormat": "plainText",
                "key": account.api_key,
            }

            if next_page_token:
                params["pageToken"] = next_page_token

            proxies = None

            if account.proxy:
                proxies = {
                    "http": account.proxy,
                    "https": account.proxy,
                }

            try:
                response = requests.get(
                    self.API_URL,
                    params=params,
                    proxies=proxies,
                    timeout=CONFIG["request_timeout"]
                )

                account.add_request()

                if response.status_code != 200:
                    print(
                        f"[ERROR] Video={video_id} "
                        f"HTTP={response.status_code} "
                        f"Response={response.text}"
                    )
                    break

                data = response.json()

                items = data.get("items", [])

                if not items:
                    break

                for item in items:
                    snippet = (
                        item["snippet"]
                        ["topLevelComment"]
                        ["snippet"]
                    )

                    author = snippet.get(
                        "authorDisplayName",
                        ""
                    )

                    text = snippet.get(
                        "textDisplay",
                        ""
                    )

                    published_at = snippet.get(
                        "publishedAt",
                        ""
                    )

                    self.csv_writer.write_comment(
                        author,
                        text,
                        published_at
                    )

                    total_comments += 1

                    limit = CONFIG["max_comments_per_video"]

                    if limit is not None:
                        if total_comments >= limit:
                            print(
                                f"[INFO] Limit reached "
                                f"for video {video_id}"
                            )
                            return

                next_page_token = data.get("nextPageToken")

                print(
                    f"[INFO] Video={video_id} "
                    f"Collected={total_comments} "
                    f"Requests={account.requests_used}"
                )

                if not next_page_token:
                    break

                time.sleep(CONFIG["sleep_between_requests"])

            except Exception as e:
                print(
                    f"[ERROR] Video={video_id} "
                    f"Exception={e}"
                )
                break

        print(
            f"[DONE] Video={video_id} "
            f"Total comments={total_comments}"
        )


# ============================================================
# THREAD WORKER
# ============================================================

def worker(video_queue, parser):
    while True:
        try:
            video_id = video_queue.get_nowait()
        except queue.Empty:
            return

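        try:
            parser.fetch_comments(video_id)
        except RuntimeError as e:
            # Raised when every API key is exhausted; log it and keep
            # draining the queue so video_queue.join() cannot hang.
            print(f"[ERROR] Video={video_id} {e}")
        finally:
            video_queue.task_done()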


# ============================================================
# LOAD VIDEOS
# ============================================================

def load_videos(filename):
    videos = []

    with open(filename, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()

            if not line:
                continue

            videos.append(line)

    return videos


# ============================================================
# MAIN
# ============================================================

def main():
    videos = load_videos(CONFIG["videos_file"])

    print(f"[INFO] Loaded videos: {len(videos)}")

    video_queue = queue.Queue()

    for video_id in videos:
        video_queue.put(video_id)

    account_manager = AccountManager(
        API_ACCOUNTS,
        CONFIG["quota_limit"]
    )

    csv_writer = CsvWriter(
        CONFIG["output_csv"]
    )

    parser = YouTubeCommentsParser(
        account_manager,
        csv_writer
    )

    threads = []

    for i in range(CONFIG["threads"]):
        t = threading.Thread(
            target=worker,
            args=(video_queue, parser),
            daemon=True
        )

        t.start()
        threads.append(t)

    video_queue.join()

    for t in threads:
        t.join(timeout=1)

    print("\n[FINISHED]")
    print("API usage statistics:")

    for idx, account in enumerate(
        account_manager.accounts,
        start=1
    ):
        print(
            f"{idx}. "
            f"Requests={account.requests_used} "
            f"Proxy={account.proxy}"
        )


if __name__ == "__main__":
    main()

Do not forget to add the list of videos to the videos.txt file. Only the video IDs are needed there, for example:

dQw4w9WgXcQ
aqz-KE-bpKQ
M7lc1UVf-VE
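
If your source list contains full links rather than IDs, a small helper can normalize them before they go into videos.txt. This is a sketch under the assumption that the links follow the common watch, youtu.be, shorts, embed, or live URL shapes; the function name is made up for the example.

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(value):
    """Return the video ID from a YouTube URL, or the input if it is a bare ID."""
    value = value.strip()
    if "/" not in value and "?" not in value:
        return value                      # already looks like a bare ID

    parsed = urlparse(value if "://" in value else "https://" + value)
    host = (parsed.hostname or "").lower()

    if host == "youtu.be":
        return parsed.path.lstrip("/").split("/")[0] or None

    if host == "youtube.com" or host.endswith(".youtube.com"):
        if parsed.path == "/watch":
            return parse_qs(parsed.query).get("v", [None])[0]
        parts = parsed.path.strip("/").split("/")
        if len(parts) >= 2 and parts[0] in {"shorts", "embed", "live"}:
            return parts[1]

    return None                           # not a recognized YouTube link


print(extract_video_id("https://www.youtube.com/watch?v=dQw4w9WgXcQ&t=10s"))  # dQw4w9WgXcQ
print(extract_video_id("https://youtu.be/aqz-KE-bpKQ"))                       # aqz-KE-bpKQ
print(extract_video_id("M7lc1UVf-VE"))                                        # M7lc1UVf-VE
```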

How the script works:

It loads the list of videos from the file.
It creates a task queue.
It starts the specified number of parallel threads.
Each thread takes a video from the queue, retrieves its comments through the YouTube Data API v3 (up to 100 records per request), follows pagination (nextPageToken) until all pages are collected, saves the comments to a CSV file, and counts the requests made by each account.
When an API key gets close to its limit (9,990 of 10,000 quota units, with each list request costing 1 unit), the script automatically switches to the next proxy + key pair.
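
The script tracks quota with its own request counter, and any non-200 response simply ends processing for that video. The API can also report an exhausted quota itself, so a possible refinement is to recognize that case explicitly. The sketch below assumes the usual Google API error body with a `quotaExceeded` reason; treat the exact payload shape as an assumption and check it against a real response.

```python
import json


def is_quota_error(status_code, body_text):
    """Return True if an error response reports an exhausted quota."""
    if status_code != 403:
        return False
    try:
        payload = json.loads(body_text)
    except json.JSONDecodeError:
        return False

    # Assumed shape: {"error": {"errors": [{"reason": "quotaExceeded"}]}}
    error = payload.get("error") if isinstance(payload, dict) else None
    errors = error.get("errors", []) if isinstance(error, dict) else []
    return any(
        isinstance(e, dict) and e.get("reason") == "quotaExceeded"
        for e in errors
    )


sample_body = json.dumps({
    "error": {
        "code": 403,
        "errors": [{"domain": "youtube.quota", "reason": "quotaExceeded"}],
    }
})

print(is_quota_error(403, sample_body))                                          # True
print(is_quota_error(403, '{"error": {"errors": [{"reason": "commentsDisabled"}]}}'))  # False
print(is_quota_error(500, sample_body))                                          # False
```

With a check like this, a quota error could mark the current key as exhausted and retry the same page with the next key, instead of abandoning the video.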

Note that the script collects only top-level comments. Replies can still be retrieved through the official API by requesting them separately for each comment thread, which costs additional quota. If the API does not cover your task, you will need a custom scraper. But as mentioned earlier, this comes with significant complexity: headless browsers are unavoidable, and you will have to locate the buttons that load replies and expand all comment threads before collecting the data.
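
Before committing to a browser-based scraper, it may be worth checking whether the Data API's comments.list method covers your case: given the ID of a top-level comment as parentId, it returns that comment's replies page by page. The sketch below shows the pagination loop only. The helper names are made up, the HTTP call is injected as a function so the example runs offline, and the canned pages are invented test data.

```python
COMMENTS_URL = "https://www.googleapis.com/youtube/v3/comments"


def reply_params(parent_id, api_key, page_token=None):
    """Query parameters for one page of replies to a top-level comment."""
    params = {
        "part": "snippet",
        "parentId": parent_id,
        "maxResults": 100,
        "textFormat": "plainText",
        "key": api_key,
    }
    if page_token:
        params["pageToken"] = page_token
    return params


def iter_replies(parent_id, api_key, fetch_page):
    """Yield (author, text, published_at) for every reply.

    fetch_page(url, params) must return the decoded JSON response as a dict.
    """
    page_token = None
    while True:
        data = fetch_page(COMMENTS_URL, reply_params(parent_id, api_key, page_token))
        for item in data.get("items", []):
            snippet = item["snippet"]
            yield (
                snippet.get("authorDisplayName", ""),
                snippet.get("textDisplay", ""),
                snippet.get("publishedAt", ""),
            )
        page_token = data.get("nextPageToken")
        if not page_token:
            break


# Offline demo: a fake fetch_page that serves two invented pages.
_pages = {
    None: {
        "items": [{"snippet": {"authorDisplayName": "@anna",
                               "textDisplay": "First reply",
                               "publishedAt": "2025-03-01T10:00:00Z"}}],
        "nextPageToken": "p2",
    },
    "p2": {
        "items": [{"snippet": {"authorDisplayName": "@bob",
                               "textDisplay": "Second reply",
                               "publishedAt": "2025-03-01T11:00:00Z"}}],
    },
}


def fake_fetch(url, params):
    return _pages[params.get("pageToken")]


replies = list(iter_replies("PARENT_COMMENT_ID", "YOUR_API_KEY", fake_fetch))
print(len(replies))   # 2
```

With the requests library, fetch_page could be as simple as `lambda url, params: requests.get(url, params=params, timeout=30).json()`. In a comment-thread response, the parent ID is the `id` of the `topLevelComment` object. Every page of replies is a separate request, so threads with many replies consume quota noticeably faster.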

Turning Raw Text into Actionable Insights

As you can guess, comments extracted by a scraper do not carry any meaning on their own. They are simply rows in a table or entries in a database.

There are different ways to analyze them. The fastest and simplest is surface-level analysis based on statistics: how many comments were posted under a video, how they were distributed over time, when the biggest spike occurred, and, ideally, whether that spike can be linked to other events, for example, sharing the video on your blog. You can also measure the average comment length, identify which users comment on the selected videos most often, and much more.

All of this can be easily calculated and displayed in the form of charts and graphs.
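
A minimal sketch of such surface-level statistics, using only the standard library. It assumes rows with the same three columns the script writes (user, comment, published_at) and ISO 8601 timestamps; the sample rows are invented.

```python
from collections import Counter


def summarize(rows):
    """Basic statistics for rows with 'user', 'comment', 'published_at' keys."""
    rows = list(rows)
    # The first 10 characters of an ISO 8601 timestamp are the date (YYYY-MM-DD).
    per_day = Counter(r["published_at"][:10] for r in rows)
    per_user = Counter(r["user"] for r in rows)
    avg_length = sum(len(r["comment"]) for r in rows) / len(rows) if rows else 0.0
    return {
        "total": len(rows),
        "per_day": dict(per_day),
        "peak_day": per_day.most_common(1)[0] if per_day else None,
        "avg_length": avg_length,
        "top_users": per_user.most_common(3),
    }


sample_rows = [
    {"user": "@anna", "comment": "Great video!", "published_at": "2025-03-01T10:00:00Z"},
    {"user": "@bob", "comment": "How do I install it?", "published_at": "2025-03-02T09:30:00Z"},
    {"user": "@anna", "comment": "Thanks", "published_at": "2025-03-02T11:00:00Z"},
]

stats = summarize(sample_rows)
print(stats["total"])       # 3
print(stats["peak_day"])    # ('2025-03-02', 2)
print(stats["top_users"])   # [('@anna', 2), ('@bob', 1)]
```

To run it on the script's output, the rows can be read with `csv.DictReader` from a file opened with `encoding="utf-8-sig"`, which matches how the CSV was written.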

But there is another type of analysis as well: semantic analysis. For example, it can be used to detect negativity when the subject is a brand, or to identify recurring question themes. Perhaps some viewers could not figure out how to use your product, or something in the video explanation was unclear.

At that point, only AI-based automation can really help. You need to run the comments through the APIs of specialized AI assistants in order to generate summaries for toxicity and sentiment analysis, intent categorization, and topic clustering.

Everything here is highly individual and depends to a large extent on how well you formulate your prompts.
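
Whatever assistant you use, the comments usually have to be sent in batches that fit its input limit. The sketch below groups comments into prompts under a character budget. The instruction text, the budget, and the function name are all assumptions for illustration; the actual API call is deliberately left out because it depends on the provider.

```python
DEFAULT_INSTRUCTION = (
    "Classify each comment below by sentiment (positive, neutral, negative) "
    "and list the recurring questions."
)


def build_prompts(comments, max_chars=4000, instruction=DEFAULT_INSTRUCTION):
    """Group comments into prompts whose comment section stays under max_chars.

    The budget covers the comment lines only, not the instruction.
    A single comment longer than the budget still gets its own prompt.
    """
    prompts, current, size = [], [], 0
    for text in comments:
        line = "- " + " ".join(text.split())   # collapse newlines and extra spaces
        if current and size + len(line) + 1 > max_chars:
            prompts.append(instruction + "\n" + "\n".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line) + 1                  # +1 for the newline separator
    if current:
        prompts.append(instruction + "\n" + "\n".join(current))
    return prompts


demo = build_prompts(["x" * 10] * 5, max_chars=30)
print(len(demo))   # 3
```

Smaller batches tend to make the responses easier to map back to individual comments, at the price of more API calls.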

Conclusion: Scaling Your Intelligence

There are several ways to collect YouTube comments: use the official YouTube API, rely on third-party services, or build a full-scale scraper, which unavoidably needs headless browsers and automated user actions.

If you need to process a large volume of data as quickly as possible and without unnecessary complications, the best choice is the official API. But if you want to get around limitations on the number of threads and the number of requests to YouTube, proxies are the most practical solution.

For high-quality proxies for YouTube and any other internet services, there is Froxy. It offers a huge pool of mobile, residential, and datacenter IP addresses with precise targeting and automatic rotation.
