IMDb (short for Internet Movie Database) is the most popular user-contributed database of movies, TV shows, actors, directors, screenwriters, and more. Here, you can explore current viewer ratings, review statistics, box office earnings, awards, reviews, descriptions, release schedules, episode titles and lists, categories, genres, and other relevant data. Notably, IMDb offers a paid Pro subscription for actors and celebrities, which allows for enhanced profile management and building professional connections.
The platform has existed since 1990 and contains records on over 14 million actors and 10 million works. IMDb has an API, but it is very expensive, starting at $150,000 per year. Writing your own IMDb scraper is therefore a far more attractive investment.
This article explains how to scrape data from IMDb without getting blocked.
Understanding IMDb’s Structure and Markup
IMDb data is most commonly used on websites dedicated to movies and TV shows, such as fan sites, torrent trackers, streaming platforms, etc. It can also be applied in mobile and TV apps. IMDb data is essential for organizing descriptive content (about a movie, a TV show, or a specific episode) or for displaying up-to-date ratings, reviews (the legendary IMDb star rating from 1 to 10), statistics, etc.
For fan sites, IMDb provides a ready-made "plugin." Strictly speaking, it is not a plugin but a web service: you enter the title of a work, and it generates ready-made HTML code to embed on an external website. The resulting widget displays the current viewer rating for that particular movie.
It’s not that simple, however. Otherwise, this article wouldn't exist.
How IMDb Organizes Its Data
When it comes to the API, access to current data is available through AWS Data Exchange and in JSON format only. Pre-made datasets are also available, but they can only be queried via Amazon Athena (which uses a syntax similar to SQL table queries).
This would be convenient if not for the high cost of access. Unsurprisingly, enthusiasts have created their own open database: OMDb (Open Movie Database). You can make up to 1,000 API requests per day for free there. It is an open database, of course, and is not affiliated with IMDb.
The most reliable method is to scrape the required data yourself and create your own database, formatted as you need it without conversion or intermediary services.
The IMDb website is divided into the following sections: Top 250 movies, release calendar, most popular videos, best box office earnings, news, TV shows (many sections mirror the movie content structure), awards and events (Oscars, best of the year, STARmeter awards, etc.), celebrities/stars (most popular, profile pages with descriptions, achievements, and awards, lists of created or featured content, news, etc.), trailers, what to watch, polls, and some other sections.
The most informative pages are those dedicated to specific movies or TV shows, as well as actor and celebrity pages (directors, screenwriters, TV show hosts).
Naturally, each page has its unique HTML markup and set of displayed data.
For example, on movie pages, you can find all the key IMDb data:
- Title (original title in the language of the country of origin and the translated title for the viewing country, displayed on the basis of the user's current connection region);
- Ratings (overall star rating, your rating, if any, critics' ratings, and separate user ratings);
- Statistics (to understand popularity and demand over recent times);
- Duration;
- Release date;
- Age ratings and certifications;
- Posters, trailers, screenshots, and other related media content;
- Genres and tags (categories and taxonomies the work belongs to);
- List of actors, screenwriters, directors, and other crew members;
- Awards and nominations received;
- Lists of similar movies (based on a combination of factors; the algorithm is not disclosed);
- Plot description;
- Fun facts;
- Soundtrack information;
- User reviews;
- And many other data points (links to official websites, alternative titles, filming locations, production company, budget, earnings, technical video parameters, related content such as sequels or episodes, etc.).
Key URLs and Parameters for Navigating IMDb’s Site Structure
The most important URLs for the IMDb parser:
- https://www.imdb.com/title/tt0068646/
The numerical identifier at the end corresponds to the specific movie. The sample link opens "The Godfather."
By iterating through these identifiers, you can parse every single film in the IMDb database (there are over 10 million titles there).
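Iterating through identifiers can be sketched as follows. This is a minimal illustration that assumes IDs are the prefix tt followed by a number zero-padded to at least seven digits, as in the sample link; titleURL is a hypothetical helper name, and not every number in a range necessarily maps to an existing title.

```go
package main

import "fmt"

// titleURL formats a numeric identifier as an IMDb title URL.
// Assumption: IDs are zero-padded to at least seven digits (tt0068646).
func titleURL(n int) string {
	return fmt.Sprintf("https://www.imdb.com/title/tt%07d/", n)
}

func main() {
	// Walk a small, arbitrary range of identifiers around "The Godfather".
	for n := 68645; n <= 68647; n++ {
		fmt.Println(titleURL(n))
	}
}
```

In a real crawler, each generated URL would be handed to the page loader, and pages that return "not found" would simply be skipped.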
Additional parameters can be used to get more specific information:
- …/movieconnections/ – redirects to a page with related content. In the case of "The Godfather," you’ll find a list of all subsequent sequels, series, and releases.
- …/ratings/ – redirects to a page detailing how the film's star rating is formed (a visual representation of the distribution of user scores).
- …/videogallery/ – leads to a page with video content (trailers, cutscenes, etc.).
- …/mediaviewer/ – provides access to the image library (screenshots, photographs, etc.).
- …/fullcredits/ – a dedicated page listing cast and crew members in detail.
- …/plotsummary/ – a detailed text description of the film (synopsis and summary).
- …/reviews/ – a page with user reviews. You can sort reviews using URL parameters. For example, …/reviews/?rating=10&sort=submission_date%2Cdesc will show reviews with a rating of 10 sorted by submission date in descending order (from newest to oldest).
- …/awards/ – a page listing awards.
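The sub-pages listed above can be addressed with a small helper. The sketch below is only an illustration: subPageURL and reviewsURL are hypothetical names, and the rating and sort parameters are the ones taken from the reviews example in the list.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

const titleBase = "https://www.imdb.com/title/"

// subPageURL appends a section name (fullcredits, awards, ...) to a title URL.
func subPageURL(titleID, section string) string {
	return titleBase + titleID + "/" + section + "/"
}

// reviewsURL builds a reviews URL filtered by rating and sorted as requested.
func reviewsURL(titleID string, rating int, sort string) string {
	q := url.Values{}
	q.Set("rating", strconv.Itoa(rating))
	q.Set("sort", sort)
	// Encode escapes the comma in the sort value as %2C.
	return subPageURL(titleID, "reviews") + "?" + q.Encode()
}

func main() {
	fmt.Println(subPageURL("tt0068646", "fullcredits"))
	fmt.Println(reviewsURL("tt0068646", 10, "submission_date,desc"))
}
```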
Celebrity pages look as follows:
- https://www.imdb.com/name/nm0000246/
The given address leads to Bruce Willis’s page.
As you might guess, iterating through the identifier at the end leads to pages for different celebrities.
These pages also have modifiers: …/awards/ (awards), …/bio/ (biography), …/trivia/ (fun facts), …/videogallery/ (video gallery), etc.
But the most interesting part is how to interact with the built-in search system.
Here is an example of a search query described within the URL structure:
https://www.imdb.com/search/title/?title=Godfather&title_type=feature&release_date=1970-01-01,1980-12-31&user_rating=1
Here, we are searching for a movie (“feature” type) with "Godfather" in the title, released between January 1, 1970 and December 31, 1980 (release_date=1970-01-01,1980-12-31), with a user rating of 1 or higher.
And here is how to search for specific actors (looking for Al Pacino):
https://www.imdb.com/search/name/?name=Al%20Pacino
You can even find TV programs, movies, and other content related to two celebrities at once:
https://www.imdb.com/search/title/?role=nm0000246,nm0000199
The system will return results featuring both Al Pacino and Bruce Willis (the search query includes their IDs).
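The search URLs above can also be assembled programmatically. The following sketch uses Go's standard net/url package; the function names are hypothetical, and the query parameters are only the ones shown in the examples above.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
	"strings"
)

const searchBase = "https://www.imdb.com/search/title/?"

// titleSearchURL builds a title search from the filters used in the example.
func titleSearchURL(title, titleType, from, to string, minRating int) string {
	q := url.Values{}
	q.Set("title", title)
	q.Set("title_type", titleType)
	q.Set("release_date", from+","+to)
	q.Set("user_rating", strconv.Itoa(minRating))
	return searchBase + q.Encode()
}

// sharedCreditsURL builds a search for titles involving all given name IDs.
func sharedCreditsURL(nameIDs ...string) string {
	q := url.Values{}
	q.Set("role", strings.Join(nameIDs, ","))
	return searchBase + q.Encode()
}

func main() {
	fmt.Println(titleSearchURL("Godfather", "feature", "1970-01-01", "1980-12-31", 1))
	fmt.Println(sharedCreditsURL("nm0000246", "nm0000199"))
}
```

Note that Encode sorts the parameters alphabetically and writes the comma as %2C, so the generated strings differ visually from the hand-written URLs; they should be equivalent once the server decodes them.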
IMDb’s Terms of Service and Rules Regarding Web Scraping
IMDb data is protected by copyright. Even if users post their reviews or other content here, they automatically transfer their rights to IMDb.
Parsing IMDb is explicitly prohibited by the service’s terms of use. You can obtain permission to parse IMDb in written form only from representatives of the Licensing Department. In most cases, these rules apply to the United States since the company and the website fall under US jurisdiction.
However, detecting the act of parsing and proving a violation of licensing terms is practically impossible if you do not use direct IMDb attributes: logos, brand names (trademarks), etc.
Movie titles, TV show titles, and actor lists cannot technically belong to IMDb. Therefore, you can safely parse a site and use the data for your own purposes. For example, you can find the highest-rated movies according to specific criteria or display a movie’s rating (without directly referencing IMDb).
P.S.: The plugin’s terms of use (for the rating icon) state that it can only be placed on personal websites and blogs. Video and streaming services, online cinemas, recommendation sites, and other commercial services must request an official licensing agreement, since the plugin contains the recognizable IMDb trademark.
Building IMDb Scraper
Most ready-made IMDb parsers are written in Python. Here are a few examples if you’re eager to start right away (without writing your own code): JohnDoee/imdbparser, python-automation-scripts/imdb-scraper/ (part of a set of automation scripts), PyMovieDb, IMDB-Scraper etc.
Additionally, you can always use general-purpose parsers like the Scrapy framework.
To raise the bar, we’ll use the Go (Golang) programming language. There are ready-made IMDb parsers for it as well, but they are fewer in number: GMDB, imdb-scraping-and-analysing, IMDB-Scraper-Golang.
Current Issues of IMDb Parsing
Note! Almost all ready-made solutions for parsing data from IMDb have recently stopped working!
This is because, ever since IMDb introduced paid API access, the site’s administration has done a good job of protecting the data on its pages:
- Frontend Dynamism: The frontend of the site is now fully dynamic (consisting solely of JavaScript code). This means that to access the resulting HTML, you will need a headless browser or an anti-detect approach.
- Dynamic Class Naming: All CSS class names are intentionally made unique using special variables. As a result, on every new page, the same element - such as the block that displays the final rating - will have a new class name and ID. Finding repeating patterns in the HTML structure is impossible. No XPath will help you.
Under these conditions, you might try targeting elements only by their data-testid attribute. However, there are pitfalls here as well.
You’ll eventually need to write your own IMDb parser from scratch - with a complex element-searching system.
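As an illustration of targeting elements by data-testid instead of generated class names, here is a minimal sketch of a selector builder. The value "example-rating-block" is purely hypothetical; the real attribute values have to be looked up in the rendered page, and they may change over time.

```go
package main

import (
	"fmt"
	"strconv"
)

// testIDSelector returns a CSS attribute selector that matches elements
// by their data-testid attribute rather than by a generated class name.
func testIDSelector(testID string) string {
	return "[data-testid=" + strconv.Quote(testID) + "]"
}

func main() {
	// Hypothetical value: inspect the live page to find the real one.
	sel := testIDSelector("example-rating-block")
	fmt.Println(sel)
	// The selector can then be passed to chromedp, for example:
	//   chromedp.Text(sel, &text, chromedp.ByQuery)
}
```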
To simplify our demonstration, we’ve decided to use Golang in conjunction with the chromedp library. You can also review other libraries for Golang.
Setting Up Your Scraping Environment
Download and install the Golang runtime environment for your platform from the official download page. If needed, you can also install and configure a Git client (if it’s not already installed on your system).
Next, create a directory where your project files will be stored. Let it be a folder named “My-IMDb-parser” in the root of the C drive. You can create the folder manually or via the terminal:
mkdir \My-IMDb-parser
Then navigate to your directory:
cd \My-IMDb-parser
Create your project with an IMDb data parser (run this command in the terminal):
go mod init My-IMDb-parser
To access the resulting HTML code, we need a headless browser. We’ll use the already installed Google Chrome through the chromedp library (read the related material):
go get -u github.com/chromedp/chromedp
Wait until the module and all the dependent libraries are downloaded and installed (they should be fetched automatically).
You’re now ready to start programming your first IMDb parser.
Writing the Code
First, create the project's starting file. Let’s call it "My-IMDb-parser.go".
To do this, open your project directory and create a simple text file with the desired name, then change its extension to “.go”.
Open the file in a text editor and add the following code:
package main

// Import the modules
import (
	"context"
	"fmt"
	"log"

	"github.com/chromedp/cdproto/cdp"
	"github.com/chromedp/chromedp"
)

// Star is the data structure that stores the extracted data for a celebrity
// (for now, only the name).
type Star struct {
	name string
}

// The main function of the program
func main() {
	// Collect all extracted records in a slice of Star
	var stars []Star

	// Start a headless Chrome instance
	ctx, cancel := chromedp.NewContext(
		context.Background(),
	)
	// Release the browser resources when main returns
	defer cancel()

	// Browser automation logic
	var starNodes []*cdp.Node
	err := chromedp.Run(ctx,
		// Target page. The celebrity's birth month and day are passed directly in the URL;
		// you can replace them with your own values in the MM-DD format.
		chromedp.Navigate("https://www.imdb.com/search/name/?birth_monthday=03-07"),
		// Find all name cards in the search results; they have the ipc-title-link-wrapper class
		chromedp.Nodes(".ipc-title-link-wrapper", &starNodes, chromedp.ByQueryAll),
	)
	// Check for errors
	if err != nil {
		log.Fatal("Error: ", err)
	}

	// Parser logic
	var name string
	for _, node := range starNodes {
		// Extract data from the HTML search card
		err = chromedp.Run(ctx,
			// Read the text of the H3 tag inside the card (the title)
			chromedp.Text("h3", &name, chromedp.ByQuery, chromedp.FromNode(node)),
		)
		// Stop on error, if any
		if err != nil {
			log.Fatal("Error: ", err)
		}
		// Store the extracted name as a new Star record
		star := Star{}
		star.name = name
		stars = append(stars, star)
	}

	// Print the list of celebrities born on 03-07
	fmt.Println(stars)
}
Running the IMDb Parser
Save the file and run it from the console:
cd \My-IMDb-parser
go run My-IMDb-parser.go
After it finishes, the console will display a list of celebrities born on March 7 (03-07).
If you want, you can change the search parameters. In our case, just change the date directly in the code.
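Instead of editing the date by hand, the search URL could be generated from a month and a day. The sketch below is one possible approach; birthdayURL is a hypothetical helper, and it relies only on the birth_monthday parameter in MM-DD format shown in the code above.

```go
package main

import (
	"fmt"
	"time"
)

// birthdayURL returns the name-search URL for the given month and day.
// The date is validated against a leap year so that 02-29 is accepted.
func birthdayURL(month, day int) (string, error) {
	t := time.Date(2000, time.Month(month), day, 0, 0, 0, 0, time.UTC)
	if month < 1 || month > 12 || t.Month() != time.Month(month) || t.Day() != day {
		return "", fmt.Errorf("invalid month/day: %02d-%02d", month, day)
	}
	return fmt.Sprintf("https://www.imdb.com/search/name/?birth_monthday=%02d-%02d", month, day), nil
}

func main() {
	u, err := birthdayURL(12, 25)
	if err != nil {
		fmt.Println("invalid date:", err)
		return
	}
	fmt.Println(u)
}
```

The returned string can be passed to chromedp.Navigate in place of the hard-coded address.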
Conclusion and Recommendations
If you need more complex functionality like parsing the movie’s rating (user rating) and other content, it will take more effort. The main issue is that the site is well-protected: you must use a headless browser and work around protection tools like randomized CSS classes.
Even if you manage to solve these issues, parsing all of IMDb will take a long time, as the database contains more than 10 million titles.
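To get a feel for the scale, here is a rough back-of-the-envelope estimate. The figures are assumptions for illustration only: one second per page on average and a chosen number of parallel browser sessions. At one page per second, a single session would need roughly 116 days for 10 million pages.

```go
package main

import "fmt"

// crawlDays estimates how many days it takes to fetch the given number of
// pages when each of `workers` parallel sessions needs secondsPerPage per page.
func crawlDays(pages int, secondsPerPage float64, workers int) float64 {
	if workers < 1 {
		workers = 1
	}
	const secondsPerDay = 86400
	return float64(pages) * secondsPerPage / float64(workers) / secondsPerDay
}

func main() {
	const pages = 10_000_000 // "more than 10 million" titles
	fmt.Printf("1 session:   %.1f days\n", crawlDays(pages, 1, 1))
	fmt.Printf("50 sessions: %.1f days\n", crawlDays(pages, 1, 50))
}
```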
To significantly speed up the scraping process, you need to use rotating proxies. By the way, they integrate easily with the chromedp library we used in our example as well as with any other software or libraries.
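One simple way to rotate proxies on the client side is a round-robin picker. The sketch below is only an illustration: the proxy addresses are placeholders, and many providers rotate IPs on their side behind a single endpoint, in which case no client-side rotation is needed.

```go
package main

import (
	"fmt"
	"sync"
)

// proxyRotator hands out proxy addresses in round-robin order.
type proxyRotator struct {
	mu      sync.Mutex
	proxies []string
	next    int
}

func newProxyRotator(proxies ...string) *proxyRotator {
	return &proxyRotator{proxies: proxies}
}

// Next returns the next proxy address, or "" if the list is empty.
func (r *proxyRotator) Next() string {
	r.mu.Lock()
	defer r.mu.Unlock()
	if len(r.proxies) == 0 {
		return ""
	}
	p := r.proxies[r.next%len(r.proxies)]
	r.next++
	return p
}

func main() {
	// Placeholder addresses; replace them with your provider's endpoints.
	rot := newProxyRotator("http://proxy-a.example:8080", "http://proxy-b.example:8080")
	for i := 0; i < 3; i++ {
		fmt.Println(rot.Next())
	}
}
```

With chromedp, the selected address can be supplied when the browser is launched, for example by adding chromedp.ProxyServer(addr) to the options passed to chromedp.NewExecAllocator.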
You can find high-quality proxies with automatic rotation by time intervals or with each new request from us. Froxy provides 10+ million clean IP addresses, both mobile and residential. You pay only for the traffic, while proxies are selected with city- and ISP-level precision.