How to Use Wget With Proxy

Wget is a free non-interactive console program for downloading files over a network. Learn more about Wget and how to set up Wget with a proxy server.

Team Froxy 11 Apr 2024 6 min read

Can you "download the internet"? Surely not: you simply won't have enough storage. However, developers of free and open-source software have always had a keen sense of humor. A simple example is the wget utility. Its name combines "WWW" (World Wide Web) and "get," so it can be read as "download the Internet."

In this article, however, we focus less on the utility itself and more on how to make it work through a proxy. This is usually needed for running many parallel connections and for parsing (scraping) tasks.

We have previously covered a similar utility, cURL (it works well with proxies, provided you have the skills). We therefore also compare the two utilities and discuss their differences below.

What Is Wget and How to Use It

Wget is a command-line utility that ships with practically all popular Linux distributions; it is designed for downloading files and other content over various internet protocols.

If needed, the utility can be installed and used on other platforms, as the program has open-source code that can be compiled for different execution environments.

Wget boasts a very simple syntax and is therefore ideal for everyday use, including for beginners. Because wget is part of the base environment of most Linux distributions, it lets you download other programs and packages quickly and easily. Downloads can also be scheduled with cron (scripts and commands executed on a schedule), and wget can be embedded in any other scripts and console commands.

For example, wget can download an entire target website if the options for recursively traversing its URLs are set correctly.

Wget supports the HTTP, HTTPS, FTP and FTPS protocols.

A more correct name is GNU Wget (official website and documentation).

Note that there is also a newer implementation of wget, called wget2. It adds a number of small innovations and features.

An example of using wget to download an archive:

  • wget https://your.site/directory/archive.zip

Bulk files can be downloaded here by simply specifying all their names (links) separated by spaces:

  • wget https://your.site/directory/archive1.zip https://your.site/directory/archive2.zip https://your.site/directory/archive3.zip

The utility will download files sequentially with progress displayed directly in the console.

The list of target URLs can be saved in a separate file and "fed" to wget like this (the shell does not expand "~" after "=" in a long option, so $HOME is used here):

  • wget --input-file="$HOME/urls.txt"

The same works with the short option:

  • wget -i ~/urls.txt

If access is protected by a login and password, wget can handle it as well (you need to replace user and password with actual ones):

  • wget ftp://user:password@host/path

This is how you can create a local version of a specific website (it will be downloaded as HTML pages with all related content):

  • wget --mirror -p --convert-links -P /home/user/site111 source-site.com

You can download only files of a certain type from a website:

  • wget -r -A "*.png" domain.zone

Note! Wget cannot execute JavaScript, meaning it will only load and save the HTML code as the server returns it. All dynamically loaded elements will be ignored.

There are plenty of possible wget applications.

A complete list of all options and keys for the utility can be found in the program documentation as well as on the official website. In particular, you can:

  • Limit download speed and set other quotas;
  • Change the user-agent to your own value (for example, you can pretend to be a Chrome browser to the website);
  • Resume download;
  • Set offset when reading a file;
  • Analyze creation/modification time, MIME type;
  • Use constant and random delays between requests;
  • Recursively traverse specified directories and subdirectories;
  • Use compression at the proxy server level;
  • Switch to the background mode;
  • Employ proxies.

Naturally, we are mostly interested in the last point.
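Before turning to proxies, here is a minimal sketch of how several of the other options from the list can be combined in one call. The flag names come from the GNU Wget manual; the values, the User-Agent string and the `run` dry-run helper are illustrative assumptions, and the URL is a placeholder.

```shell
#!/bin/sh
# Dry-run helper (hypothetical): prints the command unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Placeholder URL; replace it with a real one before running with RUN=1.
url="https://your.site/directory/archive.zip"

# --limit-rate   cap bandwidth at roughly 200 KB/s
# --wait         pause 2 seconds between retrievals
# --random-wait  vary that pause so requests are not evenly spaced
# --continue     resume a partially downloaded file
# --user-agent   send a custom User-Agent header (illustrative value)
run wget --limit-rate=200k --wait=2 --random-wait --continue \
    --user-agent="ExampleAgent/1.0" "$url"
```

By default the script only prints the assembled command, which makes it safe to experiment with; setting RUN=1 executes it.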

When parsing, wget can help with saving HTML content, which can later be dissected and analyzed by other tools and scripts. For more details, see materials on Python web scraping libraries and Golang Scraper.

Why Use a Proxy with Wget

A proxy is an intermediary server. Its main task is to organize an alternative route for exchanging requests between a client and a server.

Proxies can use different connection schemes and technologies. For example, proxies can be anonymous or not, run on different types of devices (server-based, mobile, residential), be paid or free, support backconnect (reverse connection) schemes, use static or dynamic addresses, etc.

No matter what they are, their tasks remain roughly the same: redirection, location change, content modification (compression, cleaning, etc.).

When parsing, running wget through a proxy is also needed to hide the client's real address and to organize multiple parallel connections, for example, to speed up data collection (scraping, not to be confused with web crawling).

How to Install Wget

In many Linux distributions, wget is a pre-installed utility. If the wget command returns an error, wget can be easily installed using the native package manager.
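A quick way to check whether wget is already available is sketched below. The `have_cmd` helper name is an assumption made for this example; it relies only on the standard `command -v` shell builtin.

```shell
#!/bin/sh
# Return success if the named command can be found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd wget; then
    # Print only the first line of the version banner.
    wget --version | head -n 1
else
    echo "wget is not installed; use one of the package manager commands below"
fi
```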

Debian-based distributions, including Ubuntu:

  • sudo apt-get install wget

Fedora, CentOS and RHEL:

  • sudo yum install wget

ArchLinux and equivalents:

  • sudo pacman -S wget

In macOS, wget can be installed either from source (with the "make" and "make install" commands) or using the Homebrew package manager. For beginners, the latter option is the most convenient (note that the Homebrew installer relies on cURL, which is pre-installed in macOS):

  • /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
  • brew install wget

In the latest versions of Windows (10 and 11), wget can be installed in the Linux subsystem (WSL), from precompiled binaries (for example, they can be found here) or using third-party package managers like Chocolatey. Installation command for Chocolatey:

  • choco install wget

If you install wget on Windows as a standalone binary, add the directory that contains wget.exe to the PATH variable so that the command can be invoked by name from the command line. Otherwise, you will have to refer to the file directly each time, as in ".\directory\wget.exe", followed by the list of options and parameters.

Running Wget

Once the utility is installed, it can be launched either from the command line or accessed within shell scripts.

Typical launch:

  • wget https://site.zone/directory/file.zip

Immediately after you press Enter, the utility starts downloading the file to the current working directory (use the -P option to choose a different target directory).

In the console, wget displays the current speed and overall download progress.

You can change the filename during download:

  • wget -O new-name.zip https://site.zone/directory/source-file.zip

If you need to call up help for the set of options, type:

  • wget -h

Setting Up Wget to Work Through Proxy

The simplest way to specify a wget proxy is through special options in the command line:

  • If the proxy does not require authentication:

wget -e use_proxy=on -e http_proxy=proxy.address.or.IP.address:port -e https_proxy=proxy.address.or.IP.address:port https://target.site/directory/file.zip

  • If authentication with a username and password is required:

wget -e use_proxy=on -e http_proxy=132.217.171.127:1234 -e https_proxy=132.217.171.127:1234 --proxy-user=USERNAME --proxy-password=PASSWORD https://target.site/directory/file.zip

The values "on" and "yes" are interchangeable for use_proxy. Note that wget selects the proxy by the scheme of the target URL: http_proxy applies to HTTP URLs and https_proxy to HTTPS URLs, which is why both are set in the commands above.

If it is inconvenient to specify the options in the console every time, you can set the proxy in a wget configuration file. This can be either the system-wide file (/etc/wgetrc) or the user's own config (~/.wgetrc; if the file does not exist, create it manually). Just set the following options (if the user config is created from scratch, add them to the empty file):

use_proxy=on
http_proxy=155.217.170.121:12345
https_proxy=155.217.170.121:12345

Naturally, instead of 155.217.170.121:12345, you should specify the actual IP address and port number.

If authentication with a username and password is required, you can use the following construction:

use_proxy = on
http_proxy = http://USERNAME:PASSWORD@155.217.170.121:12345
https_proxy = http://USERNAME:PASSWORD@155.217.170.121:12345

Now you can run wget without additional options; the utility will keep working through the proxy.
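The configuration file can also be generated from a script. The sketch below is one possible approach, not the only one: the `write_wgetrc` function name and the scratch file path are assumptions made for this example, while `--config` is the GNU Wget option for pointing a single run at a specific startup file.

```shell
#!/bin/sh
# Write a minimal wgetrc with proxy settings to the given file.
# Usage: write_wgetrc FILE PROXY_URL
write_wgetrc() {
    printf '%s\n' \
        "use_proxy = on" \
        "http_proxy = $2" \
        "https_proxy = $2" > "$1"
    # The file may contain a password, so restrict it to the owner.
    chmod 600 "$1"
}

# Hypothetical scratch path, so the real ~/.wgetrc is left untouched.
rc_file="${TMPDIR:-/tmp}/wgetrc.example"
write_wgetrc "$rc_file" "http://USERNAME:PASSWORD@155.217.170.121:12345"
cat "$rc_file"

# To use the file for a single run (placeholder URL):
#   wget --config="$rc_file" https://target.site/directory/file.zip
```

Writing to a separate file and passing it with `--config` keeps experiments away from the user's regular settings; once the result looks right, the same lines can be copied into ~/.wgetrc.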

Rotating Proxy for Wget

Wget does not have built-in tools for proxy rotation. Therefore, if you want each new wget run to go through a different proxy, you need to write a bash script or pass the proxy with the "-e" option each time.

Example:

wget -e use_proxy=on -e http_proxy=104.254.41.36:1234 -e https_proxy=104.254.41.36:1234 --proxy-user=USERNAME-one --proxy-password=PASSWORD-one https://site-one.zone/directory/file-one.zip

wget -e use_proxy=on -e http_proxy=26.104.52.225:2234 -e https_proxy=26.104.52.225:2234 --proxy-user=USERNAME-two --proxy-password=PASSWORD-two https://site-two.zone/directory/file-two.zip

wget -e use_proxy=on -e http_proxy=70.174.89.3:44444 -e https_proxy=70.174.89.3:44444 --proxy-user=USERNAME-three --proxy-password=PASSWORD-three https://site-three.zone/directory/file-three.zip

And here is how a bash script for forced proxy rotation might look; each request uses a proxy picked at random from the list stored in the file proxies.txt (let's assume it has 10 lines):

#!/usr/bin/env bash
# Make 10 requests, each through a proxy picked at random from proxies.txt
for i in {1..10}
do
  proxy=$(shuf -n 1 proxies.txt)
  wget -e use_proxy=on -e "http_proxy=$proxy" -e "https_proxy=$proxy" --proxy-user=USERNAME --proxy-password=PASSWORD https://target-site.zone/subdirectory/some-file
done
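Random selection can pick the same proxy several times in a row. A possible alternative, sketched here as a variation rather than the method used above, is round-robin rotation, where request N uses line N of the list and wraps around at the end. The `proxy_for_request` function name and the scratch file path are assumptions made for this example; the wget commands are printed, not executed.

```shell
#!/bin/sh
# Print the proxy for request number N (1-based), cycling through
# the lines of FILE in order. Usage: proxy_for_request N FILE
proxy_for_request() {
    count=$(awk 'END { print NR }' "$2")
    [ "$count" -gt 0 ] || return 1
    line=$(( ($1 - 1) % count + 1 ))
    sed -n "${line}p" "$2"
}

# Demo with a throwaway list (addresses reused from the examples above).
list="${TMPDIR:-/tmp}/proxies.example.txt"
printf '%s\n' 104.254.41.36:1234 26.104.52.225:2234 70.174.89.3:44444 > "$list"

for i in 1 2 3 4; do
    proxy=$(proxy_for_request "$i" "$list")
    # Printed rather than executed; drop the echo to download for real.
    echo wget -e use_proxy=on -e "http_proxy=$proxy" -e "https_proxy=$proxy" \
        https://target-site.zone/subdirectory/some-file
done
```

With three proxies in the list, the fourth request wraps around and goes through the first proxy again.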

If you're not familiar with scripting, there's another elegant solution – using backconnect proxies. Let’s take Froxy proxy as an example:

  1. A port is configured in the personal account (this defines the location and the conditions for rotating outgoing IP addresses, for example, with each new request).
  2. The proxy port data is copied (for wget, this will look like a regular proxy).
  3. The requests are then executed just as with regular individual proxies (wget -e use_proxy=on -e http_proxy=255.89.155.178:1234 -e https_proxy=255.89.155.178:1234 --proxy-user=USERNAME --proxy-password=PASSWORD https://target.site/directory/file.zip).
  4. IP address rotation is carried out on the proxy provider's side. The input port remains the same (there is no need to add or update anything in wget).

cURL vs Wget

Both cURL and wget are open-source utilities used for downloading files and other content via HTTP and FTP protocols. They both handle HTTP POST and GET requests, cookies, can work with secure versions of websites (HTTPS) and can be incorporated into bash scripts.

However, they also have distinctions.

Let's start with cURL.

  • This is not just a utility but also a software library that can be used at the code level;
  • Unlike wget, cURL supports a vast number of additional protocols (here is a detailed comparison table);
  • cURL can work through SOCKS proxies (wget supports HTTP only);
  • It offers more capabilities for site authentication and SSL connection support;
  • In addition to POST and GET, it also supports some other methods (e.g., PUT).

On the other hand, wget also has something to offer:

  • Recursive downloading of directory contents is possible;
  • Creating copies of websites is available;
  • Interrupted downloads can be resumed (no need to re-download large files);
  • It has a smaller set of options, making wget easier to manage and configure.

Take your time to find out how to integrate cURL with proxies.

Conclusion and Recommendations

Wget is a simple yet powerful utility for downloading files and HTML pages. It can be adapted for parsing tasks and can be accessed in the console or through bash scripts. Its downsides include the inability to use it as a library and the lack of built-in proxy rotation.

You can find quality residential and mobile proxies with automatic rotation in our service. Froxy offers over 8 million IP addresses, a convenient interface and targeting up to the city level (with solid coverage in all countries worldwide). Price depends on traffic only. There's a special trial package available for testing the service's features.
