A guide to downloading all files and folders at a URL using Wget, with options to clean up the download location and path names. A basic Wget rundown post can be found here.
GNU Wget is a popular open-source command-line tool for downloading files and directories, with support for common internet protocols such as HTTP, HTTPS, and FTP.
You can read the Wget docs here for many more options.
For this example, assume the URL containing all the files (and folders) we want to download is:
https://domain.com/uploads/
One file example
The simple command to download just one file is:
wget 'https://domain.com/uploads/file.zip'
This will download file.zip into the current directory as file.zip.
To download the file and save as a certain name:
wget -O 'example_name.zip' 'https://domain.com/uploads/file.zip'
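If a large download gets interrupted, the -c flag resumes it from where it stopped instead of starting over:
wget -c 'https://domain.com/uploads/file.zip'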
Recursive downloading
Now to download everything, start by adding the recursive and no-parent flags:
wget -r -np
The -r flag enables recursive downloading, which grabs and follows links and directories (the default maximum depth is 5). The -np flag means "no parent", so only directories at or below the stated one are fetched.
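If five levels is deeper than you need, the -l flag caps the recursion depth, for example at two levels:
wget -r -l 2 -np https://domain.com/uploads/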
Optional, but something I find appealing, is adding -nH, which removes the hostname directory from the save path when downloading; you will see this below. Assume we are in the wget_testing/ directory.
wget -r -np -nH https://domain.com/uploads/
This will download everything into wget_testing/uploads/.
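For illustration (the file names here are made up), the result would look roughly like:
wget_testing/uploads/file.zip
wget_testing/uploads/docs/report.pdf
Without -nH, the same files would instead land under wget_testing/domain.com/uploads/.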
Cut directories from the save path
Time to use another flag, --cut-dirs, which strips the given number of leading directory names from the save path, counting from just after the hostname.
wget -r -np -nH --cut-dirs=1 https://domain.com/uploads/
Downloads everything straight into wget_testing/; notice how the uploads directory got cut.
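If the files sat one level deeper, say under a hypothetical https://domain.com/uploads/2024/, you would bump the count to strip both directory names:
wget -r -np -nH --cut-dirs=2 https://domain.com/uploads/2024/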
Set a folder to download into
If you want to download into a specific folder, use the -P (directory prefix) flag:
wget -r -np -nH --cut-dirs=1 -P afolder https://domain.com/uploads/
Downloads everything into wget_testing/afolder/.
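The same flag works for the single-file case from earlier:
wget -P afolder 'https://domain.com/uploads/file.zip'
This saves file.zip into wget_testing/afolder/.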
Don’t download index files
Avoid downloading all of the index.html files with the reject flag -R:
wget -r -np -nH --cut-dirs=1 -R "index.html*" https://domain.com/uploads/
Downloads everything into wget_testing/ with NO index.html files.
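-R also accepts a comma-separated list of patterns, and its counterpart -A keeps only files that match. For example, to grab only zip archives (the pattern here is illustrative):
wget -r -np -nH --cut-dirs=1 -A "*.zip" https://domain.com/uploads/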
Run Wget in the background
Finally, to run the command in the background, put nohup before the wget command and add & after it:
nohup wget -r -np -nH https://domain.com/uploads/ &
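By default nohup writes the command's output to nohup.out in the current directory, so you can follow progress with:
tail -f nohup.out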