Download an entire website with wget

I often find web directories full of shit loads of music, TV series, movies, PDFs, pictures and porn. With IDM I can download all the files on a single page with one click (right click > Download all links with IDM).

But with a huge site that has multiple pages and folders, that gets clumsy, and downloading everything manually with IDM just doesn’t cut it.

Sometimes I even did it that way, with endless patience and enthusiasm, but another problem shows up when you try to browse the top-level folders of a site.


For example, let’s say http://vazor.com/drop/bb contains some episodes of Breaking Bad. By clicking up (../), I can go to the parent folder http://vazor.com/drop and see many of the other folders listed as a plain directory index, not as an HTML webpage.


But if I click up (../) again, it takes me to http://vazor.com, and that is displayed as a webpage, not as a directory listing. Yet there might be other directories under the vazor.com folder (for example, http://vazor.com/music or http://vazor.com/movies) containing some other interesting stuff.

So, if I want to know what other top-level directories exist under http://vazor.com, there is no way to find that out from the browser. At least, I didn’t know of any.

Many years ago I asked one of my teachers how to do it. He said there is obviously a way, but he wouldn’t tell me because it opens up the possibility of violating copyright. 😀

Anyway, on Linux there is a tool called “wget” with which you can download literally whatever you want, even a whole website!!! Thanks to this article and this article on Stack Overflow.

So, in summary, what I wanted to do is download everything from a website. To accomplish that, I can put together a wget command that will:

  • download it for me,
  • not download the parent folders,
  • only go as many levels deep as I want,
  • specify which types of files I want to download, or which types I don’t,
  • put a limit on the download size,
  • set conditions based on file type, file size, file name, location, domain etc.,
  • download everything into a single folder, or keep exactly the structure it had on the server,
  • and finally make the downloaded website browsable by converting the links (a rough example covering most of these follows the list).
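
Just to give an idea of how those wishes map onto actual wget options, here is a rough sketch. The URL, depth, quota and patterns are all made up, so adjust them to your own case:

# hypothetical example: grab only .mp4 and .mkv files, at most 3 levels deep,
# stop after a total quota of about 2 GB (-Q), stay below the starting folder
# and keep the server's directory layout under /home/duck/Downloads
wget -r -l 3 --no-parent -A mp4,mkv -Q 2048m -P /home/duck/Downloads http://vazor.com/drop/

# same idea, but exclude types instead (-R), dump every file into one flat
# folder (-nd) and rewrite the links for offline browsing (-k)
wget -r -l 3 --no-parent -R "*.iso,*.zip" -nd -k -P /home/duck/Downloads http://vazor.com/drop/

Note that -Q is a total download quota, not a per-file size limit; as far as I know, plain wget doesn’t have a per-file size filter.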

I don’t remember all the syntax and commands, but you can always look them up here and here.

There is also a similar tool called curl. I don’t know much about it, but you can read about it here and here.
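
As far as I know, curl doesn’t do recursive downloads the way wget does; it’s more of a one-URL-at-a-time tool. A minimal sketch, with a made-up file name:

# fetch a single file, following redirects (-L) and keeping its remote name (-O)
curl -L -O http://vazor.com/drop/bb/episode01.mp4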

To wrap it up, what I actually did was type this command in the terminal:

wget -r -P /home/duck/Downloads -A mp4 http://vazor.com/drop/bb/

which gave me the directory structure of the whole site along with the files I wanted, because wget kept climbing into the parent folders. So I should have used the --no-parent option with it:

wget -r -P /home/duck/Downloads -A mp4 --no-parent http://vazor.com/drop/bb/

Syntax:

wget -r -P /save/location -A jpeg,jpg,bmp,gif,png --no-parent http://www.domain.com

More information on wget:

  • -r enables recursive retrieval. See Recursive Download for more information.
  • -P sets the directory prefix where all files and directories are saved to.
  • -A sets a whitelist for retrieving only certain file types. Strings and patterns are accepted, and both can be used in a comma separated list (as seen above). See Types of Files for more information.
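
For example, here is a hypothetical run of that syntax which mixes plain suffixes with a wildcard pattern (the /gallery/ path and the pattern are made up):

# accept .jpg and .png files, plus anything whose name matches *wallpaper*
wget -r -P /home/duck/Pictures -A "jpg,png,*wallpaper*" --no-parent http://www.domain.com/gallery/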

By the way, if you really want to download an entire website, everything included, here is the wget command for that:

wget --mirror -p --convert-links -P /home/duck/Downloads http://domain.com
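
As far as I know, --mirror is just shorthand for -r -N -l inf --no-remove-listing. If you want the offline copy to open nicely in a browser, a slightly heavier variant might look like this (the rate limit and waits are my own additions, not required):

# mirror the site, fetch the images/CSS needed to render pages (-p), rewrite
# links (--convert-links), add .html extensions where needed
# (--adjust-extension), don't climb above the start URL, and go easy on the
# server with a small wait and a bandwidth cap
wget --mirror -p --convert-links --adjust-extension --no-parent \
     --wait=1 --random-wait --limit-rate=500k -P /home/duck/Downloads http://domain.com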
