Download an entire website with wget

I often find web directories full of shit loads of music, TV series, movies, PDFs, pictures and porn. With IDM I can download all the files on a single page with one click (right click > Download all links with IDM).

But for a huge site with multiple pages and folders, that gets clumsy, and downloading everything manually with IDM just isn't practical.

Even when I did it with endless patience and enthusiasm, another problem came up while browsing the top-level folders of a site.

[Screenshot: a web directory listing with downloadable files]

For example, let's say a directory contains some episodes of Breaking Bad. By clicking up (../), I can go to the parent folder and see many of the other folders as a plain directory listing, not as an HTML webpage.

[Screenshot: the parent folder shown as a plain directory listing]

But if I click up (../) again, it takes me to the site's root, and that is displayed as a webpage, not as a directory listing. And there might be other directories under that root (for example, ones containing some other interesting stuff).

So, if I want to know what the other top-level directories are, there is no way to find that out from the browser. At least, I didn't know of any.

Many years ago, I asked one of my teachers how to do it. He said there is obviously a way, but he wouldn't tell me because it opens up the possibility of copyright violations. 😀

Anyway, in Linux there is a tool called "wget" with which you can download literally whatever you want, even a whole website! Thanks to this article and this article on Stack Overflow.

So, in summary, what I wanted to do is download everything from a website. To accomplish that, I can generate a wget command to (there's an annotated example right after this list):

  • download it for me,
  • don’t download the parent folders,
  • specify how many levels deep i want to download,
  • specify what types of files I want to download, or what types I don't,
  • put a limit on the total download size (quota),
  • provide conditions based on file type, file name, location, domain, etc.,
  • download in a single folder or download exactly the way it was on the server,
  • and finally make the downloaded website browsable offline by converting the links.
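
Roughly, those points map onto wget options like the sketch below. This is only an illustration; the URL, file types, depth and quota are made-up values you'd swap for your own case:

wget -r -l 3 --no-parent \
     -A mp4,mkv -R iso \
     -Q 5000m \
     -P ~/Downloads/series \
     http://example.com/shows/breaking-bad/

  • -r with -l 3 downloads recursively, at most three levels deep.
  • --no-parent stops wget from climbing into the parent folders.
  • -A mp4,mkv keeps only those extensions; -R iso throws away ISO files.
  • -Q 5000m stops after roughly 5 GB in total.
  • -P ~/Downloads/series is where everything gets saved.
  • Add -nd if you want all files dumped into one folder instead of recreating the server's directory tree.

One caveat: when you filter with -A, wget deletes the HTML index pages once it has scraped them for links, so --convert-links really only matters for the full-mirror command near the end of this post.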

I don't remember all the syntax and options, but you can always look them up here and here.

There is also a similar tool called curl. I don't know much about it, but you can read about it here and here.
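
Just for comparison, a quick curl one-liner for grabbing a single file (the URL is a placeholder, and unlike wget, curl has no recursive mode of its own):

curl -L -O http://example.com/shows/breaking-bad/episode-01.mp4

  • -L follows redirects, and -O saves the file under its remote name.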

To wrap it up, what I actually did was type this command in the terminal:

wget -r -P /home/duck/Downloads -A mp4

which gave me the directory structure of the whole site along with the files I wanted, because wget also climbed up and followed the parent folders. So I should have added --no-parent to it:

wget -r -P /home/duck/Downloads -A mp4 --no-parent
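
For completeness, the target URL goes at the end of the command; example.com here is just a placeholder for the directory you actually want:

wget -r -P /home/duck/Downloads -A mp4 --no-parent http://example.com/shows/breaking-bad/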

The same pattern works for other file types, for example images:

wget -r -P /save/location -A jpeg,jpg,bmp,gif,png --no-parent

More information on wget:

  • -r enables recursive retrieval. See Recursive Download for more information.
  • -P sets the directory prefix where all files and directories are saved to.
  • -A sets a whitelist for retrieving only certain file types. Strings and patterns are accepted, and both can be used in a comma separated list (as seen above). See Types of Files for more information.
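
As a small illustration of that last point, the -A list can mix plain suffixes with shell-style patterns; quote anything containing wildcards so your shell doesn't expand it first (the URL is again a placeholder):

wget -r --no-parent -A 'pdf,*report*.xlsx' -P ./docs http://example.com/files/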

By the way, if you really want to download an entire website, everything included, here is the wget syntax for that:

wget --mirror -p --convert-links -P ./home/Downloads
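
Spelled out with a placeholder URL, and with --adjust-extension added so saved pages get an .html suffix where needed, that looks roughly like this:

wget --mirror -p --convert-links --adjust-extension -P ~/Downloads/site-mirror http://example.com/

  • --mirror is shorthand for -r -N -l inf --no-remove-listing, i.e. infinite-depth recursion with timestamping.
  • -p (--page-requisites) also grabs the images, CSS and scripts each page needs to display properly.
  • --convert-links rewrites the links afterwards so the local copy is browsable offline.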
