I often find web directories full of loads of music, TV series, movies, PDFs, pictures and porn. With IDM I can download all the files on the same page with a single click (right click > Download all links with IDM).
But for a huge site with multiple pages and folders, this becomes clumsy, and downloading everything manually with IDM is just not enough.
Even when I did it with endless patience and enthusiasm, another problem arose when browsing the top-level folders of a site.
For example, let's say http://vazor.com/drop/bb contains some episodes of Breaking Bad. By clicking up (../), I can go to the parent folder http://vazor.com/drop and see many of the other folders as a plain directory listing, not as an HTML webpage.
But if I click up (../) again, it takes me to http://vazor.com, which is displayed as a webpage, not as a directory. Yet there might be other directories under vazor.com (for example, http://vazor.com/music or http://vazor.com/movies) containing other interesting stuff.
So if I want to know what other top-level directories exist under http://vazor.com, there is no way to find out. At least, I didn't know of any.
Many years ago I asked one of my teachers how to do it. He said there is obviously a way, but he wouldn't tell me because it opens up the possibility of copyright violations. 😀
Anyway, on Linux there is a tool called "wget" with which you can download literally whatever you want, even a whole website! Thanks to this article and this article on Stack Overflow.
So, in summary, what I wanted to do is download everything from a website. To accomplish that, I can generate a wget command to:
- download it for me,
- skip the parent folders,
- specify how many levels deep to download,
- specify which file types I want, or which types I don't,
- limit the download size,
- provide conditions based on file type, file size, file name, location, domain, etc.,
- download everything into a single folder, or keep the files exactly the way they were laid out on the server,
- and finally make the downloaded site browsable by converting the links.
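Each item on that wish list maps to a wget option. Here is a rough sketch of how they combine; the URL and the file types are placeholders, and note that `-Q` caps the *total* amount downloaded, since classic wget has no per-file size limit:

```shell
# Mapping the wish list above to wget options (placeholder URL):
#   -r            recursive retrieval
#   -l 3          descend at most 3 levels deep
#   --no-parent   never ascend into the parent folder
#   -A mp4,mkv    accept only these file types (-R rejects types instead)
#   -Q 2g         stop after roughly 2 GB in total
#   -nd           save everything into a single folder
#                 (omit it to mirror the server's directory structure)
#   -k            convert links so the local copy is browsable offline
#   -P DIR        directory to save into
wget -r -l 3 --no-parent -A mp4,mkv -Q 2g -nd -k \
     -P /home/duck/Downloads http://example.com/some/folder/
```

Conditions on the domain can be added with `-D` (a comma-separated domain list) together with `-H`/`--span-hosts` when links cross hosts.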
I don't remember all the syntax and commands, but you can always look them up here and here.
There is also a similar tool called curl. I don't know much about it, but you can read about it here and here.
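From what I gather, curl works on single URLs rather than recursing through a site, but it can do one thing wget can't: reject an individual file by size. A minimal sketch (the URL is a placeholder, and --max-filesize takes a byte count):

```shell
# Download one file, keeping the remote file name (-O),
# following redirects (-L), and skipping anything over ~500 MB:
curl -L -O --max-filesize 500000000 http://example.com/drop/bb/episode1.mp4
```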
To wrap it up, here is what I actually did. I typed this command in the terminal:

wget -r -P /home/duck/Downloads -A mp4 http://vazor.com/drop/bb/

which gave me the directory structure of the site, including the files I wanted, but it also crawled up into the parent folders. So I should have used the --no-parent option with it:

wget -r -P /home/duck/Downloads -A mp4 --no-parent http://vazor.com/drop/bb/
Syntax:

wget -r -P /save/location -A jpeg,jpg,bmp,gif,png --no-parent http://www.domain.com

-r
enables recursive retrieval. See Recursive Download for more information.

-P
sets the directory prefix where all files and directories are saved to.

-A
sets a whitelist for retrieving only certain file types. Strings and patterns are accepted, and both can be used in a comma-separated list (as seen above). See Types of Files for more information.
By the way, if you really want to download an entire website, everything included, here is the wget command for that:

wget --mirror -p --convert-links -P /home/duck/Downloads http://domain.com
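Those short flags expand to a longer set of options. A slightly fuller version of the same command, with --adjust-extension added so saved pages get an .html extension (the URL and save path are placeholders):

```shell
# --mirror is shorthand for -r -N -l inf --no-remove-listing;
# --page-requisites (-p) also grabs the CSS, images and scripts each page needs;
# --convert-links rewrites links to point at the local copies;
# --adjust-extension saves text/html files with an .html extension.
wget --mirror --page-requisites --convert-links --adjust-extension \
     -P /home/duck/Downloads http://domain.com
```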
Linked pages in this article:
- wget manual
- The Ultimate Wget Download Guide With 15 Awesome Examples
- wget vs curl: How to Download Files Using wget and curl
- 15 Practical Linux cURL Command Examples (cURL Download Examples)
- http://www.editcorp.com/Personal/Lars_Appel/wget/v1/wget_7.html
- How do I use Wget to download all Images into a single Folder
- Download a working local copy of a webpage