Mirroring/Downloading websites with wget
Since you probably don't want to accidentally download very large files, you can download this patch, which adds an option for limiting the size of individual files.
Then you can download a site recursively like this:
wget -r -np -p -E -k -K --limit-size=10M site_url
Options:
- -r, --recursive
Turn on recursive retrieving. The default maximum depth is 5; to increase it, use -l <depth>.
- -np, --no-parent
Do not ever ascend to the parent directory when retrieving recursively. This is a useful option, since it guarantees that only
the files below a certain hierarchy will be downloaded.
- -p, --page-requisites
This option causes Wget to download all the files that are necessary to properly display a given HTML page. This includes such
things as inlined images, sounds, and referenced stylesheets.
- -E, --adjust-extension
If a file of type application/xhtml+xml or text/html is downloaded and the URL does not end with the regexp \.[Hh][Tt][Mm][Ll]?,
this option will cause the suffix .html to be appended to the local filename.
- -k, --convert-links
After the download is complete, convert the links in the document to make them suitable for local viewing. This affects not only
the visible hyperlinks, but any part of the document that links to external content, such as embedded images, links to style
sheets, hyperlinks to non-HTML content, etc.
- -K, --backup-converted
When converting a file, back up the original version with a .orig suffix, so the unmodified copy is kept alongside the converted one.
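Put together, the options above can be wrapped in a small script. This is a minimal sketch: the example.com URL is a placeholder, and --limit-size assumes the patched wget mentioned earlier (stock wget does not have this flag). It prints the command first so you can review it before running.

```shell
#!/bin/sh
# mirror.sh -- mirror one section of a site for offline viewing.
# Usage: mirror.sh <site_url>   (defaults to a placeholder URL)
site_url="${1:-https://example.com}"

# -r  recurse (depth 5 by default), -np never ascend above the start URL,
# -p  fetch page requisites, -E append .html where needed,
# -k  rewrite links for local viewing, -K keep .orig backups.
# --limit-size requires the patched wget; drop it when using stock wget.
set -- -r -np -p -E -k -K --limit-size=10M "$site_url"

# Dry run: print the command for inspection.
echo wget "$@"
# Uncomment to actually download:
# wget "$@"
```

Running it with no argument prints the full command against the placeholder URL; pass your real site URL as the first argument.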
wget also provides options for working around common download restrictions, such as robots.txt exclusions and user-agent filtering:
wget -r -np -k --random-wait -e robots=off --user-agent "Mozilla/5.0" 'target-url-here'
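The extra flags in that command each disable one restriction: --random-wait varies the delay between requests (scaled from the --wait value) to look less like a bot, -e robots=off tells wget to ignore robots.txt, and --user-agent replaces the default Wget/<version> identifier with a browser string. A sketch, with a placeholder target URL; it echoes the command rather than running it:

```shell
#!/bin/sh
# stealth-mirror.sh -- recursive mirror that sidesteps common blocks.
# The docs URL below is a placeholder; substitute your real target.
url="https://example.com/docs/"

# --random-wait  vary the pause between requests (based on --wait)
# -e robots=off  ignore robots.txt exclusions (use responsibly)
# --user-agent   present a browser identity instead of Wget/<version>
ua="Mozilla/5.0"

# Dry run: print the command so it can be reviewed before running.
echo wget -r -np -k --random-wait -e robots=off --user-agent "$ua" "$url"
```

Note that ignoring robots.txt can violate a site's terms of use, so this is best reserved for sites you have permission to mirror.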