Download An Entire Site With wget
There are many ways to crawl a site; here is one I found easy.
I often need to work with sites offline for whatever reason. Here is how I do it at the command line with Ubuntu and wget:
$ wget https://www.andrewjstevens.com \
--recursive \
--no-clobber \
--page-requisites \
--html-extension \
--convert-links \
--domains andrewjstevens.com
Options explained:
- recursive: Recursively download the entire site.
- no-clobber: Do not overwrite existing files, which is useful if you need to interrupt and resume the download.
- page-requisites: Grab all page items including images, CSS, JS, etc.
- html-extension: Save files with a .html file extension (newer wget releases call this option --adjust-extension).
- convert-links: Rewrite links in the downloaded pages so the site works offline.
- domains: Only follow links on the specified domain.
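For reference, here is the same command using wget's short flags; a sketch of an equivalent invocation (-E is the short form of --html-extension):
$ wget https://www.andrewjstevens.com \
-r \
-nc \
-p \
-E \
-k \
-D andrewjstevens.com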
Update July 13, 2023: I now use httrack for this instead. For example, to download a single page:
httrack --ext-depth=1 "url"
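Here --ext-depth=1 tells httrack to follow off-site links one level deep, which helps when a page's assets live on other hosts. To mirror a whole site instead, something like the following works; a sketch rather than the only way to do it (the -O output directory and the +filter scan rule limiting the crawl to the same domain are my usual choices):
httrack "https://www.andrewjstevens.com/" -O ./mirror "+*.andrewjstevens.com/*"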