Sunday 19 May 2024

Introduction to the GNU Wget tool

 


Wget is a command-line tool that retrieves content from web servers.
The name comes from "World Wide Web" and "get".


To download a file from a URL [see Download Options (GNU Wget 1.24.5 Manual)]:

$ wget www.example.com/file.txt

If the website has an index file, e.g. index.html, the following command will download it:

$ wget www.google.com
--2024-05-19 00:00:04--  http://www.google.com/
Resolving www.google.com (www.google.com)... 172.217.169.4, 2a00:1450:4009:817::2004
Connecting to www.google.com (www.google.com)|172.217.169.4|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘index.html’

index.html              [ <=>                ]  20.32K  --.-KB/s    in 0.01s   

2024-05-19 00:00:04 (1.89 MB/s) - ‘index.html’ saved [20807]


To set a custom name for the file in which the downloaded content is saved:

$ wget -O downloaded_file.txt www.example.com/file.txt
$ wget -O google_index.html www.google.com 

or

$ wget --output-document=downloaded_file.txt www.example.com/file.txt
$ wget --output-document=google_index.html www.google.com


If we want the downloaded content only to be printed to STDOUT, we can pass "-" as the file name, which -O treats as standard output:

$ wget -O - www.google.com
--2024-05-19 00:00:49--  http://www.google.com/
Resolving www.google.com (www.google.com)... 172.217.169.4, 2a00:1450:4009:817::2004
Connecting to www.google.com (www.google.com)|172.217.169.4|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘STDOUT’
-                                                   [<=>                                                                                                     ]       0  --.-KB/s               <!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" 
...
</script></-                                                   [ <=>                                                                                                    ]  19.93K  --.-KB/s    in 0.02s   
2024-05-19 00:00:50 (818 KB/s) - written to stdout [20411]

The dash (-) is often written straight after -O, giving -O-.
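
Writing to STDOUT is mostly useful when the content is piped into another command. Below is a minimal sketch of that idea; the function name count_matching_lines and its arguments are made up for this example and are not part of wget.

```shell
# Hypothetical helper (not part of wget): count the lines of a page
# that contain a given fixed string.
count_matching_lines() {
    url=$1
    text=$2
    # -q silences the status output, "-O -" writes the page to STDOUT,
    # and grep -c prints the number of lines that contain the text.
    wget -q -O - "$url" | grep -c -F -- "$text"
}
```

For example, count_matching_lines www.example.com href would print how many lines of the page contain "href". The -q option keeps the status messages off the terminal while the content goes down the pipe.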

Mind the capitalisation:

-O file = outputs the downloaded content into the file or device (e.g. /dev/null)
-o file = outputs log information to file
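
Both options can be combined in one command, so the content and the log end up in separate files. The sketch below assumes a helper named download_with_log, which is invented here purely for illustration; the file names are chosen by the caller.

```shell
# Hypothetical helper (not part of wget): save the downloaded content
# and the log messages into two separate files.
download_with_log() {
    url=$1
    content_file=$2
    log_file=$3
    # -O (capital) names the file for the downloaded content,
    # -o (lowercase) names the file for the log messages.
    wget -O "$content_file" -o "$log_file" "$url"
}
```

Calling download_with_log www.example.com page.html wget.log should leave the page in page.html and the status messages in wget.log, with nothing printed on the terminal.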

If we don't need the command status information we can use this option:

-q = quiet; don't output status information

The following command will not print any status information and will not save the downloaded content to a file:

$ wget -q -O /dev/null www.google.com
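
A command that prints and saves nothing is still useful for its exit status: wget exits with 0 when the download succeeded and with a non-zero code otherwise. Here is a rough sketch of a reachability check built on that; the function name check_url and its messages are assumptions made for this example.

```shell
# Hypothetical helper (not part of wget): report whether a URL can be
# downloaded, without printing or keeping the content.
check_url() {
    # The exit status of wget decides which branch runs:
    # 0 means success, anything else means the download failed.
    if wget -q -O /dev/null "$1"; then
        echo "reachable: $1"
    else
        echo "unreachable: $1"
        return 1
    fi
}
```

The function returns a non-zero status on failure as well, so it can be used in a script condition such as: check_url www.example.com || exit 1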


