Sunday 19 May 2024

Introduction to GNU Wget tool

 


GNU Wget is a command-line tool that retrieves content from web servers.
Its name comes from "World Wide Web" and "get".


To download a file from a URL [see Download Options (GNU Wget 1.24.5 Manual)]:

$ wget www.example.com/file.txt

If the URL does not name a file, Wget requests the server's default page (typically index.html) and saves it as index.html:

$ wget www.google.com
--2024-05-19 00:00:04--  http://www.google.com/
Resolving www.google.com (www.google.com)... 172.217.169.4, 2a00:1450:4009:817::2004
Connecting to www.google.com (www.google.com)|172.217.169.4|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘index.html’

index.html              [ <=>                ]  20.32K  --.-KB/s    in 0.01s   

2024-05-19 00:00:04 (1.89 MB/s) - ‘index.html’ saved [20807]


To choose the name of the file in which the downloaded content is saved:

$ wget -O downloaded_file.txt www.example.com/file.txt
$ wget -O google_index.html www.google.com 

or

$ wget --output-document=downloaded_file.txt www.example.com/file.txt
$ wget --output-document=google_index.html www.google.com


If we want the downloaded content only to be displayed on STDOUT (the terminal), we can pass "-" as the file name, which Wget treats as standard output:

$ wget -O - www.google.com
--2024-05-19 00:00:49--  http://www.google.com/
Resolving www.google.com (www.google.com)... 172.217.169.4, 2a00:1450:4009:817::2004
Connecting to www.google.com (www.google.com)|172.217.169.4|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘STDOUT’
-                                                   [<=>                                                                                                     ]       0  --.-KB/s               <!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" 
...
</script></-                                                   [ <=>                                                                                                    ]  19.93K  --.-KB/s    in 0.02s   
2024-05-19 00:00:50 (818 KB/s) - written to stdout [20411]

The dash (-) is often written straight after -O, giving -O-.
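
Writing to STDOUT is most useful when the content is piped straight into another program. Below is a minimal sketch of that idea, assuming grep and head are available; page_title is a hypothetical helper name, not part of Wget. Wget prints its status messages to STDERR, so only the page itself travels through the pipe.

```shell
#!/bin/sh
# page_title is a hypothetical helper, not part of Wget.
# -O- sends the downloaded page to STDOUT; Wget's status messages go to
# STDERR, which is discarded here so the terminal stays clean.
page_title() {
    wget -O- "$1" 2>/dev/null | grep -o '<title>[^<]*</title>' | head -n 1
}
```

Calling page_title www.google.com would print the page's title element, provided it sits on a single line of the HTML. If the download fails, nothing is printed.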

Mind the capitalisation:

-O file = outputs the downloaded content into the file or device (e.g. /dev/null)
-o file = outputs log information to file
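
The two options can be combined in one command. The sketch below is only an illustration; fetch_with_log is a hypothetical helper name and the file names are whatever the caller passes in.

```shell
#!/bin/sh
# fetch_with_log is a hypothetical helper, not part of Wget.
# Usage: fetch_with_log URL CONTENT_FILE LOG_FILE
fetch_with_log() {
    # -O (capital) receives the downloaded content.
    # -o (lowercase) receives the status log that Wget would otherwise
    # print to the terminal.
    wget -O "$2" -o "$3" "$1"
}
```

For example, fetch_with_log www.example.com/file.txt file.txt wget.log keeps the terminal silent: the content ends up in file.txt and the status messages in wget.log.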

If we don't need the status information, we can use this option:

-q = quiet; don't output status information

The following command prints no status information and does not save the downloaded content to a file:

$ wget -q -O /dev/null www.google.com
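
With both the status output and the content discarded, the only thing left is Wget's exit status: 0 on success, non-zero on failure. That makes the command usable as a simple reachability check in scripts. A minimal sketch, where check_url is a hypothetical helper name:

```shell
#!/bin/sh
# check_url is a hypothetical helper, not part of Wget.
# It relies only on Wget's exit status: 0 means the download succeeded,
# anything else means it failed.
check_url() {
    if wget -q -O /dev/null "$1"; then
        echo "reachable"
    else
        echo "unreachable"
    fi
}
```

Usage: check_url www.google.com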


To show the server's response headers, use -S (--server-response):

$ wget -S "https://code.visualstudio.com/sha/download?build=stable&os=linux-deb-x64" -O /dev/null
--2024-06-18 10:43:12--  https://code.visualstudio.com/sha/download?build=stable&os=linux-deb-x64
Resolving code.visualstudio.com (code.visualstudio.com)... 13.107.253.64, 2620:1ec:29:1::64
Connecting to code.visualstudio.com (code.visualstudio.com)|13.107.253.64|:443... connected.
HTTP request sent, awaiting response... 
  HTTP/1.1 302 Found
  Date: Tue, 18 Jun 2024 09:43:13 GMT
  Content-Type: text/plain; charset=utf-8
  Content-Length: 162
  Connection: keep-alive
  Location: https://vscode.download.prss.microsoft.com/dbazure/download/stable/611f9bfce64f25108829dd295f54a6894e87339d/code_1.90.1-1718141439_amd64.deb
  Set-Cookie: MSFPC=GUID%3Deecbc0d63e8183c1315318e8090e8446%26HASH%3Deecb%26LV%3D202406%26V%3D4%26LU%3D1718703792920; Max-Age=31536000; Path=/; Expires=Wed, 18 Jun 2025 09:43:12 GMT; Secure; SameSite=None
  Vary: Accept
  Strict-Transport-Security: max-age=31536000; includeSubDomains
  request-context: appId=cid-v1:
  Content-Security-Policy: frame-ancestors 'self'
  X-XSS-Protection: 1; mode=block
  X-Content-Type-Options: nosniff
  X-Powered-By: ASP.NET
  x-azure-ref: 20240618T094312Z-17ddf88f4d85c2wj3q7hcbzesg000000072g000000008114
  X-Cache: CONFIG_NOCACHE
  set-cookie: ASLBSA=00030be214d0cddc3ff4cc4fe91410644048dd542080144d9b14485a532831c2633c; Path=/; Secure; HttpOnly;
  set-cookie: ASLBSACORS=00030be214d0cddc3ff4cc4fe91410644048dd542080144d9b14485a532831c2633c; SameSite=none; Path=/; Secure; HttpOnly;
Location: https://vscode.download.prss.microsoft.com/dbazure/download/stable/611f9bfce64f25108829dd295f54a6894e87339d/code_1.90.1-1718141439_amd64.deb [following]
--2024-06-18 10:43:13--  https://vscode.download.prss.microsoft.com/dbazure/download/stable/611f9bfce64f25108829dd295f54a6894e87339d/code_1.90.1-1718141439_amd64.deb
Resolving vscode.download.prss.microsoft.com (vscode.download.prss.microsoft.com)... 152.199.21.175, 2606:2800:233:1cb7:261b:1f9c:2074:3c
Connecting to vscode.download.prss.microsoft.com (vscode.download.prss.microsoft.com)|152.199.21.175|:443... connected.
HTTP request sent, awaiting response... 
  HTTP/1.1 200 OK
  Accept-Ranges: bytes
  Age: 58539
  Cache-Control: public, max-age=86400
  Content-Disposition: attachment; filename=code_1.90.1-1718141439_amd64.deb; filename*=UTF-8''code_1.90.1-1718141439_amd64.deb
  Content-Type: application/octet-stream
  Date: Tue, 18 Jun 2024 09:43:13 GMT
  Etag: "0x6DB5869CE33DCF08E5A02BCF311A662BEC363FBB7CC24B9D8FB4FE4790A0AFD1"
  Last-Modified: Tue, 11 Jun 2024 21:56:41 GMT
  Server: ECAcc (lhc/788C)
  X-Cache: HIT
  X-Ms-ApiVersion: Distribute 1.2
  X-Ms-Region: prod-neu-z1
  Content-Length: 102264142
Length: 102264142 (98M) [application/octet-stream]
Saving to: ‘/dev/null’

/dev/null                                    23%[===================>                                                                     ]  22.52M  3.31MB/s    eta 23s  
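
When only some of the headers matter, the output can be filtered. Wget writes its log, including the headers printed by -S, to STDERR, so it has to be redirected before it can be piped. The sketch below assumes grep is available; redirect_targets is a hypothetical helper name. It prints only the Location headers, which appear indented in the output above.

```shell
#!/bin/sh
# redirect_targets is a hypothetical helper, not part of Wget.
# 2>&1 merges STDERR (where Wget prints the -S headers) into STDOUT so
# that grep can see them. The pattern keeps only indented header lines,
# skipping Wget's own unindented "Location: ... [following]" message.
redirect_targets() {
    wget -S -O /dev/null "$1" 2>&1 | grep -i '^  *location:'
}
```

Note that the file is still downloaded in full and thrown away, so this can take a while for a large download such as the one above.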



To delete the file after it has been downloaded, we can use the --delete-after option.
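
The Wget manual describes this option as useful for pre-fetching pages through a proxy, where the goal is to warm the cache rather than to keep the file. A minimal sketch, where prefetch is a hypothetical helper name:

```shell
#!/bin/sh
# prefetch is a hypothetical helper, not part of Wget.
# The file is downloaded in full and then removed by Wget itself, so
# nothing is left in the current directory afterwards.
prefetch() {
    wget -q --delete-after "$1"
}
```

Usage: prefetch www.example.com/file.txt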
