How to use GNU Stream Editor (sed)

Friday, 4 February 2022


sed is a Unix stream editor for filtering and transforming text; GNU sed is the GNU project's implementation of it.

From its manual:

Sed is a stream editor. A stream editor is used to perform basic text transformations on an input stream (a file or input from a pipeline). While in some ways similar to an editor which permits scripted edits (such as ed), sed works by making only one pass over the input(s), and is consequently more efficient. But it is sed's ability to filter text in a pipeline which particularly distinguishes it from other types of editors.

- it uses regular expressions
- it supports text manipulation operations such as substitution, insertion, deletion and search
- alternative tools: Perl, AWK
- it reads text line by line, from a file or an input stream, into an internal buffer known as the pattern space (each input line is copied into the pattern space). It then applies one or more operations, described by a sed script, to the pattern space and, unless -n is given, prints the pattern space before moving on to the next line.
- the sed script can be given either on the command line or read from a separate file
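A minimal sketch of that cycle, with printf standing in for the input stream (the sample text is made up for illustration). Each line is loaded into the pattern space separately, so s/cat/CAT/ replaces only the first match on each line; and because sed prints the pattern space at the end of every cycle, an explicit p without -n shows each line twice.

```shell
# Each input line goes through the script on its own:
# s/cat/CAT/ replaces only the first "cat" on each line.
per_line=$(printf 'cat cat\ncat dog\n' | sed 's/cat/CAT/')
printf '%s\n' "$per_line"
# CAT cat
# CAT dog

# sed prints the pattern space at the end of every cycle,
# so an explicit p makes each line appear twice...
doubled=$(printf 'x\ny\n' | sed 'p')
printf '%s\n' "$doubled"
# x
# x
# y
# y

# ...while -n turns the automatic printing off.
quiet=$(printf 'x\ny\n' | sed -n 'p')
printf '%s\n' "$quiet"
# x
# y
```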

Syntax:

sed [options] <script> [input_file...]

If no input file is given, sed reads from standard input.

Some options:

-f script-file, --file=script-file = add the contents of script-file to the commands to be executed

-i[SUFFIX], --in-place[=SUFFIX] = edit files in place (a backup is made if SUFFIX is supplied)

-n, --quiet, --silent = suppress automatic printing of pattern space
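A sketch combining -f and -n, assuming GNU sed and a mktemp utility for the temporary script file (the generated file name is not part of the original post). Lines starting with # are comments inside a sed script.

```shell
# Put a two-command sed script into a temporary file.
script_file=$(mktemp)
cat > "$script_file" <<'EOF'
# strip leading spaces from every line
s/^ *//
# print line 2
2p
EOF

# -n: no automatic printing, so only the explicit 2p produces output.
# -f: read the commands from the script file.
result=$(printf '  first\n  second\n  third\n' | sed -n -f "$script_file")
printf '%s\n' "$result"   # second

rm -f "$script_file"
```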

From the sed manual:

Addresses

Sed commands can be given with no addresses, in which case the command will be executed for all input lines; with one address, in which case the command will only be executed for input lines which match that address; or with two addresses, in which case the command will be executed for all input lines which match the inclusive range of lines starting from the first address and continuing to the second address.

Three things to note about address ranges: the syntax is addr1,addr2 (i.e., the addresses are separated by a comma); the line which addr1 matched will always be accepted, even if addr2 selects an earlier line; and if addr2 is a regexp, it will not be tested against the line that addr1 matched.

After the address (or address-range), and before the command, a ! may be inserted, which specifies that the command shall only be executed if the address (or address-range) does not match.
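A short sketch of the address forms described above, run against a made-up four-line input produced by a hypothetical helper function, sample_lines. It shows a single line number, a regexp address, an inclusive range, a range whose addr2 precedes addr1 (only the addr1 line is printed), and a range negated with !.

```shell
# Four made-up lines to select from.
sample_lines() { printf 'alpha\nbeta\ngamma\ndelta\n'; }

one=$(sample_lines | sed -n '3p')          # single line number: gamma
regex=$(sample_lines | sed -n '/^b/p')     # regexp address: beta
range=$(sample_lines | sed -n '2,3p')      # inclusive range: beta, gamma
reversed=$(sample_lines | sed -n '3,1p')   # addr2 before addr1: only gamma
negated=$(sample_lines | sed -n '2,3!p')   # ! inverts the range: alpha, delta

printf '%s\n' "$one" "$regex" "$range" "$reversed" "$negated"
```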

Commands which accept address ranges:

p = Print the current pattern space
s/regexp/replacement/ = Substitute the regex match with replacement (by default only the first match in the pattern space; the g flag replaces all of them). Attempt to match regexp against the pattern space. If successful, replace that portion matched with replacement. The replacement may contain the special character & to refer to that portion of the pattern space which matched, and the special escapes \1 through \9 to refer to the corresponding matching sub-expressions in the regexp.

From man sed (this excerpt is from the BSD/macOS manual page):

[2addr]s/regular expression/replacement/flags

Substitute the replacement string for the first instance of the regular expression in the pattern space. Any character other than backslash or newline can be used instead of a slash to delimit the RE and the replacement. Within the RE and the replacement, the RE delimiter itself can be used as a literal character if it is preceded by a backslash.

An ampersand (“&”) appearing in the replacement is replaced by the string matching the RE. The special meaning of “&” in this context can be suppressed by preceding it by a backslash. The string “\#”, where “#” is a digit, is replaced by the text matched by the corresponding backreference expression (see re_format(7)).

A line can be split by substituting a newline character into it. To specify a newline character in the replacement string, precede it with a backslash.

The value of flags in the substitute function is zero or more of the following:

N = Make the substitution only for the N'th occurrence of the regular expression in the pattern space.

g = Make the substitution for all non-overlapping matches of the regular expression, not just the first one.

p = Write the pattern space to standard output if a replacement was made. If the replacement string is identical to that which it replaces, it is still considered to have been a replacement.

w file = Append the pattern space to file if a replacement was made. If the replacement string is identical to that which it replaces, it is still considered to have been a replacement.

i or I = Match the regular expression in a case-insensitive way.
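A sketch exercising the replacement specials and flags described above on made-up strings, assuming GNU sed (which accepts the I flag, numeric and g flags, and the backslash-newline form for splitting a line).

```shell
line='one two one two one'

first=$(printf '%s\n' "$line" | sed 's/one/1/')      # 1 two one two one
all=$(printf '%s\n' "$line" | sed 's/one/1/g')       # 1 two 1 two 1
second=$(printf '%s\n' "$line" | sed 's/one/1/2')    # one two 1 two one

# & stands for the whole match.
amp=$(printf '%s\n' "$line" | sed 's/two/[&]/g')     # one [two] one [two] one

# \1 and \2 refer to the parenthesised sub-expressions.
swap=$(printf '%s\n' 'John Smith' | sed 's/\([A-Za-z]*\) \([A-Za-z]*\)/\2, \1/')   # Smith, John

# Another character can act as the delimiter, handy for paths.
path=$(printf '%s\n' '/usr/local/bin' | sed 's|/usr/local|/opt|')   # /opt/bin

# Case-insensitive matching with I, combined with g.
ci=$(printf '%s\n' 'Hello HELLO hello' | sed 's/hello/hi/gI')      # hi hi hi

# p flag together with -n: print only the lines where a substitution happened.
changed=$(printf 'cat\ndog\n' | sed -n 's/cat/kitten/p')          # kitten

# A backslash followed by a real newline splits the line in two.
split=$(printf '%s\n' 'a,b' | sed 's/,/\
/')
printf '%s\n' "$split"   # a, then b on the next line
```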

How to remove hex characters (bytes) from the beginning of a file?

Example: The following command removes the UTF-8 byte order mark (BOM, the bytes 0xEF 0xBB 0xBF) from the beginning of the file. The removal is done in place:

$ sed -i '1s/^\xef\xbb\xbf//' commands.sql

1 - the address: execute the command only on the first line; other lines are unaffected

s/ - the substitute command

^ - anchor the match to the beginning of the line

\xef\xbb\xbf - the bytes to be removed (the UTF-8 BOM), written as \xHH hex escapes, which GNU sed understands but POSIX sed does not define

// - an empty replacement, so the matched bytes are deleted
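To check the command without touching a real commands.sql, the sketch below builds a throwaway file that starts with a BOM and inspects its bytes with od before and after. It assumes GNU sed (plain -i without a suffix argument, \xHH escapes) and a mktemp utility; printf uses octal escapes because \x is not portable in printf format strings.

```shell
# Throwaway file starting with a UTF-8 BOM
# (octal \357\273\277 is the same as hex EF BB BF).
file=$(mktemp)
printf '\357\273\277SELECT 1;\n' > "$file"

od -An -tx1 "$file"    # bytes start with ef bb bf, then 53 ("S")

# GNU sed: edit in place, only on line 1, only at the start of the line.
sed -i '1s/^\xef\xbb\xbf//' "$file"

od -An -tx1 "$file"    # bytes now start with 53 45 4c ("SEL")
content=$(cat "$file")
printf '%s\n' "$content"   # SELECT 1;

rm -f "$file"
```

On BSD/macOS sed, -i expects a suffix argument, so editing in place without a backup is written as sed -i '' instead.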

To keep the original file intact and write the result to a new file instead:

$ sed '1s/^\xef\xbb\xbf//' < commands.sql > new_commands.sql
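Both ways of keeping the original are sketched below on throwaway files in a temporary directory (mktemp -d is assumed to be available): redirecting into a new file, and GNU sed's -i with a backup suffix, which edits the file in place but keeps the original content as commands.sql.bak.

```shell
# Throwaway input with a BOM (octal escapes for the bytes EF BB BF).
dir=$(mktemp -d)
printf '\357\273\277SELECT 1;\n' > "$dir/commands.sql"

# Option 1: leave commands.sql untouched and write the result elsewhere.
sed '1s/^\xef\xbb\xbf//' < "$dir/commands.sql" > "$dir/new_commands.sql"

# Option 2: edit in place, saving the original as commands.sql.bak.
sed -i.bak '1s/^\xef\xbb\xbf//' "$dir/commands.sql"

new_content=$(cat "$dir/new_commands.sql")       # SELECT 1;
edited_content=$(cat "$dir/commands.sql")        # SELECT 1;
# The backup still has the 3 BOM bytes plus "SELECT 1;\n" (10 bytes).
backup_bytes=$(wc -c < "$dir/commands.sql.bak" | tr -d ' ')

rm -r "$dir"
```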

How to print only the X-th line of a command's output?

$ <command> | sed -n 'Xp'

Example: We want to print only the 2nd line of the output of this command:

$ gpg --with-fingerprint --show-keys /usr/share/keyrings/oracle-virtualbox-2016.gpg

pub rsa4096 2016-04-22 [SC]
      B9F8 D658 297A F3EF C18D 5CDF A2F6 83C5 2980 AECF
uid Oracle Corporation (VirtualBox archive signing key) <info@virtualbox.org>
sub rsa4096 2016-04-22 [E]

We first suppress the automatic printing of the pattern space with -n. We then use an address (here simply a line number) to select the line we want, and the p command to print it:

$ gpg --with-fingerprint --show-keys /usr/share/keyrings/oracle-virtualbox-2016.gpg | sed -n '2p'

      B9F8 D658 297A F3EF C18D 5CDF A2F6 83C5 2980 AECF
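The same selection can be written several ways. The sketch below uses a hypothetical sample function in place of the gpg command (which needs gpg and the keyring file). Besides -n '2p' it shows deleting every line except line 2, quitting right after line 2 (sed then stops reading, which helps on long inputs), the $ address for the last line, and taking the line number from a shell variable.

```shell
# Stand-in for a command's output (hypothetical sample lines).
sample() { printf 'line one\nline two\nline three\n'; }

a=$(sample | sed -n '2p')        # print only line 2
b=$(sample | sed '2!d')          # delete every line except line 2
c=$(sample | sed -n '2{p;q;}')   # print line 2, then quit without reading further
last=$(sample | sed -n '$p')     # $ addresses the last line

# Line number from a shell variable: double quotes, and ${n}
# so the shell does not look for a variable called "np".
n=3
nth=$(sample | sed -n "${n}p")

printf '%s\n' "$a" "$b" "$c" "$last" "$nth"
```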

How to remove leading space characters from a command's output?

$ <command> | sed 's/^ *//g'

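Since ^ anchors the match to the start of the line, each line has at most one match, so the g flag is harmless but not needed. Example (gpg prints the fingerprint line indented, and sed strips that indentation):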

$ gpg --with-fingerprint --show-keys /usr/share/keyrings/oracle-virtualbox-2016.gpg | sed 's/^ *//g'

pub rsa4096 2016-04-22 [SC]
B9F8 D658 297A F3EF C18D 5CDF A2F6 83C5 2980 AECF
uid Oracle Corporation (VirtualBox archive signing key) <info@virtualbox.org>
sub rsa4096 2016-04-22 [E]
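The pattern ^ * only matches space characters. A sketch, on made-up input, of the difference from the POSIX character class [[:space:]], which also matches tabs, plus trimming both ends of a line with two expressions.

```shell
# Made-up input: one line indented with spaces, one with tabs.
input=$(printf '   indented\n\t\ttabbed\n')

# ^ * only matches spaces, so the tabs survive.
spaces=$(printf '%s\n' "$input" | sed 's/^ *//')

# [[:space:]] also matches tabs (and other whitespace).
blank=$(printf '%s\n' "$input" | sed 's/^[[:space:]]*//')

# Two expressions: trim leading and trailing whitespace.
trimmed=$(printf '%s\n' '  both ends  ' | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')

printf '%s\n' "$spaces" "$blank" "$trimmed"
```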

How to combine multiple expressions?

We can combine multiple expressions by separating them with a semicolon. For example, let's say we want to remove the leading space characters and display only the 2nd line of a command's output:

$ gpg --with-fingerprint --show-keys /usr/share/keyrings/oracle-virtualbox-2016.gpg | sed -n '2 s/^ *//g; 2p'

B9F8 D658 297A F3EF C18D 5CDF A2F6 83C5 2980 AECF

Note that we used an address (line number 2) for both commands (s = substitution and p = print), as both of them accept an address.
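Two other ways to combine the same commands, sketched on a made-up indented sample (sample is a hypothetical helper): repeating -e for each expression, or writing the address once and grouping the commands in braces. All three variants print the same line.

```shell
# Made-up indented sample output.
sample() { printf '  one\n  two\n  three\n'; }

# Semicolon-separated, as in the command above.
a=$(sample | sed -n '2 s/^ *//; 2p')

# The same two commands passed as separate -e expressions.
b=$(sample | sed -n -e '2 s/^ *//' -e '2p')

# The address written once, with the commands grouped in braces.
c=$(sample | sed -n '2{s/^ *//;p;}')

printf '%s\n' "$a" "$b" "$c"   # "two", three times
```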