Use sed to accelerate daily text processing
21 Oct 2013
It is important to keep "Don't repeat yourself" in mind, especially for people like
me who are playing with large numbers of input/output files every day. However,
one monster keeps making me repeat myself from time to time. That monster is
sed (stream editor). Below is my summary of the
basic and most frequently used features of the sed editor.
sed is a stream editor: it processes a stream of characters line by line. It also
works non-interactively, which can feel awkward at first to people who are used
to interactive editors.
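A quick way to see this stream, line-by-line behavior is to pipe a couple of lines into sed. This is a minimal sketch assuming a POSIX shell with printf; the sample words are arbitrary.

```shell
# Feed a two-line stream to sed on standard input: no file, no prompts.
# The s command replaces only the FIRST "a" on each line, and every line
# is handled independently, so each line gets exactly one substitution.
printf 'apple\nbanana\n' | sed 's/a/A/'
# prints:
#   Apple
#   bAnana
```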
It is helpful to read the manual and understand how sed actually works when
executed. According to the manual:
sed operates by performing the following cycle on each line of input: first, sed reads one line from the input stream, removes any trailing newline, and places it in the pattern space. Then commands are executed; each command can have an address associated to it: addresses are a kind of condition code, and a command is only executed if the condition is verified before the command is to be executed.
When the end of the script is reached, unless the -n option is in use, the contents of pattern space are printed out to the output stream, adding back the trailing newline if it was removed. Then the next cycle starts for the next input line.
Unless special commands (like D) are used, the pattern space is deleted between two cycles. The hold space, on the other hand, keeps its data between cycles (see commands such as G to move data between both buffers).
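The hold space is easiest to appreciate in the well-known one-liner that prints lines in reverse order. Besides G it uses h, which copies the pattern space into the hold space; h is not covered in the summary above, so take this as an illustrative sketch rather than part of it.

```shell
# Reverse the order of lines using the hold space.
#   1!G : on every line except the first, append a newline plus the hold
#         space to the pattern space (earlier lines end up after this one)
#   h   : copy the growing pattern space into the hold space, which,
#         unlike the pattern space, survives into the next cycle
#   $p  : on the last line, print the accumulated, reversed text
# -n turns off the automatic print at the end of each cycle.
printf 'first\nsecond\nthird\n' | sed -n '1!G;h;$p'
# prints:
#   third
#   second
#   first
```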
The basic syntax is:
sed [options] commands [file-to-edit]
sed applies the commands to the file to be edited (or to standard input if no
file is given), and the options change sed's default behaviors.
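A minimal sketch of that syntax with no options, a single s (substitute) command and one input file. The file name sample.txt is hypothetical. It also shows that the edited text goes to standard output while the file on disk stays as it was.

```shell
# Create a small sample file to edit (the name is just an example).
printf 'hello sed\nhello again\n' > sample.txt

# No options, one command, one input file: the result goes to stdout.
sed 's/hello/bye/' sample.txt
# prints:
#   bye sed
#   bye again

# The file itself is unchanged.
cat sample.txt
```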
Suppress auto printing
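By default, sed prints the pattern space to standard output at the end of every
cycle. Use the -n option to suppress this automatic printing, so that only lines
printed explicitly (for example with the p command) appear in the output.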
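To see the effect of -n, compare the same command with and without it. The p command (print the pattern space) is used here only for illustration, and lines.txt is a hypothetical scratch file.

```shell
printf 'one\ntwo\nthree\n' > lines.txt

# Without -n: every line is auto-printed, and 2p prints line 2 once more.
sed '2p' lines.txt      # one, two, two, three

# With -n: auto-printing is off, so only the explicit p output remains.
sed -n '2p' lines.txt   # two
```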
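By default, sed keeps the original file unchanged. Use -i.extension (GNU sed also
accepts --in-place=.extension) to edit the file in place and save a backup of the
original under the same file name with .extension appended.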
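A sketch of an in-place edit, assuming GNU sed; the file name notes.txt is hypothetical. The suffix follows -i directly, with no "=" in between: with GNU sed, -i=.bak would make the suffix "=.bak" and name the backup notes.txt=.bak.

```shell
printf 'color\n' > notes.txt

# Edit notes.txt in place, keeping the original as notes.txt.bak.
sed -i.bak 's/color/colour/' notes.txt

cat notes.txt       # colour  (edited file)
cat notes.txt.bak   # color   (backup of the original)
```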
Lines to edit
Put an address before a command to restrict which lines the command is applied to. Common addressing forms are:
number: matches only the line with that line number.
first~step: matches every step-th line starting with line first, i.e. all lines whose line number equals first + n * step for some non-negative integer n.
$: matches the last line of the last input file, or the last line of each file when the -s option is specified.
first,+N: matches line first and the N lines following it.
first,~N: matches line first and the lines following it until the next line whose input line number is a multiple of N.
/regexp/: matches any line that matches the regular expression regexp.
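Each of these addressing forms can be tried on the output of seq 10, where every line's content equals its line number, so the selected lines are easy to read off. This sketch assumes GNU sed, since first~step, first,+N and first,~N are GNU extensions; -n together with p prints only the addressed lines.

```shell
seq 10 | sed -n '3p'       # number:      3
seq 10 | sed -n '1~3p'     # first~step:  1 4 7 10
seq 10 | sed -n '$p'       # $:           10
seq 10 | sed -n '2,+2p'    # first,+N:    2 3 4
seq 10 | sed -n '5,~4p'    # first,~N:    5 6 7 8
seq 10 | sed -n '/1/p'     # /regexp/:    1 10
```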
Til next time,
Jianfeng at 22:34