Use sed to accelerate daily text processing
21 Oct 2013

It is important to keep "Don't repeat yourself" in mind, especially for people like me who work with large numbers of input/output files every day. However, one monster keeps making me repeat myself from time to time. That monster is sed (stream editor). Below is my summary of the basic and most frequently used features of sed.

sed is a stream editor. It processes a stream of characters line by line, and it is non-interactive, which feels pretty awkward at first to people who are used to interactive editors.

How sed works

It is helpful to read the manual and understand how sed actually works when executed. According to the manual:

sed operates by performing the following cycle on each line of input: first, sed reads one line from the input stream, removes any trailing newline, and places it in the pattern space. Then commands are executed; each command can have an address associated to it: addresses are a kind of condition code, and a command is only executed if the condition is verified before the command is to be executed.

When the end of the script is reached, unless the -n option is in use, the contents of pattern space are printed out to the output stream, adding back the trailing newline if it was removed. Then the next cycle starts for the next input line.

Unless special commands (like D) are used, the pattern space is deleted between two cycles. The hold space, on the other hand, keeps its data between cycles (see commands h, H, x, g, G to move data between both buffers).
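To make the cycle and the two buffers more concrete, here is a small sketch of my own (not from the manual) that reverses the order of lines. The three-line sample input is made up. On every line except the first, G appends the hold space to the pattern space; h then copies the pattern space back to the hold space; and because -n is in use, nothing is printed until $p fires on the last line.

```shell
# Reverse the order of the input lines using the hold space.
# 1!G : on every line except the first, append the hold space to the pattern space
# h   : copy the pattern space into the hold space (survives to the next cycle)
# $p  : on the last line, print the accumulated pattern space
reversed=$(printf 'one\ntwo\nthree\n' | sed -n '1!G;h;$p')
printf '%s\n' "$reversed"
```

The hold space is what carries the earlier lines from one cycle to the next; the pattern space alone would have been cleared each time.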


Basic Usage

sed [options] commands [file-to-edit]

sed applies commands to the file to be edited. options are used to change sed's default behaviors.
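As a minimal illustration of this form, the sketch below runs a single substitute command (s) against a temporary file. The file contents and the hello/goodbye pattern are placeholders I picked for the example.

```shell
# Create a throwaway input file (contents are made up for the example).
tmp=$(mktemp)
printf 'hello world\nhello sed\n' > "$tmp"

# sed [options] commands [file-to-edit]: no options, one s command, one file.
result=$(sed 's/hello/goodbye/' "$tmp")
printf '%s\n' "$result"

# The file itself is untouched; only the output stream carries the change.
unchanged=$(cat "$tmp")
rm -f "$tmp"
```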

Suppress auto printing

By default sed prints the pattern space to the standard output at the end of each cycle. Use the -n option to suppress this automatic printing; a line is then printed only when a command such as p asks for it.
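A quick way to see what -n changes is to combine it with the p (print) command. This is just a sketch with a made-up two-line input: without -n, p prints each line once and the automatic print at the end of the cycle prints it again.

```shell
input=$(printf 'alpha\nbeta\n')

# Without -n: explicit p plus the automatic print, so every line appears twice.
with_auto=$(printf '%s\n' "$input" | sed 'p')

# With -n: only the explicit p prints, so every line appears once.
without_auto=$(printf '%s\n' "$input" | sed -n 'p')

# With -n and an address: print just the second line.
only_second=$(printf '%s\n' "$input" | sed -n '2p')

printf '%s\n---\n%s\n---\n%s\n' "$with_auto" "$without_auto" "$only_second"
```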

Edit in place

By default sed keeps the original file unchanged. Use -i.extension (in GNU sed the long form is --in-place=.extension) to edit the file in place and save a backup of the original as file-to-edit.extension.
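Here is a small sketch of in-place editing with a backup. The directory, the file name notes.txt and the .bak suffix are all arbitrary choices for the example; the suffix is attached directly to -i with no space or equals sign.

```shell
# Work in a scratch directory so nothing real gets modified.
workdir=$(mktemp -d)
printf 'color\n' > "$workdir/notes.txt"

# Edit notes.txt in place and keep the original as notes.txt.bak.
sed -i.bak 's/color/colour/' "$workdir/notes.txt"

edited=$(cat "$workdir/notes.txt")      # the rewritten file
backup=$(cat "$workdir/notes.txt.bak")  # the untouched original
printf 'edited: %s\nbackup: %s\n' "$edited" "$backup"

rm -rf "$workdir"
```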

Lines to edit

Put an address before a command to limit the lines to which the command is applied. Common addressing rules are:

  • number, matches only the line with that line number.
  • first~step, selects every step-th line starting with line first, i.e. all lines whose line number equals first + n * step, where n is a non-negative integer.
  • $, selects the last line of the last file of input; it selects the last line of each file when the -i or -s option is specified.
  • first,+N, matches line first and the N lines following it.
  • first,~N, matches line first and the lines following it until the next line whose input line number is a multiple of N.
  • /regexp/, selects any line which matches the regular expression regexp.
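The addressing rules above can be tried out on the numbers 1 to 10, where each line's content equals its line number. This is a sketch that assumes GNU sed, since first~step, first,+N and first,~N are GNU extensions and may not work with other sed implementations. Each command uses -n with p so that only the addressed lines are printed.

```shell
# Assumes GNU sed; the input is simply the numbers 1..10, one per line.
single=$(seq 1 10 | sed -n '3p')        # number: line 3 only
stepped=$(seq 1 10 | sed -n '1~3p')     # first~step: lines 1, 4, 7, 10
last=$(seq 1 10 | sed -n '$p')          # $: the last line, 10
plus=$(seq 1 10 | sed -n '2,+2p')       # first,+N: lines 2, 3, 4
multiple=$(seq 1 10 | sed -n '5,~4p')   # first,~N: lines 5 through 8
pattern=$(seq 1 10 | sed -n '/0$/p')    # /regexp/: lines ending in 0, i.e. 10

printf '%s\n' "$single" "$stepped" "$last" "$plus" "$multiple" "$pattern"
```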

Til next time,
Jianfeng at 22:34