SED eng

From EIK wiki

SED - sed is a stream editor. A stream editor is used mainly to perform basic text transformations on an input stream (a file or input from a pipeline). While in some ways similar to an editor which permits scripted edits (such as ed), sed works by making only one pass over the input(s), and is consequently more efficient. But it is sed’s ability to filter text in a pipeline which particularly distinguishes it from other types of editors[1]. In practice, this means that all edits are specified when the command is invoked, and sed then carries them out automatically. This makes sed a powerful and fast way to transform text.

Introduction

The sed stream editor is a text editor that performs editing operations on text coming from standard input or a file. sed reads and edits text line by line, in a non-interactive way.[2]

Main Operation and usage

sed is a line-oriented text processing utility: it reads text, line by line, from an input stream or file, into an internal buffer called the pattern space. Each line read starts a cycle. To the pattern space, sed applies one or more operations which have been specified via a sed script. sed implements a programming language with about 25 commands that specify the operations on the text. For each input line, after running the script sed ordinarily outputs the pattern space (the line as modified by the script) and begins the cycle again with the next line. Other end-of-script behaviors are available through sed options and script commands, e.g. d to delete the pattern space, q to quit, N to add the next line to the pattern space immediately, and so on. Thus a sed script corresponds to the body of a loop that iterates through the lines of a stream, where the loop itself and the loop variable (the current line number) are implicit and maintained by sed.
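The cycle described above can be observed with a few one-line scripts. The following is a minimal sketch, not a definitive recipe: the four-line input and the variable names are made up for illustration, and each command shows one of the end-of-script behaviors mentioned (d, q, and N).

```shell
# Hypothetical sample input: four short lines.
input=$(printf 'one\ntwo\nthree\nfour\n')

# d: delete the pattern space for lines matching /two/; other lines print as usual.
deleted=$(printf '%s\n' "$input" | sed '/two/d')

# q: print line 2 and quit, so later lines are never processed.
quit_early=$(printf '%s\n' "$input" | sed '2q')

# N: append the next line to the pattern space, then join the pair
# by replacing the embedded newline with a space.
joined=$(printf '%s\n' "$input" | sed 'N;s/\n/ /')

printf '%s\n---\n%s\n---\n%s\n' "$deleted" "$quit_early" "$joined"
```

The first command drops "two", the second stops after "two", and the third turns each pair of lines into a single line.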

The sed script can either be specified on the command line (-e option) or read from a separate file (-f option). Commands in the sed script may take an optional address, in terms of line numbers or regular expressions. The address determines when the command is run. For example, 2d would only run the d (delete) command on the second input line (printing all lines but the second), while /^ /d would delete all lines beginning with a space.[3]
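As a small illustration of the two address forms just described, the sketch below runs 2d and /^ /d against the same input. The sample lines and variable names are assumptions chosen only for the demonstration.

```shell
# Hypothetical input; the third line begins with a space.
input=$(printf 'first\nsecond\n indented\nfourth\n')

# Line-number address: delete only the second input line.
no_second=$(printf '%s\n' "$input" | sed '2d')

# Regular-expression address: delete every line beginning with a space.
no_indented=$(printf '%s\n' "$input" | sed '/^ /d')

printf '%s\n---\n%s\n' "$no_second" "$no_indented"
```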

In general, sed operates on a stream of text that it reads from either standard input or a file. Keep in mind that sed writes everything to standard output by default: unless the output is redirected, sed prints it to the screen instead of saving it to a file.

RUNNING SED

sed is normally invoked as follows:

   sed SCRIPT INPUTFILE...

To replace the first occurrence of ‘hello’ on each line with ‘world’ in the file input.txt (add the g flag to replace every occurrence):

   sed 's/hello/world/' input.txt > output.txt

If an INPUTFILE is not specified, or if INPUTFILE is -, sed filters the contents of the standard input. The following commands are equivalent:

   sed 's/hello/world/' input.txt > output.txt
   sed 's/hello/world/' < input.txt > output.txt
   cat input.txt | sed 's/hello/world/' - > output.txt

sed writes output to standard output. Use -i to edit files in-place instead of printing to standard output. See also the W and s///w commands for writing output to other files. The following command modifies file.txt and does not produce any output:

   sed -i 's/hello/world/' file.txt

By default sed prints all processed input (except input that has been modified/deleted by commands such as d). Use -n to suppress output, and the p command to print specific lines. The following command prints only line 50 of the input file:

   sed -n '50p' file.txt
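The same idea can be tried without a 50-line file. This is a sketch under stated assumptions: the five-word input is invented, and the second form adds the q command so that sed stops reading once the wanted line has been printed.

```shell
# Print only line 3 of a hypothetical five-line input.
third=$(printf '%s\n' alpha beta gamma delta epsilon | sed -n '3p')

# Same result, but q ends processing right after line 3 is printed,
# which avoids scanning the rest of a large input.
third_quit=$(printf '%s\n' alpha beta gamma delta epsilon | sed -n '3{p;q;}')

printf '%s\n%s\n' "$third" "$third_quit"
```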

sed treats multiple input files as one long stream. The following example prints the first line of the first file (one.txt) and the last line of the last file (three.txt). Use -s to reverse this behavior.

   sed -n  '1p ; $p' one.txt two.txt three.txt
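To see the difference that -s makes, the following sketch builds two small files in a temporary directory and runs the same script both ways. The file contents and variable names are hypothetical, and -s is assumed to be available (it is a GNU sed extension).

```shell
# Create two throwaway files in a temporary directory.
dir=$(mktemp -d)
printf 'a1\na2\n' > "$dir/one.txt"
printf 'b1\nb2\n' > "$dir/two.txt"

# Default: the files form one long stream, so 1 is the first line
# of one.txt and $ is the last line of two.txt.
combined=$(sed -n '1p;$p' "$dir/one.txt" "$dir/two.txt")

# With -s, line numbers and $ restart for every file.
separate=$(sed -s -n '1p;$p' "$dir/one.txt" "$dir/two.txt")

rm -r "$dir"
printf '%s\n---\n%s\n' "$combined" "$separate"
```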

Without -e or -f options, sed uses the first non-option parameter as the script, and the following non-option parameters as input files. If -e or -f options are used to specify a script, all non-option parameters are taken as input files. Options -e and -f can be combined, and can appear multiple times (in which case the final effective script will be concatenation of all the individual scripts).

The following examples are equivalent:

   sed 's/hello/world/' input.txt > output.txt
  
   sed -e 's/hello/world/' input.txt > output.txt
   sed --expression='s/hello/world/' input.txt > output.txt
  
   echo 's/hello/world/' > myscript.sed
   sed -f myscript.sed input.txt > output.txt
   sed --file=myscript.sed input.txt > output.txt
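Because -e and -f can be repeated and mixed, a script can be assembled from several pieces. The sketch below is only an illustration: the input text, the replacement words, and the temporary script file are assumptions.

```shell
# Two -e expressions are concatenated into one script and run in order.
two_e=$(printf 'hello world\n' | sed -e 's/hello/goodbye/' -e 's/world/moon/')

# A script file (-f) combined with an extra -e expression.
script=$(mktemp)
printf 's/hello/goodbye/\n' > "$script"
mixed=$(printf 'hello world\n' | sed -f "$script" -e 's/world/moon/')
rm -f "$script"

printf '%s\n%s\n' "$two_e" "$mixed"
```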


USE AND EDITING OF FILES

sed lets you work on a file that you have already created. By default it prints its output to the screen instead of saving it to a file, which gives you a flexible way to edit a file until it looks the way you want.

As an example, let's copy some files into our home directory to practice editing on.

   cd
   cp /usr/share/common-licenses/BSD .
   cp /usr/share/common-licenses/GPL-3 .

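Now let's use sed to view the contents of the BSD license file we copied. Because sed always needs a script, we pass it an empty one (two single quotes), which makes no changes:

   sed '' BSD
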

The output of the file is:

   Copyright (c) The Regents of the University of California.
   All rights reserved.
   
   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:
   1. Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.
   2. Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in the
      documentation and/or other materials provided with the distribution.

sed can also read from standard input. Piping the output of the "cat" command into sed produces the same result:

   cat BSD | sed ''


   Copyright (c) The Regents of the University of California.
   All rights reserved.
   
   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:
   1. Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.
   2. Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in the
      documentation and/or other materials provided with the distribution.
   . . .
   . . .

As shown above, sed can work on files or on streams of text (such as those produced when piping output with the "|" character).

Printing Lines

In the previous example, input was passed to sed without any operations, and the result was printed directly to standard output. To print with sed explicitly, use the "print" command, specified with the "p" character within single quotes:

   sed 'p' filename

With this command, sed prints each line twice: it automatically prints each line, and we have also told it to print explicitly with the "p" command. sed operates line by line: it accepts a line, operates on it, and outputs the resulting text before repeating the process on the next line.

The output can be cleaned up by passing the "-n" option to sed, which suppresses the automatic printing so that only lines explicitly printed by a command appear:

   sed -n 'p' filename

This command prints every line of the file exactly once, without duplicates. The examples so far can hardly be considered editing: to count as editing, a command must at least select specific lines or change the text.

Address specification

Sed can print just the first line by using this command:

   sed -n '1p' filename

Placing the number "1" before the print command "p" tells sed which line number to operate on. Several lines can be printed just as easily by giving a range of line numbers:

   sed -n '1,6p' filename

In this example, sed prints lines 1 through 6. The command gives sed an address range, and sed executes the commands that follow only on those lines. The same range can be expressed by giving the first address and then an offset that tells sed how many additional lines to cover:

   sed -n '1,+5p' filename

This produces the same output: sed starts at line 1 and then operates on the next 5 lines as well.
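The equivalence of the two range forms can be checked on a short numbered input. This is a minimal sketch: the ten-line input is invented, and the addr,+N form is assumed to be available (it is a GNU sed extension).

```shell
# Hypothetical input: the numbers 1 to 10, one per line.
numbers=$(printf '%s\n' 1 2 3 4 5 6 7 8 9 10)

# Explicit range: lines 1 through 6.
range=$(printf '%s\n' "$numbers" | sed -n '1,6p')

# Start address plus offset: line 1 and the 5 lines after it.
offset=$(printf '%s\n' "$numbers" | sed -n '1,+5p')

printf '%s\n---\n%s\n' "$range" "$offset"
```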

To print every other line, specify the interval after the "~" character (a GNU sed extension). The following command prints every other line, starting with line 1:

   sed -n '1~2p' filename
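In GNU sed, an address of the form first~step matches line "first" and every "step"-th line after it. The sketch below, with an invented seven-line input, contrasts a step of 2 (every other line) with a step of 3 (every third line).

```shell
# Hypothetical input: the numbers 1 to 7, one per line.
numbers=$(printf '%s\n' 1 2 3 4 5 6 7)

# Step 2: lines 1, 3, 5, 7 (every other line).
every_other=$(printf '%s\n' "$numbers" | sed -n '1~2p')

# Step 3: lines 1, 4, 7 (every third line).
every_third=$(printf '%s\n' "$numbers" | sed -n '1~3p')

printf '%s\n---\n%s\n' "$every_other" "$every_third"
```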

Deleting Text

Text can be deleted just as easily by changing the "p" command used above to the "d" command.

The "-n" option is no longer needed: with the delete command, sed prints everything that is not deleted, which makes it easy to see what is going on.

The last command from the previous section can be modified to delete every other line, starting with the first:

   sed '1~2d' filename

It is important to note that the source file is not affected; it remains intact. The edits are only written to the screen.

To save the edits, redirect standard output to a file like so:

   sed '1~2d' filename > anotherfile.txt

To create a backup file prior to editing, add the backup extension directly after the "-i" option:

   sed -i.bak '1~2d' anotherfile.txt

This will create a backup file with the ".bak" extension, and then edit the regular file in-place.
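The backup behavior can be tried safely on a throwaway file. The following sketch assumes GNU sed (for the 1~2 step address) and uses a temporary file with made-up contents.

```shell
# Create a temporary four-line file to edit.
file=$(mktemp)
printf 'a\nb\nc\nd\n' > "$file"

# Delete every other line in place, keeping the original as "$file.bak".
sed -i.bak '1~2d' "$file"

edited=$(cat "$file")        # contents after the in-place edit
backup=$(cat "$file.bak")    # untouched copy of the original

rm -f "$file" "$file.bak"
printf '%s\n---\n%s\n' "$edited" "$backup"
```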

Substituting Text

One of the main uses of sed is substituting text. sed can search for text patterns using regular expressions and then replace the matched text.

The following syntax changes one word to another:

   's/old_word/new_word/'

The "s" is the substitute command. The three slashes (/) separate the different text fields. Other characters can be used as delimiters when that is more convenient. For instance, when changing a website address, another delimiter is helpful because URLs contain slashes. Use echo to pipe in an example:

   echo "http://www.example.com/index.html" | sed 's_com/index_org/home_'
   http://www.example.org/home.html

Do not forget the final delimiter, or sed will complain.

   echo "http://www.example.com/index.html" | sed 's_com/index_org/home'
   sed: -e expression #1, char 22: unterminated `s' command
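Any of several delimiter choices gives the same substitution. The sketch below compares an underscore, a pipe character, and the default slash with escaped slashes in the pattern; the variable names are only illustrative.

```shell
url='http://www.example.com/index.html'

# Underscore as the delimiter, as in the example above.
with_underscore=$(printf '%s\n' "$url" | sed 's_com/index_org/home_')

# A pipe character works the same way.
with_pipe=$(printf '%s\n' "$url" | sed 's|com/index|org/home|')

# Keeping "/" as the delimiter means escaping every slash in the fields.
escaped=$(printf '%s\n' "$url" | sed 's/com\/index/org\/home/')

printf '%s\n%s\n%s\n' "$with_underscore" "$with_pipe" "$escaped"
```

All three commands print the same rewritten URL; the alternate delimiters simply avoid the backslashes.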

Let's create a file to practice substitutions on:

   echo "i want to be good
   it actually takes nothing to be good actually
   good music help me think positively
   that could result to my good deed
   and motivate me to stand by you
   that is absolutely life..." > funny.txt

Substitute the expression "actually" with "really":

   sed 's/actually/really/' funny.txt

Output:

   i want to be good
   it really takes nothing to be good actually
   good music help me think positively
   that could result to my good deed
   and motivate me to stand by you
   that is absolutely life...

First, note that patterns are replaced, not words. The first "actually" on the second line changed to "really", but the second one did not.

This is because the "s" command operates on the first match in a line and then moves on to the next line.

To make sed replace every instance of "actually" instead of just the first on each line, pass an optional flag to the substitute command.

The "g" flag, placed after the final delimiter, makes the substitution apply to every match on the line.

   sed 's/actually/really/g' funny.txt
   i want to be good
   it really takes nothing to be good really
   good music help me think positively
   that could result to my good deed
   and motivate me to stand by you
   that is absolutely life...

To change only the second instance of "actually" that sed finds on each line, use the number "2" instead of "g":

   sed 's/actually/really/2' funny.txt
   i want to be good
   it actually takes nothing to be good really
   good music help me think positively
   that could result to my good deed
   and motivate me to stand by you
   that is absolutely life...

To see only the lines that were changed, use the "-n" option again to suppress automatic printing, then pass the "p" flag to the substitute command to print the lines where a substitution took place.

   sed -n 's/actually/really/2p' funny.txt
   it actually takes nothing to be good really

For the search to ignore case, pass the "i" flag (a GNU sed extension).

   sed 's/MUSIC/life/i' funny.txt
   i want to be good
   it actually takes nothing to be good actually
   good life help me think positively
   that could result to my good deed
   and motivate me to stand by you
   that is absolutely life...
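The substitution flags can be compared side by side on a single line. This is a sketch under stated assumptions: the sample lines are taken from the practice text, the variable names are invented, and the "i" flag is assumed to be available (GNU sed).

```shell
line='it actually takes nothing to be good actually'

# No flag: only the first match on the line is replaced.
first=$(printf '%s\n' "$line" | sed 's/actually/really/')

# g: every match on the line is replaced.
all=$(printf '%s\n' "$line" | sed 's/actually/really/g')

# 2: only the second match on the line is replaced.
second=$(printf '%s\n' "$line" | sed 's/actually/really/2')

# -n with the p flag: print only lines where a substitution happened.
changed=$(printf 'no match here\n%s\n' "$line" | sed -n 's/actually/really/2p')

# i: match regardless of case.
nocase=$(printf 'good music help me think positively\n' | sed 's/MUSIC/life/i')

printf '%s\n' "$first" "$all" "$second" "$changed" "$nocase"
```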

Matching Text

When matching more complex patterns with regular expressions, there are a number of ways to reference the matched text in the replacement. For instance, to match from the beginning of the line to "to", we can use the expression:

   sed 's/^.*to/REPLACED/' funny.txt
   REPLACED be good
   REPLACED be good actually
   good music help me think positively
   REPLACED my good deed
   REPLACED stand by you
   that is absolutely life...

You can see that the wildcard expression matches from the beginning of the line to the last instance of "to". Lines that do not contain "to" are left unchanged.

Since you don't know the exact phrase that will match in the search string, you can use the "&" character to represent the matched text in the replacement string.

This example shows how to put parentheses around the matched text:

   sed 's/^.*to/(&)/' funny.txt
   (i want to) be good
   (it actually takes nothing to) be good actually
   good music help me think positively
   (that could result to) my good deed
   (and motivate me to) stand by you
   that is absolutely life...
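The two points above, greedy matching and "&", can be seen on single lines. This is a minimal sketch with made-up input; the first command shows that ".*" extends to the last "to" on the line, and the second wraps whatever was matched in parentheses.

```shell
# Greedy match: ^.*to runs up to the LAST "to" on the line.
greedy=$(printf 'a to b to c\n' | sed 's/^.*to/X/')

# & stands for the matched text in the replacement.
wrapped=$(printf 'that could result to my good deed\n' | sed 's/^.*to/(&)/')

printf '%s\n%s\n' "$greedy" "$wrapped"
```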

Command-Line Options

The full format for invoking sed is:

   sed OPTIONS... [SCRIPT] [INPUTFILE...]

sed may be invoked with the following command-line options:

   --version

Print out the version of sed that is being run and a copyright notice, then exit.

   --help

Print a usage message briefly summarizing these command-line options and the bug-reporting address, then exit.

   -n
   --quiet
   --silent

By default, sed prints out the pattern space at the end of each cycle through the script (see How sed works). These options disable this automatic printing, and sed only produces output when explicitly told to via the p command.

   -e script
   --expression=script

Add the commands in script to the set of commands to be run while processing the input.

   -f script-file
   --file=script-file

Add the commands contained in the file script-file to the set of commands to be run while processing the input.

   -i[SUFFIX]
   --in-place[=SUFFIX]

This option specifies that files are to be edited in-place. GNU sed does this by creating a temporary file and sending output to this file rather than to the standard output.

This option implies -s.

When the end of the file is reached, the temporary file is renamed to the output file’s original name. The extension, if supplied, is used to modify the name of the old file before renaming the temporary file, thereby making a backup copy.

This rule is followed: if the extension doesn’t contain a *, then it is appended to the end of the current filename as a suffix; if the extension does contain one or more * characters, then each asterisk is replaced with the current filename. This allows you to add a prefix to the backup file, instead of (or in addition to) a suffix, or even to place backup copies of the original files into another directory (provided the directory already exists).

If no extension is supplied, the original file is overwritten without making a backup.

   -l N
   --line-length=N

Specify the default line-wrap length for the l command. A length of 0 (zero) means to never wrap long lines. If not specified, it is taken to be 70.

   --posix

GNU sed includes several extensions to POSIX sed. In order to simplify writing portable scripts, this option disables all the extensions that this manual documents, including additional commands. Most of the extensions accept sed programs that are outside the syntax mandated by POSIX, but some of them (such as the behavior of the N command described in Reporting Bugs) actually violate the standard. If you want to disable only the latter kind of extension, you can set the POSIXLY_CORRECT variable to a non-empty value.

   -b
   --binary

This option is available on every platform, but is only effective where the operating system makes a distinction between text files and binary files. When such a distinction is made—as is the case for MS-DOS, Windows, Cygwin—text files are composed of lines separated by a carriage return and a line feed character, and sed does not see the ending CR. When this option is specified, sed will open input files in binary mode, thus not requesting this special processing and considering lines to end at a line feed.

   --follow-symlinks

This option is available only on platforms that support symbolic links and has an effect only if option -i is specified. In this case, if the file that is specified on the command line is a symbolic link, sed will follow the link and edit the ultimate destination of the link. The default behavior is to break the symbolic link, so that the link destination will not be modified.

   -E
   -r
   --regexp-extended

Use extended regular expressions rather than basic regular expressions. Extended regexps are those that egrep accepts; they can be clearer because they usually have fewer backslashes. Historically this was a GNU extension, but the -E extension has since been added to the POSIX standard, so use -E for portability. GNU sed has accepted -E as an undocumented option for years, and *BSD seds have accepted -E for years as well, but scripts that use -E might not port to other older systems. See Extended regular expressions.
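The "fewer backslashes" point is easiest to see with capture groups. The sketch below swaps two words, once with basic and once with extended regular expressions; the input and variable names are assumptions, and -E is assumed to be supported by the installed sed.

```shell
# Basic regular expressions: grouping parentheses must be escaped.
basic=$(printf 'aaa bbb\n' | sed 's/\(a*\) \(b*\)/\2 \1/')

# Extended regular expressions (-E): plain parentheses group.
extended=$(printf 'aaa bbb\n' | sed -E 's/(a*) (b*)/\2 \1/')

printf '%s\n%s\n' "$basic" "$extended"
```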

   -s
   --separate

By default, sed will consider the files specified on the command line as a single continuous long stream. This GNU sed extension allows the user to consider them as separate files: range addresses (such as ‘/abc/,/def/’) are not allowed to span several files, line numbers are relative to the start of each file, $ refers to the last line of each file, and files invoked from the R commands are rewound at the start of each file.

   --sandbox

In sandbox mode, e/w/r commands are rejected - programs containing them will be aborted without being run. Sandbox mode ensures sed operates only on the input files designated on the command line, and cannot run external programs.

   -u
   --unbuffered

Buffer both input and output as minimally as practical. (This is particularly useful if the input is coming from the likes of ‘tail -f’, and you wish to see the transformed output as soon as possible.)

   -z
   --null-data
   --zero-terminated

Treat the input as a set of lines, each terminated by a zero byte (the ASCII ‘NUL’ character) instead of a newline. This option can be used with commands like ‘sort -z’ and ‘find -print0’ to process arbitrary file names.
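A short sketch of NUL-terminated processing follows. It assumes GNU sed (for -z) and feeds in two hypothetical records separated by zero bytes; tr converts the NUL bytes to newlines afterwards only so that the result can be displayed.

```shell
# Each NUL-terminated record is treated as one "line", so ^ matches
# at the start of every record.
tagged=$(printf 'one\0two\0' | sed -z 's/^/item: /' | tr '\0' '\n')

printf '%s\n' "$tagged"
```

In a real pipeline the input would typically come from a command such as find -print0, so that file names containing newlines are still handled as single records.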

If no -e, -f, --expression, or --file options are given on the command-line, then the first non-option argument on the command line is taken to be the script to be executed.

If any command-line parameters remain after processing the above, these parameters are interpreted as the names of input files to be processed. A file name of ‘-’ refers to the standard input stream. The standard input will be processed if no file names are specified.

See also

Linux sed commands

An Introduction and Tutorial by Bruce Barnett

Linux man page

Reference