Annotation of src/usr.bin/sed/POSIX, Revision 1.1
1.1 ! deraadt 1: # from: @(#)POSIX 8.1 (Berkeley) 6/6/93
! 2: # $Id: POSIX,v 1.3 1994/02/03 23:44:42 cgd Exp $
! 3:
! 4: Comments on the IEEE P1003.2 Draft 12
! 5: Part 2: Shell and Utilities
! 6: Section 4.55: sed - Stream editor
! 7:
! 8: Diomidis Spinellis <dds@doc.ic.ac.uk>
! 9: Keith Bostic <bostic@cs.berkeley.edu>
! 10:
! 11: In the following paragraphs, "wrong" usually means "inconsistent with
! 12: historic practice", as most of the following comments refer to
! 13: undocumented inconsistencies between the historical versions of sed and
! 14: the POSIX 1003.2 standard. All the comments are notes taken while
! 15: implementing a POSIX-compatible version of sed, and should not be
! 16: interpreted as official opinions or criticism towards the POSIX committee.
! 17: All uses of "POSIX" refer to section 4.55, Draft 12 of POSIX 1003.2.
! 18:
! 19: 1. 32V and BSD derived implementations of sed strip the text
! 20: arguments of the a, c and i commands of their initial blanks,
! 21: i.e.
! 22:
! 23: #!/bin/sed -f
! 24: a\
! 25: foo\
! 26: \ indent\
! 27: bar
! 28:
! 29: produces:
! 30:
! 31: foo
! 32: indent
! 33: bar
! 34:
! 35: POSIX does not specify this behavior as the System V versions of
! 36: sed do not do this stripping. The argument against stripping is
! 37: that it is difficult to write sed scripts that have leading blanks
! 38: if they are stripped. The argument for stripping is that it is
! 39: difficult to write readable sed scripts unless indentation is allowed
! 40: and ignored, and leading whitespace is obtainable by entering a
! 41: backslash in front of it. This implementation follows the BSD
! 42: historic practice.
! 43:
! 44: 2. Historical versions of sed required that the w flag be the last
! 45: flag to an s command as it takes an additional argument. This
! 46: is obvious, but not specified in POSIX.
! 47:
! 48: 3. Historical versions of sed required that whitespace follow a w
! 49: flag to an s command. This is not specified in POSIX. This
! 50: implementation permits whitespace but does not require it.
! 51:
! 52: 4. Historical versions of sed permitted any number of whitespace
! 53: characters to follow the w command. This is not specified in
! 54: POSIX. This implementation permits whitespace but does not
! 55: require it.
! 56:
! 57: 5. The rule for the l command differs from historic practice. Table
! 58: 2-15 includes the various ANSI C escape sequences, including \\
! 59: for backslash. Some historical versions of sed also displayed
! 60: two-digit octal numbers, not the three digits specified by POSIX. POSIX
! 61: is a cleanup, and is followed by this implementation.
! 62:
! 63: 6. The POSIX specification for ! does not state that a single command
! 64: following ! must not carry its own address specification, whereas
! 65: the commands inside a command list may. As written, the
! 66: specification for ! implies that "3!/hello/p" works, and it never
! 67: has, historically. Note,
! 68:
! 69: 3!{
! 70: /hello/p
! 71: }
! 72:
! 73: does work.
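As an illustration of the braced form in item 6 (a sketch only; the sample input is invented here and is not part of the original notes), the following prints the lines matching "hello" except on line 3:

```shell
# Hypothetical four-line input; the sed script is the braced form from item 6.
out_item6=$(printf 'hello 1\nhello 2\nhello 3\nworld 4\n' | sed -n '3!{
/hello/p
}')
printf '%s\n' "$out_item6"
```

Line 3 is excluded by the negated address and line 4 does not match /hello/, so only the first two lines are printed.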
! 74:
! 75: 7. POSIX does not specify what happens with consecutive ! commands
! 76: (e.g. /foo/!!!p). Historic implementations allow any number of
! 77: !'s without changing the behaviour. (It seems logical that each
! 78: one might reverse the behaviour.) This implementation follows
! 79: historic practice.
! 80:
! 81: 8. Historic versions of sed permitted commands to be separated
! 82: by semicolons, e.g. "sed -ne '1p;2p;3q'" printed the first two
! 83: lines of a file and quit at the third. This is not specified by POSIX.
! 84: Note, the ; command separator is not allowed for the commands
! 85: a, c, i, w, r, :, b, t, # and at the end of a w flag in the s
! 86: command. This implementation follows historic practice and
! 87: implements the ; separator.
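A minimal sketch of the ; separator from item 8, assuming a sed that accepts it (the three-line input is invented for illustration):

```shell
# Two p commands separated by ';' in a single -e argument.
out_item8=$(printf 'one\ntwo\nthree\n' | sed -n -e '1p;3p')
printf '%s\n' "$out_item8"
```

With -n in effect only the explicit p commands produce output, so lines 1 and 3 are printed. The same script can be written portably as two separate -e arguments.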
! 88:
! 89: 9. Historic versions of sed terminated the script if EOF was reached
! 90: during the execution of the 'n' command, i.e.:
! 91:
! 92: sed -e '
! 93: n
! 94: i\
! 95: hello
! 96: ' </dev/null
! 97:
! 98: did not produce any output. POSIX does not specify this behavior.
! 99: This implementation follows historic practice.
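The effect described in item 9 can also be observed with non-empty input. The following is a sketch under the assumption that commands after n are skipped once n hits EOF; the input and the X marker are invented for illustration, and -n is used so the result does not depend on whether the final line is auto-printed:

```shell
# Line 1: n fetches line 2, which is replaced by X and printed.
# Line 3: n finds no further input, so the s command never runs.
out_item9=$(printf '1\n2\n3\n' | sed -n -e 'n' -e 's/.*/X/p')
printf '%s\n' "$out_item9"
```

Only one X is printed, because the script terminates at the n command on the last line.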
! 100:
! 101: 10. Deleted.
! 102:
! 103: 11. Historical implementations do not output the change text of a c
! 104: command in the case of an address range whose first line number
! 105: is greater than the second (e.g. 3,1). POSIX requires that the
! 106: text be output. Since the historic behavior doesn't seem to have
! 107: any particular purpose, this implementation follows the POSIX
! 108: behavior.
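A sketch of the POSIX behavior from item 11, using the 3,1 range mentioned above (the four-line input and the replacement text are invented for illustration):

```shell
# The range 3,1 matches line 3 only; POSIX requires the c text to be output.
out_item11=$(printf '1\n2\n3\n4\n' | sed '3,1c\
text')
printf '%s\n' "$out_item11"
```

An implementation following POSIX replaces line 3 with the change text and passes the other lines through. A historic implementation, as described above, would instead drop line 3 without printing the text.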
! 109:
! 110: 12. POSIX does not specify whether address ranges are checked and
! 111: reset if a command is not executed due to a jump. The following
! 112: program behaves differently depending on whether the 'c' command
! 113: is triggered at the third line, i.e. whether the text is
! 114: output even though line 3 of the input never logically
! 115: encounters that command.
! 116:
! 117: 2,4b
! 118: 1,3c\
! 119: text
! 120:
! 121: Historic implementations, and this implementation, do not output
! 122: the text in the above example. The general rule, therefore,
! 123: is that a range whose second address is never matched extends to
! 124: the end of the input.
! 125:
! 126: 13. Historical implementations allow an output suppressing #n at the
! 127: beginning of -e arguments as well as in a script file. POSIX
! 128: does not specify this. This implementation follows historical
! 129: practice.
! 130:
! 131: 14. POSIX does not explicitly specify how sed behaves if no script is
! 132: specified. Since the sed Synopsis permits this form of the command,
! 133: and the language in the Description section states that the input
! 134: is output, it seems reasonable that it behave like the cat(1)
! 135: command. Historic sed implementations behave differently for "ls |
! 136: sed", where they produce no output, and "ls | sed -e#", where they
! 137: behave like cat. This implementation behaves like cat in both cases.
! 138:
! 139: 15. The POSIX requirement to open all w files at the beginning makes
! 140: sed behave nonintuitively when the w commands are preceded by
! 141: addresses or are within conditional blocks. This implementation
! 142: follows historic practice and POSIX, by default, and provides the
! 143: -a option which opens the files only when they are needed.
! 144:
! 145: 16. POSIX does not specify how escape sequences other than \n and \D
! 146: (where D is the delimiter character) are to be treated. This is
! 147: reasonable; however, it also doesn't state that the backslash is
! 148: to be discarded from the output regardless. A strict reading of
! 149: POSIX would be that "echo xyz | sed 's/./\a/'" would display "\ayz".
! 150: As historic sed implementations always discarded the backslash,
! 151: this implementation does as well.
! 152:
! 153: 17. POSIX specifies that an address can be "empty". This implies
! 154: that constructs like ",d", "1,d" and ",5d" are allowed. This
! 155: is not true for historic implementations or this implementation
! 156: of sed.
! 157:
! 158: 18. The b, t and : commands are documented in POSIX to ignore leading
! 159: white space, but no mention is made of trailing white space.
! 160: Historic implementations of sed assigned different locations to
! 161: the labels "x" and "x ". This is not useful, and leads to subtle
! 162: programming errors, but it is historic practice and changing it
! 163: could theoretically break working scripts. This implementation
! 164: follows historic practice.
! 165:
! 166: 19. Although POSIX specifies that reading from files that do not exist
! 167: from within the script must not terminate the script, it does not
! 168: specify what happens if a write command fails. Historic practice
! 169: is to fail immediately if the file cannot be opened or written.
! 170: This implementation follows historic practice.
! 171:
! 172: 20. Historic practice is that the \n construct can be used for either
! 173: string1 or string2 of the y command. This is not specified by
! 174: POSIX. This implementation follows historic practice.
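A small sketch of item 20, assuming a sed that accepts \n in the y command as described (the input is invented for illustration):

```shell
# Translate each space into a newline using \n in string2 of y.
out_item20=$(printf 'a b\n' | sed 'y/ /\n/')
printf '%s\n' "$out_item20"
```

The single input line is split into two output lines, "a" and "b".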
! 175:
! 176: 21. Deleted.
! 177:
! 178: 22. Historic implementations of sed ignore the RE delimiter characters
! 179: within character classes. This is not specified in POSIX. This
! 180: implementation follows historic practice.
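A sketch of item 22, assuming a sed that ignores the delimiter inside a character class as described (the input is invented for illustration; an implementation that does not follow this practice would reject the script):

```shell
# The '/' inside [...] is taken literally, not as the end of the RE.
out_item22=$(printf 'a/b\n' | sed 's/[/]/X/')
printf '%s\n' "$out_item22"
```

Here the slash in the input is matched by the character class and replaced by X.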
! 181:
! 182: 23. Historic implementations handle empty RE's in a special way: the
! 183: empty RE is interpreted as if it were the last RE encountered,
! 184: whether in an address or elsewhere. POSIX does not document this
! 185: behavior. For example the command:
! 186:
! 187: sed -e /abc/s//XXX/
! 188:
! 189: substitutes XXX for the pattern abc. The semantics of "the last
! 190: RE" can be defined in two different ways:
! 191:
! 192: 1. The last RE encountered when compiling (lexical/static scope).
! 193: 2. The last RE encountered while running (dynamic scope).
! 194:
! 195: While many historical implementations fail on programs depending
! 196: on scope differences, the SunOS version exhibited dynamic scope
! 197: behaviour. This implementation uses dynamic scoping, as this seems
! 198: the most useful and remains consistent with historical
! 199: practice.
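The empty-RE example from item 23 can be tried directly. This is a sketch with an invented input line; it exercises only the simple case, where static and dynamic scope give the same answer:

```shell
# The empty RE in s//XXX/ reuses the last RE, here /abc/ from the address.
out_item23=$(printf 'abc def\n' | sed -e '/abc/s//XXX/')
printf '%s\n' "$out_item23"
```

The address /abc/ selects the line and the empty RE in the s command then refers to the same pattern, so "abc" is replaced by "XXX".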