Annotation of src/usr.bin/sed/POSIX, Revision 1.2
# $OpenBSD: POSIX,v 1.1.1.1 1995/10/18 08:46:05 deraadt Exp $
# from: @(#)POSIX 8.1 (Berkeley) 6/6/93

Comments on the IEEE P1003.2 Draft 12
Part 2: Shell and Utilities
Section 4.55: sed - Stream editor

Diomidis Spinellis <dds@doc.ic.ac.uk>
Keith Bostic <bostic@cs.berkeley.edu>

In the following paragraphs, "wrong" usually means "inconsistent with
historic practice", as most of the following comments refer to
undocumented inconsistencies between the historical versions of sed and
the POSIX 1003.2 standard. All the comments are notes taken while
implementing a POSIX-compatible version of sed, and should not be
interpreted as official opinions or as criticism of the POSIX committee.
All uses of "POSIX" refer to section 4.55, Draft 12 of POSIX 1003.2.

1.  32V and BSD derived implementations of sed strip the text
    arguments of the a, c and i commands of their initial blanks,
    i.e.

        #!/bin/sed -f
        a\
            foo\
            \ indent\
            bar

    produces:

        foo
         indent
        bar

    POSIX does not specify this behavior as the System V versions of
    sed do not do this stripping. The argument against stripping is
    that it is difficult to write sed scripts that have leading blanks
    if they are stripped. The argument for stripping is that it is
    difficult to write readable sed scripts unless indentation is allowed
    and ignored, and leading whitespace is obtainable by entering a
    backslash in front of it. This implementation follows the BSD
    historic practice.
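
A minimal Python sketch of the BSD-style stripping in item 1 follows. The function name strip_text_arg is hypothetical, and it assumes the text lines of an a, c or i command have already been collected from the script. Each line first loses its leading blanks; only then is a backslash treated as quoting the next character, which is why an escaped blank survives.

```python
def strip_text_arg(lines: list[str]) -> str:
    """Hypothetical model of BSD-style a/c/i text handling.

    Each line loses its leading blanks; afterwards a backslash quotes the
    character that follows it, so an escaped blank survives. A lone
    backslash at the end of a line only marks a continuation and is dropped.
    """
    out = []
    for raw in lines:
        line = raw.lstrip(" \t")            # BSD practice: strip initial blanks
        chars = []
        i = 0
        while i < len(line):
            if line[i] == "\\":
                if i + 1 < len(line):
                    chars.append(line[i + 1])   # keep the quoted character
                i += 2                          # skip the backslash and its operand
            else:
                chars.append(line[i])
                i += 1
        out.append("".join(chars))
    return "\n".join(out)
```

Called on the three text lines of the item 1 script, each preceded by a tab, it yields "foo", " indent" and "bar". A System V style variant would simply omit the lstrip step.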

2.  Historical versions of sed required that the w flag be the last
    flag to an s command, as it takes an additional argument. This
    is obvious, but not specified in POSIX.

3.  Historical versions of sed required that whitespace follow a w
    flag to an s command. This is not specified in POSIX. This
    implementation permits whitespace but does not require it.
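
The two rules in items 2 and 3 fall out naturally from a flag parser in which w consumes the rest of the line. The Python sketch below uses a hypothetical parse_s_flags function and handles only the g, p, numeric and w flags.

```python
def parse_s_flags(flags: str) -> dict:
    """Hypothetical parser for the flags that follow s/RE/replacement/."""
    result = {"global": False, "print": False, "occurrence": 1, "wfile": None}
    i = 0
    while i < len(flags):
        c = flags[i]
        if c == "g":
            result["global"] = True
        elif c == "p":
            result["print"] = True
        elif c.isdigit():
            j = i
            while j < len(flags) and flags[j].isdigit():
                j += 1
            result["occurrence"] = int(flags[i:j])
            i = j
            continue
        elif c == "w":
            # Whitespace after w is permitted but not required (item 3).
            name = flags[i + 1:].lstrip(" \t")
            if not name:
                raise ValueError("w flag requires a file name")
            # The file name runs to the end of the line, so w must be last (item 2).
            result["wfile"] = name
            break
        else:
            raise ValueError(f"unknown flag {c!r}")
        i += 1
    return result
```

With this design, "gw out" and "gwout" both name the file "out", while "w out g" names a file called "out g" rather than setting the g flag.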

4.  Historical versions of sed permitted any number of whitespace
    characters to follow the w command. This is not specified in
    POSIX. This implementation permits whitespace but does not
    require it.

5.  The rule for the l command differs from historic practice. Table
    2-15 includes the various ANSI C escape sequences, including \\
    for backslash. Some historical versions of sed displayed two-digit
    octal numbers, too, not three as specified by POSIX. The POSIX rule
    is a cleanup, and this implementation follows it.
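
Here is a Python sketch of the POSIX-style l output described in item 5, using a hypothetical l_render function. It assumes the escape set is the eight ANSI C sequences in ESCAPES, because the contents of Table 2-15 are not reproduced here. Long-line folding is also omitted. Every other non-printable byte is shown as exactly three octal digits, and the end of the line is marked with "$".

```python
# Assumed escape set: the ANSI C sequences, with \\ for backslash.
ESCAPES = {
    ord("\\"): "\\\\", ord("\a"): "\\a", ord("\b"): "\\b", ord("\f"): "\\f",
    ord("\n"): "\\n", ord("\r"): "\\r", ord("\t"): "\\t", ord("\v"): "\\v",
}

def l_render(data: bytes) -> str:
    """Hypothetical rendering of a pattern space for the l command."""
    out = []
    for b in data:
        if b in ESCAPES:
            out.append(ESCAPES[b])
        elif 0x20 <= b < 0x7F:
            out.append(chr(b))          # printable ASCII is shown as is
        else:
            out.append(f"\\{b:03o}")    # always three octal digits, never two
    out.append("$")                     # end-of-line marker
    return "".join(out)
```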

6.  The POSIX specification for ! does not state that, when ! is
    applied to a single command, that command must not contain an
    address specification, whereas a command list may contain
    address specifications. The specification for ! implies that
    "3!/hello/p" works, but historically it never has. Note that

        3!{
        /hello/p
        }

    does work.

7.  POSIX does not specify what happens with consecutive ! commands
    (e.g. /foo/!!!p). Historic implementations allow any number of
    !'s without changing the behavior. (It would seem logical for each
    one to reverse the behavior.) This implementation follows
    historic practice.
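
The Python sketch below contrasts the historic rule in item 7 with the "logical" alternative in which each ! reverses the sense again. The function apply_negation and its parity branch are hypothetical; only the historic branch reflects the practice described above.

```python
def apply_negation(selected: bool, bangs: int, historic: bool = True) -> bool:
    """Hypothetical: decide whether a command runs given its count of '!'."""
    if bangs == 0:
        return selected
    if historic:
        return not selected                 # any run of '!' negates exactly once
    # Hypothetical alternative: each '!' flips the result again.
    return selected if bangs % 2 == 0 else not selected
```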

8.  Historic versions of sed permitted commands to be separated
    by semicolons, e.g. sed -ne '1p;2p;3q' printed the first
    three lines of a file. This is not specified by POSIX.
    Note that the ; command separator is not allowed after the commands
    a, c, i, w, r, :, b, t and #, nor at the end of a w flag in the s
    command. This implementation follows historic practice and
    implements the ; separator.
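
A simplified Python sketch of the splitting rule in item 8 follows. The names split_commands, COMMAND and REST_OF_LINE are hypothetical. The sketch assumes only line-number or "$" addresses and does not handle context addresses or the s and y commands, whose own arguments may contain a ";".

```python
import re

# Hypothetical set of commands whose argument runs to the end of the line,
# so a ';' after them belongs to the argument rather than separating commands.
REST_OF_LINE = set("acirw:bt#")

# Optional line-number or '$' address (or range), optional '!'s, then the command.
COMMAND = re.compile(r"((?:\d+|\$)(?:,(?:\d+|\$))?)?\s*(!*)\s*(\S)")

def split_commands(line: str) -> list[str]:
    """Split one script line on ';' the way historic seds do (sketch)."""
    commands = []
    pos = 0
    while True:
        while pos < len(line) and line[pos] in " \t;":
            pos += 1                              # skip blanks and empty commands
        if pos >= len(line):
            return commands
        m = COMMAND.match(line, pos)              # always matches a non-blank char
        if m.group(3) in REST_OF_LINE:
            commands.append(line[pos:].rstrip())  # the argument swallows any ';'
            return commands
        end = line.find(";", m.end())
        if end == -1:
            end = len(line)
        commands.append(line[pos:end].rstrip())
        pos = end + 1
```

For example, "1p;w out;2p" splits into "1p" and "w out;2p": the w command writes to a file named "out;2p".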

9.  Historic versions of sed terminated the script if EOF was reached
    during the execution of the 'n' command, i.e.:

        sed -e '
        n
        i\
        hello
        ' </dev/null

    did not produce any output. POSIX does not specify this behavior.
    This implementation follows historic practice.
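
The historic rule in item 9 can be modelled in Python by hand-translating the two-command script. The function run_n_insert is hypothetical and makes two assumptions: -n is not in effect, and n autoprints the current line before it tries to read the next one. When that read hits EOF, the script quits and the i\ text is never written.

```python
def run_n_insert(lines: list[str]) -> list[str]:
    """Hypothetical simulation of the script: n, then i\\ hello."""
    out = []
    it = iter(lines)
    for pattern_space in it:                # start of a cycle: one line read
        out.append(pattern_space)           # n: autoprint the current line first
        nxt = next(it, None)
        if nxt is None:                     # EOF during n: quit, skipping i\
            return out
        pattern_space = nxt
        out.append("hello")                 # i\ writes its text immediately
        out.append(pattern_space)           # autoprint at the end of the cycle
    return out
```

With a single input line the model prints only that line, and "hello" never appears.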

10. Deleted.

11. Historical implementations do not output the change text of a c
    command in the case of an address range whose first line number
    is greater than the second (e.g. 3,1). POSIX requires that the
    text be output. Since the historic behavior doesn't seem to have
    any particular purpose, this implementation follows the POSIX
    behavior.

12. POSIX does not specify whether address ranges are checked and
    reset if a command is not executed due to a jump. The following
    program behaves differently depending on whether the 'c' command
    is considered triggered at the third line, i.e. whether the text
    is output even though line 3 of the input never logically
    reaches that command.

        2,4b
        1,3c\
        text

    Historic implementations, and this implementation, do not output
    the text in the above example. The general rule, therefore, is
    that a range whose second address is never matched extends to the
    end of the input.
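
Items 11 and 12 can both be checked with a small Python model of line-number ranges. LineRange and run_branch_change are hypothetical names, and the model makes four assumptions:
- a numeric second address closes a range only on an exact match;
- a second address less than or equal to the first selected line selects just that line;
- c prints its text only on the last line of its range;
- a command skipped by a branch never sees the line at all.

```python
from dataclasses import dataclass

@dataclass
class LineRange:
    """Hypothetical first,second range of line numbers with its own state."""
    first: int
    second: int
    active: bool = False

    def check(self, linenum: int) -> tuple[bool, bool]:
        """Return (selected, last_line_of_range) for this input line."""
        if self.active:
            if linenum == self.second:     # only an exact match closes the range
                self.active = False
                return True, True
            return True, False
        if linenum == self.first:
            if self.second <= linenum:     # second <= first: select one line only
                return True, True
            self.active = True
            return True, False
        return False, False

def run_branch_change(lines, branch: LineRange, change: LineRange, text: str):
    """Hypothetical simulation of a 'b' range followed by a 'c' range."""
    out = []
    for n, line in enumerate(lines, start=1):
        if branch.check(n)[0]:
            out.append(line)               # b without a label: jump to end, autoprint
            continue                       # the c command never sees this line
        selected, last = change.check(n)
        if selected:
            if last:
                out.append(text)           # c prints its text at the end of the range
            continue                       # the pattern space is deleted either way
        out.append(line)
    return out
```

Running the item 12 program (2,4b then 1,3c) over five lines prints only lines 2 to 4. The c range opened on line 1 never sees line 3, so it stays open and the text is never printed. Using a 3,1 range with a branch that never fires prints the text in place of line 3, as item 11 requires.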

13. Historical implementations allow an output-suppressing #n at the
    beginning of -e arguments as well as in a script file. POSIX
    does not specify this. This implementation follows historical
    practice.

14. POSIX does not explicitly specify how sed behaves if no script is
    specified. Since the sed Synopsis permits this form of the command,
    and the language in the Description section states that the input
    is output, it seems reasonable that it behave like the cat(1)
    command. Historic sed implementations behave differently for
    "ls | sed", where they produce no output, and "ls | sed -e#", where
    they behave like cat. This implementation behaves like cat in both
    cases.

15. The POSIX requirement to open all w files at the beginning makes
    sed behave nonintuitively when the w commands are preceded by
    addresses or are within conditional blocks. By default, this
    implementation follows historic practice and POSIX; it also provides
    the -a option, which opens the files only when they are needed.

16. POSIX does not specify how escape sequences other than \n and \D
    (where D is the delimiter character) are to be treated. This is
    reasonable; however, it also doesn't state that the backslash is
    to be discarded from the output regardless. A strict reading of
    POSIX would be that "echo xyz | sed 's/./\a/'" would display "\ayz".
    As historic sed implementations always discarded the backslash,
    this implementation does as well.
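
The Python sketch below contrasts the historic and strict readings from item 16. The functions expand and substitute_first are hypothetical. Python's re module stands in for POSIX regular expressions, which is adequate for patterns like ".", and \1 to \9 backreferences are deliberately not modelled.

```python
import re

def expand(repl: str, delim: str, matched: str, historic: bool = True) -> str:
    """Hypothetical expansion of an s replacement string."""
    out = []
    i = 0
    while i < len(repl):
        c = repl[i]
        if c == "&":
            out.append(matched)                 # unescaped & is the matched text
            i += 1
        elif c == "\\" and i + 1 < len(repl):
            nxt = repl[i + 1]
            if nxt == "n":
                out.append("\n")                # \n is a newline
            elif nxt == delim or nxt in "\\&":
                out.append(nxt)                 # quoted delimiter, backslash or &
            elif nxt.isdigit():
                raise NotImplementedError("backreferences are not modelled")
            elif historic:
                out.append(nxt)                 # historic: drop the backslash
            else:
                out.append("\\" + nxt)          # strict reading: keep it
            i += 2
        else:
            out.append(c)
            i += 1
    return "".join(out)

def substitute_first(line: str, pattern: str, repl: str,
                     delim: str = "/", historic: bool = True) -> str:
    """Hypothetical s/pattern/repl/ applied to the first match only."""
    m = re.search(pattern, line)
    if m is None:
        return line
    return line[:m.start()] + expand(repl, delim, m.group(0), historic) + line[m.end():]
```

Applied to "xyz" with the replacement \a, the historic mode gives "ayz", while the strict mode gives "\ayz".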

17. POSIX specifies that an address can be "empty". This implies
    that constructs like ",d", "1,d" and ",5d" are allowed. This
    is not true for historic implementations or this implementation
    of sed.

18. The b, t and : commands are documented in POSIX to ignore leading
    white space, but no mention is made of trailing white space.
    Historic implementations of sed assigned different locations to
    the labels "x" and "x ". This is not useful, and leads to subtle
    programming errors, but it is historic practice and changing it
    could theoretically break working scripts. This implementation
    follows historic practice.
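
The historic label rule in item 18 amounts to stripping only the leading side of the label argument. In the Python sketch below, the names parse_label and resolve are hypothetical. It shows that "x" and "x " end up as two distinct keys.

```python
def parse_label(argument: str) -> str:
    """Hypothetical: label text of b, t or ':' with leading blanks ignored."""
    return argument.lstrip(" \t")           # trailing blanks are kept

def resolve(labels: dict[str, int], argument: str) -> int:
    """Hypothetical lookup of a branch target; KeyError if undefined."""
    return labels[parse_label(argument)]
```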

19. Although POSIX specifies that reading from files that do not exist
    from within the script must not terminate the script, it does not
    specify what happens if a write command fails. Historic practice
    is to fail immediately if the file cannot be opened or written.
    This implementation follows historic practice.

20. Historic practice is that the \n construct can be used for either
    string1 or string2 of the y command. This is not specified by
    POSIX. This implementation follows historic practice.
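
Here is a Python sketch of the y command that accepts \n in either string, as item 20 describes. The names parse_y_string and y_command are hypothetical. Only \n, \\ and an escaped delimiter are decoded; any other backslash is kept literally.

```python
def parse_y_string(s: str, delim: str = "/") -> str:
    """Hypothetical decoder for one y operand."""
    out = []
    i = 0
    while i < len(s):
        if s[i] == "\\" and i + 1 < len(s) and s[i + 1] in ("n", "\\", delim):
            out.append("\n" if s[i + 1] == "n" else s[i + 1])
            i += 2
        else:
            out.append(s[i])
            i += 1
    return "".join(out)

def y_command(pattern_space: str, string1: str, string2: str, delim: str = "/") -> str:
    """Hypothetical y/string1/string2/: transliterate character by character."""
    src = parse_y_string(string1, delim)
    dst = parse_y_string(string2, delim)
    if len(src) != len(dst):
        raise ValueError("y strings must have the same length")
    return pattern_space.translate(str.maketrans(src, dst))
```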

21. Deleted.

22. Historic implementations of sed ignore the RE delimiter characters
    within character classes. This is not specified in POSIX. This
    implementation follows historic practice.
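
Finding where an RE ends under the rule in item 22 means skipping over bracket expressions, including the [:class:], [.coll.] and [=equiv=] forms inside them. The Python sketch below uses a hypothetical find_re_end function that returns the index of the delimiter that closes the RE starting at index start.

```python
def find_re_end(script: str, start: int, delim: str) -> int:
    """Hypothetical: index of the delimiter that terminates the RE."""
    i = start
    while i < len(script):
        c = script[i]
        if c == "\\":
            i += 2                          # skip the escaped character
            continue
        if c == "[":
            i += 1
            if i < len(script) and script[i] == "^":
                i += 1
            if i < len(script) and script[i] == "]":
                i += 1                      # a leading ']' is a literal member
            while i < len(script) and script[i] != "]":
                if script[i] == "[" and i + 1 < len(script) and script[i + 1] in ":.=":
                    close = script.find(script[i + 1] + "]", i + 2)
                    if close == -1:
                        raise ValueError("unterminated character class")
                    i = close + 2
                else:
                    i += 1                  # the delimiter has no special meaning here
            if i >= len(script):
                raise ValueError("unbalanced brackets")
            i += 1                          # step past the closing ']'
            continue
        if c == delim:
            return i
        i += 1
    raise ValueError("unterminated regular expression")
```

In "s/[/]/X/" the RE starting at index 2 ends at index 5, because the "/" inside the brackets is ignored.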

23. Historic implementations handle empty REs in a special way: the
    empty RE is interpreted as if it were the last RE encountered,
    whether in an address or elsewhere. POSIX does not document this
    behavior. For example, the command:

        sed -e /abc/s//XXX/

    substitutes XXX for the pattern abc. The semantics of "the last
    RE" can be defined in two different ways:

        1. The last RE encountered when compiling (lexical/static scope).
        2. The last RE encountered while running (dynamic scope).

    While many historical implementations fail on programs depending
    on scope differences, the SunOS version exhibited dynamic scope
    behavior. This implementation does dynamic scoping, both because
    this seems the most useful and in order to remain consistent with
    historical practice.
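
Dynamic scoping of the empty RE (item 23) can be sketched in Python by recording the RE actually used at run time. DynamicRE, abc_example and branch_example are hypothetical names, and the four-line script in branch_example is invented for illustration only.

```python
import re

class DynamicRE:
    """Hypothetical: resolve the empty RE to the last RE used at run time."""

    def __init__(self):
        self.last = None                      # no RE has been used yet

    def search(self, pattern: str, text: str):
        if pattern == "":
            if self.last is None:
                raise ValueError("first RE may not be empty")
            rx = self.last
        else:
            rx = re.compile(pattern)
        self.last = rx                        # updated on every use, match or not
        return rx.search(text)

def abc_example(line: str) -> str:
    """Hand translation of: sed -e /abc/s//XXX/"""
    rx = DynamicRE()
    if rx.search("abc", line):
        m = rx.search("", line)               # the empty RE means /abc/ here
        line = line[:m.start()] + "XXX" + line[m.end():]
    return line

def branch_example(line: str) -> str:
    """Hand translation of a hypothetical script:
         /a/bskip
         /b/h
         :skip
         s//X/
    """
    rx = DynamicRE()
    if not rx.search("a", line):              # /a/bskip not taken
        rx.search("b", line)                  # /b/h runs (hold space ignored)
    m = rx.search("", line)                   # whichever RE ran last
    if m:
        line = line[:m.start()] + "X" + line[m.end():]
    return line
```

On the input "ab", the branch skips /b/, so dynamic scoping reuses /a/ and produces "Xb". Static scoping would pick /b/, the RE compiled just before s//X/, and produce "aX".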