Annotation of src/usr.bin/tr/tr.1, Revision 1.24
1.24 ! schwarze 1: .\" $OpenBSD: tr.1,v 1.23 2014/12/09 14:39:37 jmc Exp $
1.1 deraadt 2: .\" $NetBSD: tr.1,v 1.5 1994/12/07 08:35:13 jtc Exp $
3: .\"
4: .\" Copyright (c) 1991, 1993
5: .\" The Regents of the University of California. All rights reserved.
6: .\"
7: .\" This code is derived from software contributed to Berkeley by
8: .\" the Institute of Electrical and Electronics Engineers, Inc.
9: .\"
10: .\" Redistribution and use in source and binary forms, with or without
11: .\" modification, are permitted provided that the following conditions
12: .\" are met:
13: .\" 1. Redistributions of source code must retain the above copyright
14: .\" notice, this list of conditions and the following disclaimer.
15: .\" 2. Redistributions in binary form must reproduce the above copyright
16: .\" notice, this list of conditions and the following disclaimer in the
17: .\" documentation and/or other materials provided with the distribution.
1.8 millert 18: .\" 3. Neither the name of the University nor the names of its contributors
1.1 deraadt 19: .\" may be used to endorse or promote products derived from this software
20: .\" without specific prior written permission.
21: .\"
22: .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
23: .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
24: .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
25: .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
26: .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
27: .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
28: .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
29: .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
30: .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
31: .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
32: .\" SUCH DAMAGE.
33: .\"
34: .\" @(#)tr.1 8.1 (Berkeley) 6/6/93
35: .\"
1.24 ! schwarze 36: .Dd $Mdocdate: December 9 2014 $
1.1 deraadt 37: .Dt TR 1
38: .Os
39: .Sh NAME
40: .Nm tr
41: .Nd translate characters
42: .Sh SYNOPSIS
43: .Nm tr
1.21 millert 44: .Op Fl Ccs
1.1 deraadt 45: .Ar string1 string2
46: .Nm tr
1.21 millert 47: .Op Fl Cc
1.1 deraadt 48: .Fl d
49: .Ar string1
50: .Nm tr
1.21 millert 51: .Op Fl Cc
1.1 deraadt 52: .Fl s
53: .Ar string1
54: .Nm tr
1.21 millert 55: .Op Fl Cc
1.1 deraadt 56: .Fl ds
57: .Ar string1 string2
58: .Sh DESCRIPTION
59: The
1.6 aaron 60: .Nm
1.1 deraadt 61: utility copies the standard input to the standard output with substitution
62: or deletion of selected characters.
63: .Pp
1.5 aaron 64: The options are as follows:
1.1 deraadt 65: .Bl -tag -width Ds
1.21 millert 66: .It Fl C
1.1 deraadt 67: Complements the set of characters in
1.4 pjanzen 68: .Ar string1 ;
69: for instance,
1.21 millert 70: .Dq -C\ ab
1.4 pjanzen 71: includes every character except for
1.22 jmc 72: .Sq a
1.4 pjanzen 73: and
1.22 jmc 74: .Sq b .
1.21 millert 75: .It Fl c
76: The same as
77: .Fl C .
1.1 deraadt 78: .It Fl d
79: The
80: .Fl d
81: option causes characters to be deleted from the input.
82: .It Fl s
83: The
84: .Fl s
85: option squeezes multiple occurrences of the characters listed in the last
86: operand (either
87: .Ar string1
88: or
89: .Ar string2 )
90: in the input into a single instance of the character.
91: Squeezing occurs after all deletion and translation have been completed.
92: .El
93: .Pp
94: In the first synopsis form, the characters in
95: .Ar string1
96: are translated into the characters in
97: .Ar string2
98: where the first character in
99: .Ar string1
100: is translated into the first character in
101: .Ar string2
102: and so on.
103: If
104: .Ar string1
105: is longer than
106: .Ar string2 ,
107: the last character found in
108: .Ar string2
109: is duplicated until
110: .Ar string1
111: is exhausted.
112: .Pp
113: In the second synopsis form, the characters in
114: .Ar string1
115: are deleted from the input.
116: .Pp
117: In the third synopsis form, the characters in
118: .Ar string1
119: are compressed as described for the
120: .Fl s
121: option.
122: .Pp
123: In the fourth synopsis form, the characters in
124: .Ar string1
125: are deleted from the input, and the characters in
126: .Ar string2
127: are compressed as described for the
128: .Fl s
129: option.
130: .Pp
131: The following conventions can be used in
132: .Ar string1
133: and
134: .Ar string2
135: to specify sets of characters:
136: .Bl -tag -width [:equiv:]
137: .It character
138: Any character not described by one of the following conventions
139: represents itself.
140: .It \eoctal
1.4 pjanzen 141: A backslash followed by 1, 2, or 3 octal digits represents a character
1.1 deraadt 142: with that encoded value.
143: To follow an octal sequence with a digit as a character, left zero-pad
144: the octal sequence to the full 3 octal digits.
145: .It \echaracter
146: A backslash followed by certain special characters maps to special
147: values.
1.6 aaron 148: .Pp
1.19 jmc 149: .Bl -tag -width "nn" -offset indent -compact
150: .It \ea
151: <alert character>
152: .It \eb
153: <backspace>
154: .It \ef
155: <form-feed>
156: .It \en
157: <newline>
158: .It \er
159: <carriage return>
160: .It \et
161: <tab>
162: .It \ev
163: <vertical tab>
1.1 deraadt 164: .El
1.6 aaron 165: .Pp
1.1 deraadt 166: A backslash followed by any other character maps to that character.
167: .It c-c
168: Represents the range of characters between the range endpoints, inclusive.
169: .It [:class:]
170: Represents all characters belonging to the defined character class.
171: Class names are:
1.6 aaron 172: .Pp
1.19 jmc 173: .Bl -tag -width "xdigit" -offset indent -compact
174: .It alnum
175: <alphanumeric characters>
176: .It alpha
177: <alphabetic characters>
178: .It blank
179: <blank characters>
180: .It cntrl
181: <control characters>
182: .It digit
183: <numeric characters>
184: .It graph
185: <graphic characters>
186: .It lower
187: <lower-case alphabetic characters>
188: .It print
189: <printable characters>
190: .It punct
191: <punctuation characters>
192: .It space
193: <space characters>
194: .It upper
195: <upper-case alphabetic characters>
196: .It xdigit
197: <hexadecimal characters>
1.1 deraadt 198: .El
199: .Pp
1.15 deraadt 200: .\" All classes may be used in
201: .\" .Ar string1 ,
202: .\" and in
203: .\" .Ar string2
204: .\" when both the
205: .\" .Fl d
206: .\" and
207: .\" .Fl s
208: .\" options are specified.
209: .\" Otherwise, only the classes ``upper'' and ``lower'' may be used in
210: .\" .Ar string2
211: .\" and then only when the corresponding class (``upper'' for ``lower''
212: .\" and vice-versa) is specified in the same relative position in
213: .\" .Ar string1 .
214: .\" .Pp
1.4 pjanzen 215: With the exception of the
216: .Dq upper
217: and
218: .Dq lower
219: classes, characters
1.1 deraadt 220: in the classes are in unspecified order.
1.4 pjanzen 221: In the
222: .Dq upper
223: and
224: .Dq lower
225: classes, characters are entered in
1.1 deraadt 226: ascending order.
1.24 ! schwarze 227: .Pp
! 228: For specific information as to which ASCII characters are included
! 229: in these classes, see
! 230: .Xr isalnum 3 ,
! 231: .Xr isalpha 3 ,
! 232: and related manual pages.
1.1 deraadt 233: .It [=equiv=]
234: Represents all characters or collating (sorting) elements belonging to
235: the same equivalence class as
236: .Ar equiv .
237: If
238: there is a secondary ordering within the equivalence class, the characters
239: are ordered in ascending sequence.
1.4 pjanzen 240: Otherwise, they are ordered by their encoded values.
241: An example of an equivalence class might be
242: .Dq c
243: and
244: .Dq ch
245: in Spanish;
1.1 deraadt 246: English has no equivalence classes.
247: .It [#*n]
248: Represents
249: .Ar n
250: repeated occurrences of the character represented by
251: .Ar # .
252: This
253: expression is only valid when it occurs in
254: .Ar string2 .
255: If
256: .Ar n
1.18 jmc 257: is omitted or is zero, it is interpreted as large enough to extend the
1.1 deraadt 258: .Ar string2
259: sequence to the length of
260: .Ar string1 .
261: If
262: .Ar n
1.4 pjanzen 263: has a leading zero, it is interpreted as an octal value; otherwise,
1.1 deraadt 264: it is interpreted as a decimal value.
265: .El
1.17 jmc 266: .Sh EXIT STATUS
1.12 sobrado 267: .Ex -std tr
1.1 deraadt 268: .Sh EXAMPLES
269: The following examples are shown as given to the shell:
1.6 aaron 270: .Pp
1.1 deraadt 271: Create a list of the words in file1, one per line, where a word is taken to
272: be a maximal string of letters.
1.6 aaron 273: .Pp
1.22 jmc 274: .Dl $ tr -cs \*q[:alpha:]\*q \*q\en\*q < file1
1.6 aaron 275: .Pp
1.1 deraadt 276: Translate the contents of file1 to upper-case.
1.6 aaron 277: .Pp
1.22 jmc 278: .Dl $ tr \*q[:lower:]\*q \*q[:upper:]\*q < file1
1.6 aaron 279: .Pp
1.1 deraadt 280: Strip out non-printable characters from file1.
1.6 aaron 281: .Pp
1.22 jmc 282: .Dl $ tr -cd \*q[:print:]\*q < file1
1.6 aaron 283: .Sh SEE ALSO
284: .Xr sed 1
1.9 jmc 285: .Sh STANDARDS
1.10 jmc 286: The
287: .Nm
288: utility is compliant with the
1.13 jmc 289: .St -p1003.1-2008
1.21 millert 290: specification,
291: except that the
292: .Fl C
293: option behaves the same as the
294: .Fl c
295: option since
296: .Nm
297: is not locale-aware.
1.10 jmc 298: .Pp
1.1 deraadt 299: System V has historically implemented character ranges using the syntax
1.4 pjanzen 300: .Dq [c-c]
301: instead of the
302: .Dq c-c
1.20 jmc 303: used by historic
304: .Bx
305: implementations and
1.1 deraadt 306: standardized by POSIX.
307: System V shell scripts should work under this implementation as long as
1.6 aaron 308: the range is intended to map onto another range; for example, the command
1.22 jmc 309: .Dq tr [a-z] [A-Z]
1.4 pjanzen 310: will work as it will map the
1.22 jmc 311: .Sq \&[
1.4 pjanzen 312: character in
313: .Ar string1
314: to the
1.22 jmc 315: .Sq \&[
1.4 pjanzen 316: character in
1.3 aaron 317: .Ar string2 .
1.1 deraadt 318: However, if the shell script is deleting or squeezing characters as in
1.4 pjanzen 319: the command
320: .Dq tr\ -d\ [a-z] ,
321: the characters
1.22 jmc 322: .Sq \&[
1.4 pjanzen 323: and
1.22 jmc 324: .Sq \&]
1.4 pjanzen 325: will be
326: included in the deletion or compression list, which would not have happened
1.1 deraadt 327: under a historic System V implementation.
1.4 pjanzen 328: Additionally, any scripts that depended on the sequence
329: .Dq a-z
330: to represent the three characters
1.22 jmc 331: .Sq a ,
332: .Sq - ,
1.4 pjanzen 333: and
1.22 jmc 334: .Sq z
1.4 pjanzen 335: will have to be rewritten as
336: .Dq a\e-z .
1.1 deraadt 337: .Pp
338: The
1.6 aaron 339: .Nm
1.1 deraadt 340: utility has historically not permitted the manipulation of NUL bytes in
1.4 pjanzen 341: its input and, additionally, has stripped NULs from its input stream.
1.1 deraadt 342: This implementation treats that behavior as a bug and has removed it.
343: .Pp
344: The
1.6 aaron 345: .Nm
1.4 pjanzen 346: utility has historically been extremely forgiving of syntax errors:
1.1 deraadt 347: for example, the
348: .Fl c
349: and
350: .Fl s
351: options were ignored unless two strings were specified.
352: This implementation will not permit illegal syntax.
1.9 jmc 353: .Pp
1.1 deraadt 354: Note that the behavior in which the last character of
355: .Ar string2
356: is duplicated if
357: .Ar string2
358: has fewer characters than
359: .Ar string1
360: is permitted by POSIX but is not required.
361: Shell scripts attempting to be portable to other POSIX systems should use
1.4 pjanzen 362: the
363: .Dq [#*]
364: convention instead of relying on this behavior.