Annotation of src/usr.bin/tr/tr.1, Revision 1.20
1.20 ! jmc 1: .\" $OpenBSD: tr.1,v 1.19 2011/09/03 22:59:07 jmc Exp $
1.1 deraadt 2: .\" $NetBSD: tr.1,v 1.5 1994/12/07 08:35:13 jtc Exp $
3: .\"
4: .\" Copyright (c) 1991, 1993
5: .\" The Regents of the University of California. All rights reserved.
6: .\"
7: .\" This code is derived from software contributed to Berkeley by
8: .\" the Institute of Electrical and Electronics Engineers, Inc.
9: .\"
10: .\" Redistribution and use in source and binary forms, with or without
11: .\" modification, are permitted provided that the following conditions
12: .\" are met:
13: .\" 1. Redistributions of source code must retain the above copyright
14: .\" notice, this list of conditions and the following disclaimer.
15: .\" 2. Redistributions in binary form must reproduce the above copyright
16: .\" notice, this list of conditions and the following disclaimer in the
17: .\" documentation and/or other materials provided with the distribution.
1.8 millert 18: .\" 3. Neither the name of the University nor the names of its contributors
1.1 deraadt 19: .\" may be used to endorse or promote products derived from this software
20: .\" without specific prior written permission.
21: .\"
22: .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
23: .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
24: .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
25: .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
26: .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
27: .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
28: .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
29: .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
30: .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
31: .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
32: .\" SUCH DAMAGE.
33: .\"
34: .\" @(#)tr.1 8.1 (Berkeley) 6/6/93
35: .\"
1.20 ! jmc 36: .Dd $Mdocdate: September 3 2011 $
1.1 deraadt 37: .Dt TR 1
38: .Os
39: .Sh NAME
40: .Nm tr
41: .Nd translate characters
42: .Sh SYNOPSIS
43: .Nm tr
44: .Op Fl cs
45: .Ar string1 string2
46: .Nm tr
47: .Op Fl c
48: .Fl d
49: .Ar string1
50: .Nm tr
51: .Op Fl c
52: .Fl s
53: .Ar string1
54: .Nm tr
55: .Op Fl c
56: .Fl ds
57: .Ar string1 string2
58: .Sh DESCRIPTION
59: The
1.6 aaron 60: .Nm
1.1 deraadt 61: utility copies the standard input to the standard output with substitution
62: or deletion of selected characters.
63: .Pp
1.5 aaron 64: The options are as follows:
1.1 deraadt 65: .Bl -tag -width Ds
66: .It Fl c
67: Complements the set of characters in
1.4 pjanzen 68: .Ar string1 ;
69: for instance,
70: .Dq -c\ ab
71: includes every character except for
72: .Dq a
73: and
74: .Dq b .
1.1 deraadt 75: .It Fl d
76: The
77: .Fl d
78: option causes characters to be deleted from the input.
79: .It Fl s
80: The
81: .Fl s
82: option squeezes multiple occurrences of the characters listed in the last
83: operand (either
84: .Ar string1
85: or
86: .Ar string2 )
87: in the input into a single instance of the character.
88: This occurs after all deletion and translation are complete.
89: .El
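The -c, -d, and -s options all act on a set of byte values. The following C sketch shows one way -c and -d can work together. It assumes single-byte characters and a string1 that has already been expanded (no ranges, classes, or escapes). The names build_set and delete_chars are hypothetical and are not taken from the tr sources. For brevity the sketch uses NUL-terminated C strings, so unlike the utility itself it cannot handle NUL bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mark each byte that appears in set. With complement (the -c option),
 * invert the marks so that every byte NOT in set is selected instead. */
void build_set(const char *set, bool complement, bool member[256])
{
	assert(set != NULL);
	for (int c = 0; c < 256; c++)
		member[c] = false;
	for (const unsigned char *p = (const unsigned char *)set; *p != '\0'; p++)
		member[*p] = true;
	if (complement)
		for (int c = 0; c < 256; c++)
			member[c] = !member[c];
}

/* Copy in to out, dropping every selected byte (the -d option).
 * out must have room for strlen(in) + 1 bytes. */
void delete_chars(const char *in, char *out, const bool member[256])
{
	for (const unsigned char *p = (const unsigned char *)in; *p != '\0'; p++)
		if (!member[*p])
			*out++ = (char)*p;
	*out = '\0';
}
```

With set "ab" and complement enabled, every byte except a and b is selected, so deleting from "abcabc" leaves "abab", which is what tr -cd ab does to that input.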
90: .Pp
91: In the first synopsis form, the characters in
92: .Ar string1
93: are translated into the characters in
94: .Ar string2 ,
95: where the first character in
96: .Ar string1
97: is translated into the first character in
98: .Ar string2
99: and so on.
100: If
101: .Ar string1
102: is longer than
103: .Ar string2 ,
104: the last character found in
105: .Ar string2
106: is duplicated until
107: .Ar string1
108: is exhausted.
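The positional mapping and the padding rule above can be expressed as a 256-entry lookup table. This is a minimal C sketch, assuming single-byte characters and operands that are already expanded; build_map and translate are hypothetical names. What happens when a byte occurs more than once in string1 is not stated here, so the sketch simply lets the later occurrence win.

```c
#include <assert.h>
#include <string.h>

/* Build a byte map in which the i-th byte of s1 maps to the i-th byte of s2.
 * When s1 is longer, the last byte of s2 is reused for the remainder of s1.
 * If a byte occurs more than once in s1, the later mapping wins here; that
 * is a choice of this sketch, not something this manual specifies. */
void build_map(const char *s1, const char *s2, unsigned char map[256])
{
	size_t n2 = strlen(s2);

	assert(n2 > 0);	/* padding needs at least one byte in s2 */
	for (int c = 0; c < 256; c++)
		map[c] = (unsigned char)c;	/* bytes not in s1 pass through */
	for (size_t i = 0; s1[i] != '\0'; i++) {
		unsigned char to = (unsigned char)s2[i < n2 ? i : n2 - 1];
		map[(unsigned char)s1[i]] = to;
	}
}

/* Apply the map to in, writing strlen(in) + 1 bytes to out. */
void translate(const char *in, char *out, const unsigned char map[256])
{
	for (const unsigned char *p = (const unsigned char *)in; *p != '\0'; p++)
		*out++ = (char)map[*p];
	*out = '\0';
}
```

Calling build_map("abc", "xy", map) maps c to y, because y is the last character of string2; this is the same padding tr abc xy applies. Extra characters in a longer string2 are simply never used.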
109: .Pp
110: In the second synopsis form, the characters in
111: .Ar string1
112: are deleted from the input.
113: .Pp
114: In the third synopsis form, the characters in
115: .Ar string1
116: are compressed as described for the
117: .Fl s
118: option.
119: .Pp
120: In the fourth synopsis form, the characters in
121: .Ar string1
122: are deleted from the input, and the characters in
123: .Ar string2
124: are compressed as described for the
125: .Fl s
126: option.
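The fourth form combines deletion with squeezing, and the
.Fl s
description states that squeezing happens after deletion. This C sketch makes the same single-byte, already-expanded assumptions; mark and delete_squeeze are hypothetical names. The sketch remembers the last byte actually written, not the last byte read. That is what lets a single pass behave like deletion followed by squeezing: a run that becomes adjacent only after the bytes between its members are deleted is still squeezed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Clear member, then mark every byte that appears in set. */
void mark(const char *set, bool member[256])
{
	assert(set != NULL);
	for (int c = 0; c < 256; c++)
		member[c] = false;
	for (const unsigned char *p = (const unsigned char *)set; *p != '\0'; p++)
		member[*p] = true;
}

/* Drop bytes marked in del, then collapse each run of a byte marked in sq
 * into one copy. last is the previous byte written to out, or -1 before any. */
void delete_squeeze(const char *in, char *out,
    const bool del[256], const bool sq[256])
{
	int last = -1;

	for (const unsigned char *p = (const unsigned char *)in; *p != '\0'; p++) {
		if (del[*p])
			continue;		/* deletion happens first */
		if (sq[*p] && *p == last)
			continue;		/* repeat of a squeezed byte */
		last = *p;
		*out++ = (char)*p;
	}
	*out = '\0';
}
```

For example, deleting x and squeezing a turns "aaxab" into "ab", as tr -ds x a would. Bytes outside the squeeze set, such as the b characters in "bbxb", are left alone.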
127: .Pp
128: The following conventions can be used in
129: .Ar string1
130: and
131: .Ar string2
132: to specify sets of characters:
133: .Bl -tag -width [:equiv:]
134: .It character
135: Any character not described by one of the following conventions
136: represents itself.
137: .It \eoctal
1.4 pjanzen 138: A backslash followed by 1, 2, or 3 octal digits represents a character
1.1 deraadt 139: with that encoded value.
140: To follow an octal sequence with a digit as a character, left zero-pad
141: the octal sequence to the full 3 octal digits.
142: .It \echaracter
143: A backslash followed by certain special characters maps to special
144: values.
1.6 aaron 145: .Pp
1.19 jmc 146: .Bl -tag -width "nn" -offset indent -compact
147: .It \ea
148: <alert character>
149: .It \eb
150: <backspace>
151: .It \ef
152: <form-feed>
153: .It \en
154: <newline>
155: .It \er
156: <carriage return>
157: .It \et
158: <tab>
159: .It \ev
160: <vertical tab>
1.1 deraadt 161: .El
1.6 aaron 162: .Pp
1.1 deraadt 163: A backslash followed by any other character maps to that character.
164: .It c-c
165: Represents the range of characters between the range endpoints, inclusively.
166: .It [:class:]
167: Represents all characters belonging to the defined character class.
168: Class names are:
1.6 aaron 169: .Pp
1.19 jmc 170: .Bl -tag -width "xdigit" -offset indent -compact
171: .It alnum
172: <alphanumeric characters>
173: .It alpha
174: <alphabetic characters>
175: .It blank
176: <blank characters>
177: .It cntrl
178: <control characters>
179: .It digit
180: <numeric characters>
181: .It graph
182: <graphic characters>
183: .It lower
184: <lower-case alphabetic characters>
185: .It print
186: <printable characters>
187: .It punct
188: <punctuation characters>
189: .It space
190: <space characters>
191: .It upper
192: <upper-case alphabetic characters>
193: .It xdigit
194: <hexadecimal digit characters>
1.1 deraadt 195: .El
196: .Pp
1.15 deraadt 197: .\" All classes may be used in
198: .\" .Ar string1 ,
199: .\" and in
200: .\" .Ar string2
201: .\" when both the
202: .\" .Fl d
203: .\" and
204: .\" .Fl s
205: .\" options are specified.
206: .\" Otherwise, only the classes ``upper'' and ``lower'' may be used in
207: .\" .Ar string2
208: .\" and then only when the corresponding class (``upper'' for ``lower''
209: .\" and vice-versa) is specified in the same relative position in
210: .\" .Ar string1 .
211: .\" .Pp
1.4 pjanzen 212: With the exception of the
213: .Dq upper
214: and
215: .Dq lower
216: classes, characters
1.1 deraadt 217: in the classes are in unspecified order.
1.4 pjanzen 218: In the
219: .Dq upper
220: and
221: .Dq lower
222: classes, characters are entered in
1.1 deraadt 223: ascending order.
224: .Pp
225: For specific information as to which ASCII characters are included
226: in these classes, see
227: .Xr ctype 3
228: and related manual pages.
229: .It [=equiv=]
230: Represents all characters or collating (sorting) elements belonging to
231: the same equivalence class as
232: .Ar equiv .
233: If
234: there is a secondary ordering within the equivalence class, the characters
235: are ordered in ascending sequence.
1.4 pjanzen 236: Otherwise, they are ordered by their encoded values.
237: An example of an equivalence class might be
238: .Dq c
239: and
240: .Dq ch
241: in Spanish;
1.1 deraadt 242: English has no equivalence classes.
243: .It [#*n]
244: Represents
245: .Ar n
246: repeated occurrences of the character represented by
247: .Ar # .
248: This
249: expression is only valid when it occurs in
250: .Ar string2 .
251: If
252: .Ar n
1.18 jmc 253: is omitted or is zero, it is interpreted as large enough to extend the
1.1 deraadt 254: .Ar string2
255: sequence to the length of
256: .Ar string1 .
257: If
258: .Ar n
1.4 pjanzen 259: has a leading zero, it is interpreted as an octal value; otherwise,
1.1 deraadt 260: it is interpreted as a decimal value.
261: .El
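The escape and range conventions above amount to an expansion step that turns an operand into a plain list of bytes. This C sketch covers that step for escapes and c-c ranges only, assuming single-byte characters. It does not handle [:class:], [=equiv=], or [#*n], and next_char and expand are hypothetical names. This manual does not say what happens with octal values above 0377 or with a backslash at the end of an operand, so the sketch picks a behavior for each and notes it in a comment.

```c
#include <assert.h>
#include <stddef.h>

/* Decode one character at *sp, handling backslash escapes, and advance *sp.
 * A backslash at the very end of the string is taken literally here. */
static int next_char(const char **sp)
{
	const char *s = *sp;
	int c;

	if (s[0] != '\\' || s[1] == '\0') {
		*sp = s + 1;
		return (unsigned char)s[0];
	}
	s++;					/* skip the backslash */
	if (*s >= '0' && *s <= '7') {		/* \octal: 1 to 3 digits */
		c = 0;
		for (int i = 0; i < 3 && *s >= '0' && *s <= '7'; i++, s++)
			c = c * 8 + (*s - '0');
		*sp = s;
		return c & 0xff;		/* values above 0377 truncated (assumption) */
	}
	switch (*s) {
	case 'a': c = '\a'; break;
	case 'b': c = '\b'; break;
	case 'f': c = '\f'; break;
	case 'n': c = '\n'; break;
	case 'r': c = '\r'; break;
	case 't': c = '\t'; break;
	case 'v': c = '\v'; break;
	default:  c = (unsigned char)*s; break;	/* any other: itself */
	}
	*sp = s + 1;
	return c;
}

/* Expand escapes and c-c ranges in src into dst (room for dstsize bytes).
 * Returns the byte count; dst is not NUL-terminated, so "\0" can appear.
 * A reversed range such as z-a produces nothing in this sketch. */
size_t expand(const char *src, unsigned char *dst, size_t dstsize)
{
	size_t n = 0;

	assert(src != NULL && dst != NULL);
	while (*src != '\0') {
		int lo = next_char(&src);

		if (src[0] == '-' && src[1] != '\0') {	/* c-c range */
			src++;
			int hi = next_char(&src);
			for (int c = lo; c <= hi && n < dstsize; c++)
				dst[n++] = (unsigned char)c;
		} else if (n < dstsize) {
			dst[n++] = (unsigned char)lo;
		}
	}
	return n;
}
```

With this sketch, "a\e-z" expands to the three bytes a, -, and z. In ASCII, "a-z" expands to all 26 lower-case letters. "\e0018" expands to the byte 001 followed by the digit 8, which is the zero-padding rule given for octal escapes.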
1.17 jmc 262: .Sh EXIT STATUS
1.12 sobrado 263: .Ex -std tr
1.1 deraadt 264: .Sh EXAMPLES
265: The following examples are shown as given to the shell:
1.6 aaron 266: .Pp
1.1 deraadt 267: Create a list of the words in file1, one per line, where a word is taken to
268: be a maximal string of letters.
1.6 aaron 269: .Pp
1.7 deraadt 270: .D1 Li "$ tr -cs \*q[:alpha:]\*q \*q\en\*q < file1"
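The word-list example combines a character class, -c, translation, and -s. This C sketch hard-codes the operands of that one command to show how the pieces interact. The function name words_one_per_line is hypothetical. The sketch uses isalpha(3) for [:alpha:], so its result depends on the current locale, which is the "C" locale unless the program calls setlocale(3). Every byte outside [:alpha:] maps to a newline, because string2 is padded with its last character. Runs of newlines are then squeezed, because newline is in the last operand.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* One-pass equivalent of: tr -cs "[:alpha:]" "\n"
 * Non-letters become '\n', and consecutive '\n' output bytes collapse
 * into one. out must have room for strlen(in) + 1 bytes. */
void words_one_per_line(const char *in, char *out)
{
	int last = -1;	/* previous byte written, or -1 before any */

	assert(in != NULL && out != NULL);
	for (const unsigned char *p = (const unsigned char *)in; *p != '\0'; p++) {
		int c = isalpha(*p) ? *p : '\n';

		if (c == '\n' && last == '\n')
			continue;	/* squeeze repeated newlines */
		last = c;
		*out++ = (char)c;
	}
	*out = '\0';
}
```

Given "Hi, there!", the sketch produces "Hi\nthere\n". The comma and the following space produce a single newline.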
1.6 aaron 271: .Pp
1.1 deraadt 272: Translate the contents of file1 to upper-case.
1.6 aaron 273: .Pp
1.7 deraadt 274: .D1 Li "$ tr \*q[:lower:]\*q \*q[:upper:]\*q < file1"
1.6 aaron 275: .Pp
1.1 deraadt 276: Strip out non-printable characters from file1.
1.6 aaron 277: .Pp
1.7 deraadt 278: .D1 Li "$ tr -cd \*q[:print:]\*q < file1"
1.6 aaron 279: .Sh SEE ALSO
280: .Xr sed 1
1.9 jmc 281: .Sh STANDARDS
1.10 jmc 282: The
283: .Nm
284: utility is compliant with the
1.13 jmc 285: .St -p1003.1-2008
1.10 jmc 286: specification.
287: .Pp
1.1 deraadt 288: System V has historically implemented character ranges using the syntax
1.4 pjanzen 289: .Dq [c-c]
290: instead of the
291: .Dq c-c
1.20 ! jmc 292: used by historic
! 293: .Bx
! 294: implementations and
1.1 deraadt 295: standardized by POSIX.
296: System V shell scripts should work under this implementation as long as
1.6 aaron 297: the range is intended to map into another range; for example, the command
1.4 pjanzen 298: .Dq tr\ [a-z]\ [A-Z]
299: will work because it maps the
1.16 schwarze 300: .Dq \&[
1.4 pjanzen 301: character in
302: .Ar string1
303: to the
1.16 schwarze 304: .Dq \&[
1.4 pjanzen 305: character in
1.3 aaron 306: .Ar string2 .
1.1 deraadt 307: However, if the shell script is deleting or squeezing characters as in
1.4 pjanzen 308: the command
309: .Dq tr\ -d\ [a-z] ,
310: the characters
1.16 schwarze 311: .Dq \&[
1.4 pjanzen 312: and
313: .Dq \]
314: will be
315: included in the deletion or compression list, which would not have happened
1.1 deraadt 316: under an historic System V implementation.
1.4 pjanzen 317: Additionally, any scripts that depended on the sequence
318: .Dq a-z
319: to represent the three characters
320: .Dq a ,
321: .Dq - ,
322: and
323: .Dq z
324: will have to be rewritten as
325: .Dq a\e-z .
1.1 deraadt 326: .Pp
327: The
1.6 aaron 328: .Nm
1.1 deraadt 329: utility has historically not permitted the manipulation of NUL bytes in
1.4 pjanzen 330: its input and, additionally, has stripped NULs from its input stream.
1.1 deraadt 331: This implementation treats that behavior as a bug and has removed it.
332: .Pp
333: The
1.6 aaron 334: .Nm
1.4 pjanzen 335: utility has historically been extremely forgiving of syntax errors:
1.1 deraadt 336: for example, the
337: .Fl c
338: and
339: .Fl s
340: options were ignored unless two strings were specified.
341: This implementation will not permit illegal syntax.
1.9 jmc 342: .Pp
1.1 deraadt 343: It should be noted that the feature wherein the last character of
344: .Ar string2
345: is duplicated if
346: .Ar string2
347: has fewer characters than
348: .Ar string1
349: is permitted by POSIX but is not required.
350: Shell scripts attempting to be portable to other POSIX systems should use
1.4 pjanzen 351: the
352: .Dq [#*]
353: convention instead of relying on this behavior.