Annotation of src/usr.bin/sort/sort.1, Revision 1.9
1.9 ! aaron 1: .\" $OpenBSD: sort.1,v 1.8 2000/03/11 21:40:03 aaron Exp $
1.1 millert 2: .\"
3: .\" Copyright (c) 1991, 1993
4: .\" The Regents of the University of California. All rights reserved.
5: .\"
6: .\" This code is derived from software contributed to Berkeley by
7: .\" the Institute of Electrical and Electronics Engineers, Inc.
8: .\"
9: .\" Redistribution and use in source and binary forms, with or without
10: .\" modification, are permitted provided that the following conditions
11: .\" are met:
12: .\" 1. Redistributions of source code must retain the above copyright
13: .\" notice, this list of conditions and the following disclaimer.
14: .\" 2. Redistributions in binary form must reproduce the above copyright
15: .\" notice, this list of conditions and the following disclaimer in the
16: .\" documentation and/or other materials provided with the distribution.
17: .\" 3. All advertising materials mentioning features or use of this software
18: .\" must display the following acknowledgement:
19: .\" This product includes software developed by the University of
20: .\" California, Berkeley and its contributors.
21: .\" 4. Neither the name of the University nor the names of its contributors
22: .\" may be used to endorse or promote products derived from this software
23: .\" without specific prior written permission.
24: .\"
25: .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
26: .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
27: .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
28: .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
29: .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
30: .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
31: .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
32: .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
33: .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
34: .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
35: .\" SUCH DAMAGE.
36: .\"
37: .\" @(#)sort.1 8.1 (Berkeley) 6/6/93
38: .\"
39: .Dd June 6, 1993
40: .Dt SORT 1
41: .Os
42: .Sh NAME
43: .Nm sort
44: .Nd sort or merge text files
45: .Sh SYNOPSIS
46: .Nm sort
1.2 deraadt 47: .Op Fl cmubdfinrH
1.1 millert 48: .Op Fl t Ar char
49: .Op Fl R Ar char
50: .Oo
51: .Cm Fl k Ar field1[,field2]
52: .Oc
53: .Ar ...
54: .Op Fl T Ar dir
55: .Op Fl o Ar output
56: .Op Ar file
57: .Ar ...
58: .Sh DESCRIPTION
59: The
1.8 aaron 60: .Nm
1.1 millert 61: utility
62: sorts text files by lines.
63: Comparisons are based on one or more sort keys extracted
1.8 aaron 64: from each line of input, and are performed lexicographically.
65: By default, if keys are not given,
66: .Nm
1.1 millert 67: regards each input line as a single field.
68: .Pp
1.7 aaron 69: The options are as follows:
1.3 aaron 70: .Bl -tag -width file indent
1.1 millert 71: .It Fl c
72: Check that the single input file is sorted.
73: If the file is not sorted,
1.8 aaron 74: .Nm
1.1 millert 75: produces the appropriate error messages and exits with code 1;
76: otherwise,
1.8 aaron 77: .Nm
1.1 millert 78: returns 0.
1.8 aaron 79: .Nm
1.1 millert 80: .Fl c
1.6 pjanzen 81: produces no output, except the error messages on
82: .Em stderr .
1.1 millert 83: .It Fl m
84: Merge only; the input files are assumed to be pre-sorted.
85: .It Fl o Ar output
86: The argument given is the name of an
87: .Ar output
88: file to
89: be used instead of the standard output.
90: This file
91: can be the same as one of the input files.
92: .It Fl T Ar dir
93: Use
94: .Ar dir
1.8 aaron 95: as the directory for temporary files.
96: The default is the contents of the environment variable
1.1 millert 97: .Ev TMPDIR
98: or
99: .Pa /var/tmp
100: if
101: .Ev TMPDIR
102: is not set.
103: .It Fl u
104: Unique: suppress all but one in each set of lines
105: having equal keys.
106: If used with the
107: .Fl c
108: option,
109: check that there are no lines with duplicate keys.
110: .El
111: .Pp
112: The following options override the default ordering rules.
113: When ordering options appear independent of key field
114: specifications, the requested field ordering rules are
115: applied globally to all sort keys.
116: When attached to a specific key (see
117: .Fl k ) ,
118: the ordering options override
119: all global ordering options for that key.
120: .Bl -tag -width indent
121: .It Fl d
122: Only blank space and alphanumeric characters
123: .\" according
124: .\" to the current setting of LC_CTYPE
125: are used
126: in making comparisons.
127: .It Fl f
128: Considers all lowercase characters that have uppercase
129: equivalents to be the same for purposes of
130: comparison.
131: .It Fl i
132: Ignore all non-printable characters.
133: .It Fl n
134: An initial numeric string, consisting of optional
135: blank space, optional minus sign, and zero or more
136: digits (including decimal point)
137: .\" with
138: .\" optional radix character and thousands
139: .\" separator
140: .\" (as defined in the current locale),
141: is sorted by arithmetic value.
142: (The
143: .Fl n
144: option no longer implies
145: the
146: .Fl b
147: option.)
148: .It Fl r
149: Reverse the sense of comparisons.
150: .It Fl H
1.8 aaron 151: Use a merge sort instead of a radix sort.
152: This option should be used for files larger than 60Mb.
1.1 millert 153: .El
154: .Pp
1.3 aaron 155: The treatment of field separators can be altered using these
1.1 millert 156: options:
157: .Bl -tag -width indent
158: .It Fl b
159: Ignores leading blank space when determining the start
160: and end of a restricted sort key.
161: A
162: .Fl b
163: option specified before the first
164: .Fl k
165: option applies globally to all
166: .Fl k
167: options.
168: Otherwise, the
169: .Fl b
170: option can be
171: attached independently to each
172: .Ar field
173: argument of the
174: .Fl k
175: option (see below).
176: Note that the
177: .Fl b
178: option
179: has no effect unless key fields are specified.
180: .It Fl t Ar char
1.3 aaron 181: .Ar char
1.8 aaron 182: is used as the field separator character.
183: The initial
1.1 millert 184: .Ar char
185: is not considered to be part of a field when determining
1.6 pjanzen 186: key offsets.
1.1 millert 187: Each occurrence of
188: .Ar char
189: is significant (for example,
190: .Dq Ar charchar
191: delimits an empty field).
192: If
193: .Fl t
1.6 pjanzen 194: is not specified, the default field separator is a sequence of
195: blank-space characters, and consecutive blank spaces do
196: .Em not
197: delimit an empty field; further, the initial blank space
198: .Em is
199: considered part of a field when determining key offsets.
1.1 millert 200: .It Fl R Ar char
1.3 aaron 201: .Ar char
1.1 millert 202: is used as the record separator character.
203: This should be used with discretion;
204: .Fl R Ar <alphanumeric>
205: usually produces undesirable results.
1.4 aaron 206: The default record separator is newline.
1.1 millert 207: .It Fl k Ar field1[,field2]
208: Designates the starting position,
209: .Ar field1 ,
1.5 aaron 210: and optional ending position,
1.1 millert 211: .Ar field2 ,
212: of a key field.
213: The
214: .Fl k
215: option replaces the obsolescent options
216: .Cm \(pl Ns Ar pos1
217: and
218: .Fl Ns Ar pos2 .
219: .El
220: .Pp
221: The following operands are available:
222: .Bl -tag -width indent
1.3 aaron 223: .It Ar file
224: The pathname of a file to be sorted, merged, or checked.
225: If no
1.1 millert 226: .Ar file
227: operands are specified, or if
1.3 aaron 228: a
229: .Ar file
230: operand is
1.1 millert 231: .Fl ,
232: the standard input is used.
1.3 aaron 233: .El
1.1 millert 234: .Pp
235: A field is
1.6 pjanzen 236: defined as a maximal sequence of characters other than the
237: field separator and record separator
238: .Pq newline by default .
239: Initial blank spaces are included in the field unless
240: .Fl b
241: has been specified;
242: the first blank space of a sequence of blank spaces acts as the field
243: separator and is included in the field (unless
244: .Fl t
245: is specified).
246: For example, by default all blank spaces at the beginning of a line are
247: considered to be part of the first field.
1.1 millert 248: .Pp
249: Fields are specified
250: by the
251: .Fl k Ar field1[,field2]
1.8 aaron 252: argument.
253: A missing
1.1 millert 254: .Ar field2
255: argument defaults to the end of a line.
256: .Pp
257: The arguments
258: .Ar field1
259: and
260: .Ar field2
261: have the form
262: .Em m.n
1.6 pjanzen 263: .Em (m,n > 0)
264: and can be followed by one or more of the letters
265: .Cm b , d , f , i ,
266: .Cm n ,
267: and
268: .Cm r ,
269: which correspond to the options discussed above.
1.1 millert 270: A
271: .Ar field1
272: position specified by
273: .Em m.n
274: is interpreted as the
275: .Em n Ns th
1.6 pjanzen 276: character from the beginning of the
1.1 millert 277: .Em m Ns th
278: field.
279: A missing
280: .Em \&.n
281: in
282: .Ar field1
283: means
284: .Ql \&.1 ,
285: indicating the first character of the
286: .Em m Ns th
287: field;
1.3 aaron 288: if the
1.1 millert 289: .Fl b
290: option is in effect,
291: .Em n
292: is counted from the first
293: non-blank character in the
294: .Em m Ns th
295: field;
296: .Em m Ns \&.1b
297: refers to the first
298: non-blank character in the
299: .Em m Ns th
300: field.
1.6 pjanzen 301: .No 1\&. Ns Em n
302: refers to the
303: .Em n Ns th
304: character from the beginning of the line;
305: if
306: .Em n
307: is greater than the length of the line, the field is taken to be empty.
1.1 millert 308: .Pp
309: A
310: .Ar field2
311: position specified by
312: .Em m.n
313: is interpreted as
314: the
315: .Em n Ns th
316: character (including separators) of the
317: .Em m Ns th
318: field.
319: A missing
320: .Em \&.n
1.5 aaron 321: indicates the last character of the
1.1 millert 322: .Em m Ns th
323: field;
1.5 aaron 324: .Em m
1.1 millert 325: = \&0
326: designates the end of a line.
327: Thus the option
328: .Fl k Ar v.x,w.y
329: is synonymous with the obsolescent option
330: .Cm \(pl Ns Ar v-\&1.x-\&1
331: .Fl Ns Ar w-\&1.y ;
332: when
333: .Em y
334: is omitted,
335: .Fl k Ar v.x,w
336: is synonymous with
1.5 aaron 337: .Cm \(pl Ns Ar v-\&1.x-\&1
1.1 millert 338: .Fl Ns Ar w+1.0 .
339: The obsolescent
340: .Cm \(pl Ns Ar pos1
341: .Fl Ns Ar pos2
342: option is still supported, except for
1.3 aaron 343: .Fl Ns Ar w\&.0b ,
1.1 millert 344: which has no
345: .Fl k
346: equivalent.
1.8 aaron 347: .Pp
348: The
349: .Nm
350: utility shall exit with one of the following values:
351: .Pp
352: .Bl -tag -width flag -compact
353: .It 0
354: Normal behavior.
355: .It 1
356: On disorder (or non-uniqueness) with the
357: .Fl c
358: option.
359: .It 2
360: An error occurred.
361: .El
1.1 millert 362: .Sh ENVIRONMENT
1.8 aaron 363: The following environment variables affect the execution of
1.3 aaron 364: .Nm sort :
1.1 millert 365: .Bl -tag -width Fl
366: .It Ev TMPDIR
1.3 aaron 367: Path in which to store temporary files.
368: Note that
1.1 millert 369: .Ev TMPDIR
370: may be overridden by the
371: .Fl T
372: option.
.El
373: .Sh FILES
374: .Bl -tag -width Pa -compact
375: .It Pa /var/tmp/sort.*
1.3 aaron 376: default temporary directories
1.1 millert 377: .It Pa Ar output Ns #PID
1.3 aaron 378: temporary name for
1.1 millert 379: .Ar output
380: if
381: .Ar output
1.3 aaron 382: already exists
1.1 millert 383: .El
384: .Sh SEE ALSO
385: .Xr comm 1 ,
1.3 aaron 386: .Xr join 1 ,
387: .Xr uniq 1
1.1 millert 388: .Sh BUGS
389: Lines longer than 65522 characters are discarded and processing continues.
390: To sort files larger than 60Mb, use
1.8 aaron 391: .Nm
1.1 millert 392: .Fl H ;
393: files larger than 704Mb must be sorted in smaller pieces, then merged.
394: To protect data,
1.8 aaron 395: .Nm
1.1 millert 396: .Fl o
397: calls link and unlink, and thus fails in protected directories.
398: .Sh HISTORY
399: A
1.8 aaron 400: .Nm
1.1 millert 401: command appeared in
1.9 ! aaron 402: .At v5 .
1.1 millert 403: .Sh NOTES
404: The current sort command uses lexicographic radix sorting, which requires
405: that sort keys be kept in memory (as opposed to previous versions which used quick
1.3 aaron 406: and merge sorts and did not).
1.1 millert 407: Thus performance depends highly on efficient choice of sort keys, and the
408: .Fl b
409: option and the
410: .Ar field2
411: argument of the
412: .Fl k
413: option should be used whenever possible.
414: Similarly,
1.8 aaron 415: .Nm
1.1 millert 416: .Fl k1f
417: is equivalent to
1.8 aaron 418: .Nm
1.1 millert 419: .Fl f
420: and may take twice as long.
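
The key-field rules described in the page above can be illustrated with a short shell session. This is a minimal sketch, not part of the original manual: it assumes a POSIX-compatible sort, uses only the -t, -k, -n, -r, -u, and -c options documented above, and sets LC_ALL=C (an assumption) so that comparisons are byte-wise. The sample records and the temporary file are hypothetical.

```shell
#!/bin/sh
# Hypothetical sample data: colon-separated name:number:shell records.
data=$(mktemp)
printf '%s\n' 'carol:30:/bin/sh' 'alice:200:/bin/ksh' 'bob:4:/bin/sh' > "$data"

# -k 2,2 restricts the key to the second field; the trailing n
# attaches numeric ordering to that key only.
by_uid=$(LC_ALL=C sort -t : -k 2,2n "$data")

# Two keys: the third field first, then the first field in reverse
# order to break ties between lines whose third fields are equal.
by_shell=$(LC_ALL=C sort -t : -k 3,3 -k 1,1r "$data")

# -u keeps one line from each set of lines with equal keys.
uniq_shell=$(LC_ALL=C sort -t : -u -k 3,3 "$data")

# -c only checks; the exit status reports whether the input is ordered.
if LC_ALL=C sort -c -t : -k 2,2n "$data" 2>/dev/null; then
    check_status=0
else
    check_status=$?
fi

rm -f "$data"
printf '%s\n' "$by_uid"
```

Ending each key at its own field, as in -k 2,2 rather than -k 2, keeps the key short, which matches the advice in NOTES about supplying field2 whenever possible.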