Annotation of src/usr.bin/sort/sort.1, Revision 1.34
1.34 ! sobrado 1: .\" $OpenBSD: sort.1,v 1.33 2009/02/08 17:15:10 jmc Exp $
1.1 millert 2: .\"
3: .\" Copyright (c) 1991, 1993
4: .\" The Regents of the University of California. All rights reserved.
5: .\"
6: .\" This code is derived from software contributed to Berkeley by
7: .\" the Institute of Electrical and Electronics Engineers, Inc.
8: .\"
9: .\" Redistribution and use in source and binary forms, with or without
10: .\" modification, are permitted provided that the following conditions
11: .\" are met:
12: .\" 1. Redistributions of source code must retain the above copyright
13: .\" notice, this list of conditions and the following disclaimer.
14: .\" 2. Redistributions in binary form must reproduce the above copyright
15: .\" notice, this list of conditions and the following disclaimer in the
16: .\" documentation and/or other materials provided with the distribution.
1.20 millert 17: .\" 3. Neither the name of the University nor the names of its contributors
1.1 millert 18: .\" may be used to endorse or promote products derived from this software
19: .\" without specific prior written permission.
20: .\"
21: .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
22: .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
23: .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
24: .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
25: .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
26: .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
27: .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
28: .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
29: .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
30: .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
31: .\" SUCH DAMAGE.
32: .\"
33: .\" @(#)sort.1 8.1 (Berkeley) 6/6/93
34: .\"
1.34 ! sobrado 35: .Dd $Mdocdate: February 8 2009 $
1.1 millert 36: .Dt SORT 1
37: .Os
38: .Sh NAME
39: .Nm sort
40: .Nd sort or merge text files
41: .Sh SYNOPSIS
42: .Nm sort
1.30 millert 43: .Op Fl bcdfHimnrsuz
1.23 jmc 44: .Sm off
1.24 jmc 45: .Op Fl k\ \& Ar field1 Op , Ar field2
1.23 jmc 46: .Sm on
47: .Op Fl o Ar output
48: .Op Fl R Ar char
49: .Bk -words
1.1 millert 50: .Op Fl T Ar dir
1.23 jmc 51: .Ek
52: .Op Fl t Ar char
1.34 ! sobrado 53: .Op Ar
1.1 millert 54: .Sh DESCRIPTION
55: The
1.8 aaron 56: .Nm
1.12 aaron 57: utility sorts text files by lines.
1.1 millert 58: Comparisons are based on one or more sort keys extracted
1.8 aaron 59: from each line of input, and are performed lexicographically.
60: By default, if keys are not given,
61: .Nm
1.1 millert 62: regards each input line as a single field.
63: .Pp
1.7 aaron 64: The options are as follows:
1.21 jmc 65: .Bl -tag -width Ds
1.1 millert 66: .It Fl c
67: Check that the single input file is sorted.
68: If the file is not sorted,
1.8 aaron 69: .Nm
1.12 aaron 70: produces the appropriate error messages and exits with code 1; otherwise,
1.8 aaron 71: .Nm
1.1 millert 72: returns 0.
1.8 aaron 73: .Nm
1.1 millert 74: .Fl c
1.6 pjanzen 75: produces no output, except the error messages on
76: .Em stderr .
1.1 millert 77: .It Fl m
78: Merge only; the input files are assumed to be pre-sorted.
79: .It Fl o Ar output
80: The argument given is the name of an
81: .Ar output
1.12 aaron 82: file to be used instead of the standard output.
83: This file can be the same as one of the input files.
1.1 millert 84: .It Fl T Ar dir
85: Use
86: .Ar dir
1.8 aaron 87: as the directory for temporary files.
88: The default is the contents of the environment variable
1.1 millert 89: .Ev TMPDIR
90: or
91: .Pa /var/tmp
92: if
93: .Ev TMPDIR
94: is not set.
95: .It Fl u
1.12 aaron 96: Unique: suppress all but one in each set of lines having equal keys.
1.1 millert 97: If used with the
98: .Fl c
1.26 jmc 99: option, also check that there are no lines with duplicate keys.
1.1 millert 100: .El
101: .Pp
102: The following options override the default ordering rules.
103: When ordering options appear independent of key field
104: specifications, the requested field ordering rules are
105: applied globally to all sort keys.
106: When attached to a specific key (see
107: .Fl k ) ,
108: the ordering options override
109: all global ordering options for that key.
110: .Bl -tag -width indent
111: .It Fl d
112: Only blank space and alphanumeric characters
113: .\" according
114: .\" to the current setting of LC_CTYPE
1.12 aaron 115: are used in making comparisons.
1.1 millert 116: .It Fl f
117: Consider all lowercase characters that have uppercase
1.12 aaron 118: equivalents to be the same for purposes of comparison.
1.23 jmc 119: .It Fl H
120: Use a merge sort instead of a radix sort.
121: This option should be used for files larger than 60MB.
1.1 millert 122: .It Fl i
123: Ignore all non-printable characters.
124: .It Fl n
1.12 aaron 125: An initial numeric string, consisting of optional blank space, optional
126: minus sign, and zero or more digits (including a decimal point)
1.1 millert 127: .\" with
128: .\" optional radix character and thousands
129: .\" separator
130: .\" (as defined in the current locale),
131: is sorted by arithmetic value.
132: (The
133: .Fl n
1.12 aaron 134: option no longer implies the
1.1 millert 135: .Fl b
136: option.)
137: .It Fl r
138: Reverse the sense of comparisons.
1.30 millert 139: .It Fl s
1.31 millert 140: Enable stable sort.
141: Uses additional resources (see
142: .Xr sradixsort 3 ) .
1.1 millert 143: .El
144: .Pp
1.12 aaron 145: The treatment of field separators can be altered using these options:
1.1 millert 146: .Bl -tag -width indent
147: .It Fl b
148: Ignore leading blank space when determining the start
149: and end of a restricted sort key.
150: A
151: .Fl b
152: option specified before the first
153: .Fl k
154: option applies globally to all
155: .Fl k
156: options.
157: Otherwise, the
158: .Fl b
1.12 aaron 159: option can be attached independently to each
1.1 millert 160: .Ar field
161: argument of the
162: .Fl k
163: option (see below).
164: Note that the
165: .Fl b
1.12 aaron 166: option has no effect unless key fields are specified.
1.23 jmc 167: .It Xo
168: .Sm off
169: .Fl k\ \& Ar field1 Op , Ar field2
170: .Sm on
171: .Xc
172: Designates the starting position,
173: .Ar field1 ,
174: and optional ending position,
175: .Ar field2 ,
176: of a key field.
1.25 jmc 177: The
178: .Fl k
179: option may be specified multiple times,
180: in which case subsequent keys are compared after earlier keys compare equal.
1.23 jmc 181: The
182: .Fl k
183: option replaces the obsolescent options
184: .Cm \(pl Ns Ar pos1
185: and
186: .Fl Ns Ar pos2 .
187: .It Fl R Ar char
188: .Ar char
189: is used as the record separator character.
190: This should be used with discretion;
191: .Fl R Aq Ar alphanumeric
192: usually produces undesirable results.
193: The default record separator is newline.
1.1 millert 194: .It Fl t Ar char
1.3 aaron 195: .Ar char
1.8 aaron 196: is used as the field separator character.
197: The initial
1.1 millert 198: .Ar char
1.12 aaron 199: is not considered to be part of a field when determining key offsets.
1.1 millert 200: Each occurrence of
201: .Ar char
202: is significant (for example,
203: .Dq Ar charchar
204: delimits an empty field).
205: If
206: .Fl t
1.6 pjanzen 207: is not specified, the default field separator is a sequence of
208: blank-space characters, and consecutive blank spaces do
209: .Em not
210: delimit an empty field; further, the initial blank space
211: .Em is
212: considered part of a field when determining key offsets.
1.22 dlg 213: .It Fl z
214: Use the NUL character as the record separator.
1.1 millert 215: .El
216: .Pp
217: The following operands are available:
218: .Bl -tag -width indent
1.3 aaron 219: .It Ar file
220: The pathname of a file to be sorted, merged, or checked.
221: If no
1.1 millert 222: .Ar file
1.12 aaron 223: operands are specified, or if a
1.3 aaron 224: .Ar file
225: operand is
1.1 millert 226: .Fl ,
227: the standard input is used.
1.3 aaron 228: .El
1.1 millert 229: .Pp
1.12 aaron 230: A field is defined as a maximal sequence of characters other than the
1.6 pjanzen 231: field separator and record separator
232: .Pq newline by default .
233: Initial blank spaces are included in the field unless
234: .Fl b
235: has been specified;
236: the first blank space of a sequence of blank spaces acts as the field
237: separator and is included in the field (unless
238: .Fl t
239: is specified).
240: For example, by default all blank spaces at the beginning of a line are
241: considered to be part of the first field.
1.1 millert 242: .Pp
1.12 aaron 243: Fields are specified by the
1.23 jmc 244: .Sm off
245: .Fl k\ \& Ar field1 Op , Ar field2
246: .Sm on
1.8 aaron 247: argument.
248: A missing
1.1 millert 249: .Ar field2
250: argument defaults to the end of a line.
251: .Pp
252: The arguments
253: .Ar field1
254: and
255: .Ar field2
256: have the form
257: .Em m.n
1.6 pjanzen 258: .Em (m,n > 0)
259: and can be followed by one or more of the letters
260: .Cm b , d , f , i ,
1.10 aaron 261: .Cm n ,
1.6 pjanzen 262: and
263: .Cm r ,
264: which correspond to the options discussed above.
1.1 millert 265: A
266: .Ar field1
267: position specified by
268: .Em m.n
269: is interpreted as the
270: .Em n Ns th
1.6 pjanzen 271: character from the beginning of the
1.1 millert 272: .Em m Ns th
273: field.
274: A missing
275: .Em \&.n
276: in
277: .Ar field1
278: means
279: .Ql \&.1 ,
280: indicating the first character of the
281: .Em m Ns th
1.12 aaron 282: field; if the
1.1 millert 283: .Fl b
284: option is in effect,
285: .Em n
1.12 aaron 286: is counted from the first non-blank character in the
1.1 millert 287: .Em m Ns th
288: field;
289: .Em m Ns \&.1b
1.12 aaron 290: refers to the first non-blank character in the
1.1 millert 291: .Em m Ns th
292: field.
1.6 pjanzen 293: .No 1\&. Ns Em n
294: refers to the
295: .Em n Ns th
296: character from the beginning of the line;
297: if
298: .Em n
299: is greater than the length of the line, the field is taken to be empty.
1.1 millert 300: .Pp
301: A
302: .Ar field2
303: position specified by
304: .Em m.n
1.12 aaron 305: is interpreted as the
1.1 millert 306: .Em n Ns th
307: character (including separators) of the
308: .Em m Ns th
309: field.
310: A missing
311: .Em \&.n
1.5 aaron 312: indicates the last character of the
1.1 millert 313: .Em m Ns th
314: field;
1.5 aaron 315: .Em m
1.1 millert 316: = \&0
317: designates the end of a line.
318: Thus the option
319: .Fl k Ar v.x,w.y
320: is synonymous with the obsolescent option
321: .Cm \(pl Ns Ar v-\&1.x-\&1
322: .Fl Ns Ar w-\&1.y ;
323: when
324: .Em y
325: is omitted,
326: .Fl k Ar v.x,w
327: is synonymous with
1.5 aaron 328: .Cm \(pl Ns Ar v-\&1.x-\&1
1.19 tdeval 329: .Fl Ns Ar w\&.0 .
1.1 millert 330: The obsolescent
331: .Cm \(pl Ns Ar pos1
332: .Fl Ns Ar pos2
333: option is still supported, except for
1.3 aaron 334: .Fl Ns Ar w\&.0b ,
1.1 millert 335: which has no
336: .Fl k
337: equivalent.
1.8 aaron 338: .Pp
339: The
340: .Nm
341: utility shall exit with one of the following values:
342: .Pp
343: .Bl -tag -width flag -compact
344: .It 0
345: Normal behavior.
346: .It 1
347: On disorder (or non-uniqueness) with the
348: .Fl c
349: option.
350: .It 2
351: An error occurred.
352: .El
1.1 millert 353: .Sh ENVIRONMENT
354: .Bl -tag -width Fl
355: .It Ev TMPDIR
1.3 aaron 356: Path in which to store temporary files.
357: Note that
1.1 millert 358: .Ev TMPDIR
359: may be overridden by the
360: .Fl T
361: option.
1.11 aaron 362: .El
1.1 millert 363: .Sh FILES
364: .Bl -tag -width Pa -compact
365: .It Pa /var/tmp/sort.*
1.3 aaron 366: default temporary directories
1.1 millert 367: .It Pa Ar output Ns #PID
1.3 aaron 368: temporary name for
1.1 millert 369: .Ar output
370: if
371: .Ar output
1.3 aaron 372: already exists
1.1 millert 373: .El
374: .Sh SEE ALSO
375: .Xr comm 1 ,
1.3 aaron 376: .Xr join 1 ,
1.18 fgsch 377: .Xr uniq 1 ,
378: .Xr radixsort 3
1.27 dlg 379: .Sh STANDARDS
380: The
381: .Nm
1.28 jmc 382: utility is compliant with the
1.33 jmc 383: .St -p1003.1-2008
1.27 dlg 384: specification.
385: .Pp
386: The flags
1.32 jmc 387: .Op Fl HRsTz
1.28 jmc 388: are extensions to that specification.
1.1 millert 389: .Sh HISTORY
390: A
1.8 aaron 391: .Nm
1.1 millert 392: command appeared in
1.16 mickey 393: .At v3 .
1.1 millert 394: .Sh NOTES
1.14 ericj 395: .Nm
396: has no limits on input line length (other than those imposed by available
397: memory) or any restrictions on bytes allowed within lines.
398: .Pp
399: To protect data,
400: .Nm
401: .Fl o
402: calls
403: .Xr link 2
404: and
405: .Xr unlink 2 ,
406: and thus fails on protected directories.
407: .Pp
1.1 millert 408: The current sort command uses lexicographic radix sorting, which requires
1.12 aaron 409: that sort keys be kept in memory (as opposed to previous versions which
410: used quick and merge sorts and did not).
1.1 millert 411: Thus performance depends highly on efficient choice of sort keys, and the
412: .Fl b
413: option and the
414: .Ar field2
415: argument of the
416: .Fl k
417: option should be used whenever possible.
418: Similarly,
1.8 aaron 419: .Nm
1.1 millert 420: .Fl k1f
421: is equivalent to
1.8 aaron 422: .Nm
1.1 millert 423: .Fl f
424: and may take twice as long.
1.12 aaron 425: .Sh BUGS
426: To sort files larger than 60MB, use
427: .Nm
428: .Fl H ;
429: files larger than 704MB must be sorted in smaller pieces, then merged.