Annotation of src/usr.bin/sort/sort.1, Revision 1.39
1.39 ! jmc 1: .\" $OpenBSD: sort.1,v 1.38 2010/06/28 15:28:52 jmc Exp $
1.1 millert 2: .\"
3: .\" Copyright (c) 1991, 1993
4: .\" The Regents of the University of California. All rights reserved.
5: .\"
6: .\" This code is derived from software contributed to Berkeley by
7: .\" the Institute of Electrical and Electronics Engineers, Inc.
8: .\"
9: .\" Redistribution and use in source and binary forms, with or without
10: .\" modification, are permitted provided that the following conditions
11: .\" are met:
12: .\" 1. Redistributions of source code must retain the above copyright
13: .\" notice, this list of conditions and the following disclaimer.
14: .\" 2. Redistributions in binary form must reproduce the above copyright
15: .\" notice, this list of conditions and the following disclaimer in the
16: .\" documentation and/or other materials provided with the distribution.
1.20 millert 17: .\" 3. Neither the name of the University nor the names of its contributors
1.1 millert 18: .\" may be used to endorse or promote products derived from this software
19: .\" without specific prior written permission.
20: .\"
21: .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
22: .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
23: .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
24: .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
25: .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
26: .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
27: .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
28: .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
29: .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
30: .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
31: .\" SUCH DAMAGE.
32: .\"
33: .\" @(#)sort.1 8.1 (Berkeley) 6/6/93
34: .\"
1.39 ! jmc 35: .Dd $Mdocdate: June 28 2010 $
1.1 millert 36: .Dt SORT 1
37: .Os
38: .Sh NAME
39: .Nm sort
1.37 jmc 40: .Nd sort, merge, or sequence check text files
1.1 millert 41: .Sh SYNOPSIS
42: .Nm sort
1.35 schwarze 43: .Op Fl bCcdfHimnrsuz
1.23 jmc 44: .Sm off
1.24 jmc 45: .Op Fl k\ \& Ar field1 Op , Ar field2
1.23 jmc 46: .Sm on
47: .Op Fl o Ar output
48: .Op Fl R Ar char
49: .Bk -words
1.1 millert 50: .Op Fl T Ar dir
1.23 jmc 51: .Ek
52: .Op Fl t Ar char
1.34 sobrado 53: .Op Ar
1.1 millert 54: .Sh DESCRIPTION
55: The
1.8 aaron 56: .Nm
1.37 jmc 57: utility sorts text files by lines,
58: operating in one of three modes: sort, merge, or check.
59: In sort mode, the specified files are combined and sorted
60: by line.
61: Merge mode is the same as sort mode except that the input
62: files are assumed to be pre-sorted.
63: In check mode, a single input file is checked to ensure that
64: it is correctly sorted.
65: .Pp
1.1 millert 66: Comparisons are based on one or more sort keys extracted
1.8 aaron 67: from each line of input, and are performed lexicographically.
68: By default, if keys are not given,
69: .Nm
1.1 millert 70: regards each input line as a single field.
71: .Pp
1.7 aaron 72: The options are as follows:
1.21 jmc 73: .Bl -tag -width Ds
1.35 schwarze 74: .It Fl C
75: Check that the single input file is sorted.
76: If it is, exit 0; if it's not, exit 1.
77: In either case, produce no output.
1.1 millert 78: .It Fl c
1.35 schwarze 79: Like
80: .Fl C ,
1.37 jmc 81: but additionally write a message to
1.35 schwarze 82: .Em stderr
83: if the input file is not sorted.
1.1 millert 84: .It Fl m
85: Merge only; the input files are assumed to be pre-sorted.
1.37 jmc 86: This option is overridden by the
87: .Fl C
88: or
89: .Fl c
90: options,
91: if they are also present.
1.1 millert 92: .It Fl o Ar output
93: The argument given is the name of an
94: .Ar output
1.12 aaron 95: file to be used instead of the standard output.
96: This file can be the same as one of the input files.
1.1 millert 97: .It Fl T Ar dir
98: Use
99: .Ar dir
1.8 aaron 100: as the directory for temporary files.
101: The default is the contents of the environment variable
1.1 millert 102: .Ev TMPDIR
103: or
104: .Pa /var/tmp
105: if
106: .Ev TMPDIR
107: is not set.
108: .It Fl u
1.12 aaron 109: Unique: suppress all but one in each set of lines having equal keys.
1.1 millert 110: If used with the
1.35 schwarze 111: .Fl C
112: or
1.1 millert 113: .Fl c
1.35 schwarze 114: options, also check that there are no lines with duplicate keys.
1.1 millert 115: .El
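To illustrate the -C, -c, and -u rules above, here is a minimal C sketch of the check itself. It assumes each whole line is its own key and that keys compare byte by byte with strcmp(3). check_sorted is a hypothetical name, not a function from sort's source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch of what -C and -c verify.  Assumes each line is its
 * own key and that keys compare byte by byte (strcmp).  Returns 0 if the
 * lines are in order and 1 if not, matching the exit values documented
 * for -C.  With `unique` set (modelling -u), equal neighbouring keys fail.
 */
static int check_sorted(const char *const lines[], size_t n, int unique)
{
    assert(n == 0 || lines != NULL);
    for (size_t i = 1; i < n; i++) {
        int cmp = strcmp(lines[i - 1], lines[i]);

        if (cmp > 0 || (unique && cmp == 0))
            return 1;   /* out of order, or a duplicate key under -u */
    }
    return 0;
}
```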
116: .Pp
1.38 jmc 117: The following options select the sorting algorithm and apply globally:
118: .Bl -tag -width indent
119: .It Fl H
120: Use a merge sort instead of a radix sort.
121: This option should be used for files larger than 60Mb.
122: .It Fl s
123: Enable stable sort: lines with equal keys keep their input order.
124: This uses additional memory (see
125: .Xr sradixsort 3 ) .
126: .El
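The -s option is documented in terms of sradixsort(3). The stability guarantee itself can be sketched with any sort by breaking ties on input position, as below with qsort(3). This is not how sort implements -s; struct rec, rec_cmp, and stable_sort are hypothetical names, and the keys are assumed to be extracted already.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record: an input line, its extracted key, and its position. */
struct rec {
    const char *key;    /* assumed to be extracted from line already */
    const char *line;
    size_t      pos;    /* 0-based input position, set by stable_sort() */
};

/* Compare keys; fall back to input position so equal keys keep their order. */
static int rec_cmp(const void *a, const void *b)
{
    const struct rec *ra = a, *rb = b;
    int c = strcmp(ra->key, rb->key);

    if (c != 0)
        return c;
    return (ra->pos > rb->pos) - (ra->pos < rb->pos);
}

/* Stable sort built on qsort(3), which by itself is not guaranteed stable. */
static void stable_sort(struct rec *r, size_t n)
{
    assert(n == 0 || r != NULL);
    for (size_t i = 0; i < n; i++)
        r[i].pos = i;
    qsort(r, n, sizeof r[0], rec_cmp);
}
```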
127: .Pp
1.1 millert 128: The following options override the default ordering rules.
1.37 jmc 129: If ordering options appear before the first
130: .Fl k
131: option, they apply globally to all sort keys.
1.1 millert 132: When attached to a specific key (see
133: .Fl k ) ,
134: the ordering options override
135: all global ordering options for that key.
1.37 jmc 136: Note that ordering options intended to apply globally should not
137: appear after
138: .Fl k ,
139: or results may be unexpected.
1.1 millert 140: .Bl -tag -width indent
141: .It Fl d
142: Only blank space and alphanumeric characters
143: .\" according
144: .\" to the current setting of LC_CTYPE
1.12 aaron 145: are used in making comparisons.
1.1 millert 146: .It Fl f
147: Considers all lowercase characters that have uppercase
1.12 aaron 148: equivalents to be the same for purposes of comparison.
1.1 millert 149: .It Fl i
150: Ignore all non-printable characters.
151: .It Fl n
1.12 aaron 152: An initial numeric string, consisting of optional blank space, optional
153: minus sign, and zero or more digits with an optional decimal point
1.1 millert 154: .\" with
155: .\" optional radix character and thousands
156: .\" separator
157: .\" (as defined in the current locale),
158: is sorted by arithmetic value.
159: (The
160: .Fl n
1.12 aaron 161: option no longer implies the
1.1 millert 162: .Fl b
163: option.)
164: .It Fl r
165: Reverse the sense of comparisons.
166: .El
167: .Pp
1.12 aaron 168: The treatment of field separators can be altered using these options:
1.1 millert 169: .Bl -tag -width indent
170: .It Fl b
171: Ignore leading blank space when determining the start
172: and end of a restricted sort key.
173: A
174: .Fl b
175: option specified before the first
176: .Fl k
177: option applies globally to all
178: .Fl k
179: options.
180: Otherwise, the
181: .Fl b
1.12 aaron 182: option can be attached independently to each
1.1 millert 183: .Ar field
184: argument of the
185: .Fl k
186: option (see below).
1.37 jmc 187: Note that
1.1 millert 188: .Fl b
1.37 jmc 189: should not appear after
190: .Fl k ,
191: and that it has no effect unless key fields are specified.
1.23 jmc 192: .It Fl R Ar char
193: .Ar char
194: is used as the record separator character.
195: This should be used with discretion;
196: .Fl R Aq Ar alphanumeric
197: usually produces undesirable results.
198: The default record separator is newline.
1.1 millert 199: .It Fl t Ar char
1.3 aaron 200: .Ar char
1.8 aaron 201: is used as the field separator character.
202: The initial
1.1 millert 203: .Ar char
1.12 aaron 204: is not considered to be part of a field when determining key offsets.
1.1 millert 205: Each occurrence of
206: .Ar char
207: is significant (for example,
208: .Dq Ar charchar
209: delimits an empty field).
210: If
211: .Fl t
1.6 pjanzen 212: is not specified, the default field separator is a sequence of
213: blank-space characters, and consecutive blank spaces do
214: .Em not
215: delimit an empty field; further, the initial blank space
216: .Em is
217: considered part of a field when determining key offsets.
1.22 dlg 218: .It Fl z
219: Use the NUL character as the record separator.
1.37 jmc 220: .El
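The record separator decides what counts as a "line". The sketch below counts records for a given separator byte using the POSIX getdelim(3) function: newline for the default, the NUL byte as with -z, or any other byte as with -R. count_records is a hypothetical helper, and nothing here claims that sort reads its input this way.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: count the records in `fp` when records end at
 * `sep` (newline by default, NUL as with -z, any byte as with -R).
 * A final record that lacks a trailing separator still counts.
 */
static size_t count_records(FILE *fp, int sep)
{
    char *buf = NULL;
    size_t cap = 0, count = 0;

    assert(fp != NULL);
    while (getdelim(&buf, &cap, sep, fp) != -1)
        count++;
    free(buf);
    return count;
}
```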
221: .Pp
222: Sort keys are specified with:
223: .Bl -tag -width indent
224: .It Xo
225: .Sm off
226: .Fl k\ \& Ar field1 Op , Ar field2
227: .Sm on
228: .Xc
229: Designates the starting position,
230: .Ar field1 ,
231: and optional ending position,
232: .Ar field2 ,
233: of a key field.
234: The
235: .Fl k
236: option may be specified multiple times,
237: in which case subsequent keys are compared after earlier keys compare equal.
238: The
239: .Fl k
240: option replaces the obsolescent options
241: .Cm \(pl Ns Ar pos1
242: and
243: .Fl Ns Ar pos2 .
1.1 millert 244: .El
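When several -k options are given, a later key is consulted only if all earlier keys compare equal. The sketch below assumes the keys have already been extracted into struct keyed_line (a hypothetical type with an arbitrary MAXKEYS limit) and models an r letter attached to an individual key, roughly as for options such as -k1,1 -k2,2r.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAXKEYS 4   /* arbitrary limit for this hypothetical sketch */

/* A line whose keys have already been extracted, one per -k option. */
struct keyed_line {
    const char *key[MAXKEYS];
};

/*
 * Compare two lines key by key.  A later key is consulted only when every
 * earlier key compared equal.  reverse[k] models an r attached to key k.
 */
static int compare_keys(const struct keyed_line *a, const struct keyed_line *b,
    const int reverse[], size_t nkeys)
{
    assert(nkeys <= MAXKEYS);
    for (size_t k = 0; k < nkeys; k++) {
        int c = strcmp(a->key[k], b->key[k]);

        c = (c > 0) - (c < 0);          /* normalise to -1, 0 or 1 */
        if (c != 0)
            return reverse[k] ? -c : c; /* first differing key decides */
    }
    return 0;                           /* all keys compare equal */
}
```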
245: .Pp
246: The following operands are available:
247: .Bl -tag -width indent
1.3 aaron 248: .It Ar file
249: The pathname of a file to be sorted, merged, or checked.
250: If no
1.1 millert 251: .Ar file
1.12 aaron 252: operands are specified, or if a
1.3 aaron 253: .Ar file
254: operand is
1.1 millert 255: .Fl ,
256: the standard input is used.
1.3 aaron 257: .El
1.1 millert 258: .Pp
1.12 aaron 259: A field is defined as a maximal sequence of characters other than the
1.6 pjanzen 260: field separator and record separator
261: .Pq newline by default .
262: Initial blank spaces are included in the field unless
263: .Fl b
264: has been specified;
265: the first blank space of a sequence of blank spaces acts as the field
266: separator and is included in the field (unless
267: .Fl t
268: is specified).
269: For example, by default all blank spaces at the beginning of a line are
270: considered to be part of the first field.
1.1 millert 271: .Pp
1.12 aaron 272: Fields are specified by the
1.23 jmc 273: .Sm off
274: .Fl k\ \& Ar field1 Op , Ar field2
275: .Sm on
1.8 aaron 276: argument.
277: A missing
1.1 millert 278: .Ar field2
279: argument defaults to the end of a line.
280: .Pp
281: The arguments
282: .Ar field1
283: and
284: .Ar field2
285: have the form
286: .Em m.n
1.6 pjanzen 287: .Em (m,n > 0)
288: and can be followed by one or more of the letters
289: .Cm b , d , f , i ,
1.10 aaron 290: .Cm n ,
1.6 pjanzen 291: and
292: .Cm r ,
293: which correspond to the options discussed above.
1.1 millert 294: A
295: .Ar field1
296: position specified by
297: .Em m.n
298: is interpreted as the
299: .Em n Ns th
1.6 pjanzen 300: character from the beginning of the
1.1 millert 301: .Em m Ns th
302: field.
303: A missing
304: .Em \&.n
305: in
306: .Ar field1
307: means
308: .Ql \&.1 ,
309: indicating the first character of the
310: .Em m Ns th
1.12 aaron 311: field; if the
1.1 millert 312: .Fl b
313: option is in effect,
314: .Em n
1.12 aaron 315: is counted from the first non-blank character in the
1.1 millert 316: .Em m Ns th
317: field;
318: .Em m Ns \&.1b
1.12 aaron 319: refers to the first non-blank character in the
1.1 millert 320: .Em m Ns th
321: field.
1.6 pjanzen 322: .No 1\&. Ns Em n
323: refers to the
324: .Em n Ns th
325: character from the beginning of the line;
326: if
327: .Em n
328: is greater than the length of the line, the field is taken to be empty.
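The field1 rules above, together with the field definitions earlier, can be modelled by a small function that returns where a key starts. This is a sketch under assumptions: blank space is taken to mean space and tab, key_start and is_blank are hypothetical names, and because the text only states what happens when n runs past the end of the line, the function simply clips at the end of the line.

```c
#include <assert.h>
#include <string.h>

/* Blank space as assumed for the default field rules: space or tab. */
static int is_blank(char c)
{
    return c == ' ' || c == '\t';
}

/*
 * Hypothetical helper: byte offset in `line` at which a field1 position
 * m.n begins.  sep >= 0 models -t sep; sep < 0 models the default
 * blank-separated fields.  bflag models -b.  A result equal to
 * strlen(line) means the key is empty.
 */
static size_t key_start(const char *line, int m, int n, int sep, int bflag)
{
    size_t i = 0, len = strlen(line);

    assert(m > 0 && n > 0);
    /* Step over the m-1 fields that precede field m. */
    for (int f = 1; f < m && i < len; f++) {
        if (sep >= 0) {
            while (i < len && (unsigned char)line[i] != sep)
                i++;
            if (i < len)
                i++;    /* with -t the separator belongs to no field */
        } else {
            while (i < len && is_blank(line[i]))
                i++;    /* leading blanks are part of the field */
            while (i < len && !is_blank(line[i]))
                i++;    /* the next field begins with the blank run */
        }
    }
    if (bflag)
        while (i < len && is_blank(line[i]))
            i++;        /* -b: count n from the first non-blank */
    i += (size_t)(n - 1);
    return i < len ? i : len;   /* past the end of the line: empty key */
}
```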
1.1 millert 329: .Pp
330: A
331: .Ar field2
332: position specified by
333: .Em m.n
1.12 aaron 334: is interpreted as the
1.1 millert 335: .Em n Ns th
336: character (including separators) of the
337: .Em m Ns th
338: field.
339: A missing
340: .Em \&.n
1.5 aaron 341: indicates the last character of the
1.1 millert 342: .Em m Ns th
343: field;
1.5 aaron 344: .Em m
1.1 millert 345: = \&0
346: designates the end of a line.
347: Thus the option
348: .Fl k Ar v.x,w.y
349: is synonymous with the obsolescent option
350: .Cm \(pl Ns Ar v-\&1.x-\&1
351: .Fl Ns Ar w-\&1.y ;
352: when
353: .Em y
354: is omitted,
355: .Fl k Ar v.x,w
356: is synonymous with
1.5 aaron 357: .Cm \(pl Ns Ar v-\&1.x-\&1
1.19 tdeval 358: .Fl Ns Ar w\&.0 .
1.1 millert 359: The obsolescent
360: .Cm \(pl Ns Ar pos1
361: .Fl Ns Ar pos2
362: option is still supported, except for
1.3 aaron 363: .Fl Ns Ar w\&.0b ,
1.1 millert 364: which has no
365: .Fl k
366: equivalent.
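The correspondence between -k and the obsolescent form is plain arithmetic on the field and character numbers. The hypothetical helper below applies it; passing y as 0 stands for an omitted .y, and a missing .x is passed as 1 (the .1 default described above).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: rewrite "-k v.x,w.y" in the obsolescent
 * "+pos1 -pos2" form using the correspondence given above.
 * y == 0 stands for "y omitted", which maps to -w.0.
 */
static void k_to_obsolete(char *buf, size_t size, int v, int x, int w, int y)
{
    int len;

    assert(v > 0 && x > 0 && w > 0 && y >= 0);
    if (y > 0)
        len = snprintf(buf, size, "+%d.%d -%d.%d", v - 1, x - 1, w - 1, y);
    else
        len = snprintf(buf, size, "+%d.%d -%d.0", v - 1, x - 1, w);
    assert(len >= 0 && (size_t)len < size);   /* output was not truncated */
}
```

For example, -k 2,3.4 (v = 2, x = 1, w = 3, y = 4) becomes +1.0 -2.4.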
367: .Sh ENVIRONMENT
368: .Bl -tag -width Fl
369: .It Ev TMPDIR
1.3 aaron 370: Path in which to store temporary files.
371: Note that
1.1 millert 372: .Ev TMPDIR
373: may be overridden by the
374: .Fl T
375: option.
1.11 aaron 376: .El
1.1 millert 377: .Sh FILES
378: .Bl -tag -width Pa -compact
379: .It Pa /var/tmp/sort.*
1.3 aaron 380: default temporary files
1.36 jmc 381: .It Pa output Ns #PID
1.3 aaron 382: temporary name for
1.1 millert 383: .Ar output
384: if
385: .Ar output
1.3 aaron 386: already exists
1.39 ! jmc 387: .El
! 388: .Sh EXIT STATUS
! 389: The
! 390: .Nm
! 391: utility exits with one of the following values:
! 392: .Pp
! 393: .Bl -tag -width Ds -offset indent -compact
! 394: .It 0
! 395: Normal behavior.
! 396: .It 1
! 397: The input file is not sorted and
! 398: .Fl C
! 399: or
! 400: .Fl c
! 401: was given, or there are duplicate keys and
! 402: .Fl Cu
! 403: or
! 404: .Fl cu
! 405: was given.
! 406: .It 2
! 407: An error occurred.
1.1 millert 408: .El
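A program that runs sort in check mode can act on the exit values listed above. The sketch below pipes text into sort -C with popen(3) and returns the exit status. It assumes a sort(1) that supports -C is found through PATH by the shell, and sort_check is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/wait.h>

/*
 * Hypothetical sketch: feed `text` to "sort -C" and return its exit value
 * (0 sorted, 1 not sorted, 2 error, as listed above).  Returns -1 if the
 * command could not be started or did not exit normally.
 */
static int sort_check(const char *text)
{
    FILE *p;
    int status;

    assert(text != NULL);
    p = popen("sort -C", "w");
    if (p == NULL)
        return -1;
    fputs(text, p);         /* becomes sort's standard input */
    status = pclose(p);     /* waits for sort and returns its wait status */
    if (status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```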
409: .Sh SEE ALSO
410: .Xr comm 1 ,
1.3 aaron 411: .Xr join 1 ,
1.18 fgsch 412: .Xr uniq 1 ,
413: .Xr radixsort 3
1.27 dlg 414: .Sh STANDARDS
415: The
416: .Nm
1.28 jmc 417: utility is compliant with the
1.33 jmc 418: .St -p1003.1-2008
1.27 dlg 419: specification.
420: .Pp
421: The flags
1.32 jmc 422: .Op Fl HRsTz
1.28 jmc 423: are extensions to that specification.
1.1 millert 424: .Sh HISTORY
425: A
1.8 aaron 426: .Nm
1.1 millert 427: command appeared in
1.16 mickey 428: .At v3 .
1.1 millert 429: .Sh NOTES
1.14 ericj 430: .Nm
431: has no limits on input line length (other than those imposed by available
432: memory) or any restrictions on bytes allowed within lines.
433: .Pp
434: To protect data,
435: .Nm
436: .Fl o
437: calls
438: .Xr link 2
439: and
440: .Xr unlink 2 ,
441: and thus fails on protected directories.
442: .Pp
1.1 millert 443: The current sort command uses lexicographic radix sorting, which requires
1.12 aaron 444: that sort keys be kept in memory (unlike previous versions, which
445: used quick and merge sorts and did not).
1.1 millert 446: Thus performance depends heavily on an efficient choice of sort keys, and the
447: .Fl b
448: option and the
449: .Ar field2
450: argument of the
451: .Fl k
452: option should be used whenever possible.
453: Similarly,
1.8 aaron 454: .Nm
1.1 millert 455: .Fl k1f
456: is equivalent to
1.8 aaron 457: .Nm
1.1 millert 458: .Fl f
459: and may take twice as long.
1.12 aaron 460: .Sh BUGS
461: To sort files larger than 60Mb, use
462: .Nm
463: .Fl H ;
464: files larger than 704Mb must be sorted in smaller pieces, then merged.