Annotation of src/usr.bin/sort/sort.1, Revision 1.36
1.36 ! jmc 1: .\" $OpenBSD: sort.1,v 1.35 2009/12/22 19:47:02 schwarze Exp $
1.1 millert 2: .\"
3: .\" Copyright (c) 1991, 1993
4: .\" The Regents of the University of California. All rights reserved.
5: .\"
6: .\" This code is derived from software contributed to Berkeley by
7: .\" the Institute of Electrical and Electronics Engineers, Inc.
8: .\"
9: .\" Redistribution and use in source and binary forms, with or without
10: .\" modification, are permitted provided that the following conditions
11: .\" are met:
12: .\" 1. Redistributions of source code must retain the above copyright
13: .\" notice, this list of conditions and the following disclaimer.
14: .\" 2. Redistributions in binary form must reproduce the above copyright
15: .\" notice, this list of conditions and the following disclaimer in the
16: .\" documentation and/or other materials provided with the distribution.
1.20 millert 17: .\" 3. Neither the name of the University nor the names of its contributors
1.1 millert 18: .\" may be used to endorse or promote products derived from this software
19: .\" without specific prior written permission.
20: .\"
21: .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
22: .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
23: .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
24: .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
25: .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
26: .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
27: .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
28: .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
29: .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
30: .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
31: .\" SUCH DAMAGE.
32: .\"
33: .\" @(#)sort.1 8.1 (Berkeley) 6/6/93
34: .\"
1.36 ! jmc 35: .Dd $Mdocdate: December 22 2009 $
1.1 millert 36: .Dt SORT 1
37: .Os
38: .Sh NAME
39: .Nm sort
40: .Nd sort or merge text files
41: .Sh SYNOPSIS
42: .Nm sort
1.35 schwarze 43: .Op Fl bCcdfHimnrsuz
1.23 jmc 44: .Sm off
1.24 jmc 45: .Op Fl k\ \& Ar field1 Op , Ar field2
1.23 jmc 46: .Sm on
47: .Op Fl o Ar output
48: .Op Fl R Ar char
49: .Bk -words
1.1 millert 50: .Op Fl T Ar dir
1.23 jmc 51: .Ek
52: .Op Fl t Ar char
1.34 sobrado 53: .Op Ar
1.1 millert 54: .Sh DESCRIPTION
55: The
1.8 aaron 56: .Nm
1.12 aaron 57: utility sorts text files by lines.
1.1 millert 58: Comparisons are based on one or more sort keys extracted
1.8 aaron 59: from each line of input, and are performed lexicographically.
60: By default, if keys are not given,
61: .Nm
1.1 millert 62: regards each input line as a single field.
63: .Pp
1.7 aaron 64: The options are as follows:
1.21 jmc 65: .Bl -tag -width Ds
1.35 schwarze 66: .It Fl C
67: Check that the single input file is sorted.
68: If it is, exit 0; if it's not, exit 1.
69: In either case, produce no output.
1.1 millert 70: .It Fl c
1.35 schwarze 71: Like
72: .Fl C ,
73: but write a message to
74: .Em stderr
75: if the input file is not sorted.
1.1 millert 76: .It Fl m
77: Merge only; the input files are assumed to be pre-sorted.
78: .It Fl o Ar output
79: The argument given is the name of an
80: .Ar output
1.12 aaron 81: file to be used instead of the standard output.
82: This file can be the same as one of the input files.
1.1 millert 83: .It Fl T Ar dir
84: Use
85: .Ar dir
1.8 aaron 86: as the directory for temporary files.
87: The default is the contents of the environment variable
1.1 millert 88: .Ev TMPDIR
89: or
90: .Pa /var/tmp
91: if
92: .Ev TMPDIR
93: is not set.
94: .It Fl u
1.12 aaron 95: Unique: suppress all but one in each set of lines having equal keys.
1.1 millert 96: If used with the
1.35 schwarze 97: .Fl C
98: or
1.1 millert 99: .Fl c
1.35 schwarze 100: options, also check that there are no lines with duplicate keys.
1.1 millert 101: .El
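.Pp
As a rough illustration of the
.Fl u
and
.Fl m
options described above, the following sketch uses made-up sample words
and a scratch directory created with mktemp; none of these names come
from the manual itself.

```shell
# -u: keep one line from each set of lines that compare equal.
unique=$(printf '%s\n' pear apple pear fig apple | sort -u)
printf '%s\n' "$unique"

# -m: merge two inputs that are already sorted (hypothetical scratch files).
tmp=$(mktemp -d)
printf '%s\n' a c e > "$tmp/one"
printf '%s\n' b d > "$tmp/two"
merged=$(sort -m "$tmp/one" "$tmp/two")
rm -rf "$tmp"
printf '%s\n' "$merged"
```

The merge step only interleaves its inputs, so it presumably gives
meaningful results only when each input file is already in order.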
102: .Pp
103: The following options override the default ordering rules.
104: When ordering options appear independent of key field
105: specifications, the requested field ordering rules are
106: applied globally to all sort keys.
107: When attached to a specific key (see
108: .Fl k ) ,
109: the ordering options override
110: all global ordering options for that key.
111: .Bl -tag -width indent
112: .It Fl d
113: Only blank space and alphanumeric characters
114: .\" according
115: .\" to the current setting of LC_CTYPE
1.12 aaron 116: are used in making comparisons.
1.1 millert 117: .It Fl f
118: Considers all lowercase characters that have uppercase
1.12 aaron 119: equivalents to be the same for purposes of comparison.
1.23 jmc 120: .It Fl H
121: Use a merge sort instead of a radix sort.
122: This option should be used for files larger than 60MB.
1.1 millert 123: .It Fl i
124: Ignore all non-printable characters.
125: .It Fl n
1.12 aaron 126: An initial numeric string, consisting of optional blank space, optional
127: minus sign, and zero or more digits (including decimal point)
1.1 millert 128: .\" with
129: .\" optional radix character and thousands
130: .\" separator
131: .\" (as defined in the current locale),
132: is sorted by arithmetic value.
133: (The
134: .Fl n
1.12 aaron 135: option no longer implies the
1.1 millert 136: .Fl b
137: option.)
138: .It Fl r
139: Reverse the sense of comparisons.
1.30 millert 140: .It Fl s
1.31 millert 141: Enable stable sort.
142: Uses additional resources (see
143: .Xr sradixsort 3 ) .
1.1 millert 144: .El
145: .Pp
1.12 aaron 146: The treatment of field separators can be altered using these options:
1.1 millert 147: .Bl -tag -width indent
148: .It Fl b
149: Ignores leading blank space when determining the start
150: and end of a restricted sort key.
151: A
152: .Fl b
153: option specified before the first
154: .Fl k
155: option applies globally to all
156: .Fl k
157: options.
158: Otherwise, the
159: .Fl b
1.12 aaron 160: option can be attached independently to each
1.1 millert 161: .Ar field
162: argument of the
163: .Fl k
164: option (see below).
165: Note that the
166: .Fl b
1.12 aaron 167: option has no effect unless key fields are specified.
1.23 jmc 168: .It Xo
169: .Sm off
170: .Fl k\ \& Ar field1 Op , Ar field2
171: .Sm on
172: .Xc
173: Designates the starting position,
174: .Ar field1 ,
175: and optional ending position,
176: .Ar field2 ,
177: of a key field.
1.25 jmc 178: The
179: .Fl k
180: option may be specified multiple times,
181: in which case subsequent keys are compared after earlier keys compare equal.
1.23 jmc 182: The
183: .Fl k
184: option replaces the obsolescent options
185: .Cm \(pl Ns Ar pos1
186: and
187: .Fl Ns Ar pos2 .
188: .It Fl R Ar char
189: .Ar char
190: is used as the record separator character.
191: This should be used with discretion;
192: .Fl R Aq Ar alphanumeric
193: usually produces undesirable results.
194: The default record separator is newline.
1.1 millert 195: .It Fl t Ar char
1.3 aaron 196: .Ar char
1.8 aaron 197: is used as the field separator character.
198: The initial
1.1 millert 199: .Ar char
1.12 aaron 200: is not considered to be part of a field when determining key offsets.
1.1 millert 201: Each occurrence of
202: .Ar char
203: is significant (for example,
204: .Dq Ar charchar
205: delimits an empty field).
206: If
207: .Fl t
1.6 pjanzen 208: is not specified, the default field separator is a sequence of
209: blank-space characters, and consecutive blank spaces do
210: .Em not
211: delimit an empty field; further, the initial blank space
212: .Em is
213: considered part of a field when determining key offsets.
1.22 dlg 214: .It Fl z
215: Uses the nul character as the record separator.
1.1 millert 216: .El
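.Pp
The effect of
.Fl t
on empty fields can be sketched as follows.
The records are invented for the example; the second one contains two
adjacent separators and therefore an empty second field.

```shell
# Hypothetical records; the second one has an empty middle field ("::").
records='a:z:1
b::2'

# Whole-line comparison: "a:z:1" sorts before "b::2".
whole=$(printf '%s\n' "$records" | sort)

# Key restricted to field 2: the empty field of "b::2" sorts before "z".
by_field2=$(printf '%s\n' "$records" | sort -t : -k 2,2)

printf '%s\n' "$whole"
printf '%s\n' "$by_field2"
```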
217: .Pp
218: The following operands are available:
219: .Bl -tag -width indent
1.3 aaron 220: .It Ar file
221: The pathname of a file to be sorted, merged, or checked.
222: If no
1.1 millert 223: .Ar file
1.12 aaron 224: operands are specified, or if a
1.3 aaron 225: .Ar file
226: operand is
1.1 millert 227: .Fl ,
228: the standard input is used.
1.3 aaron 229: .El
1.1 millert 230: .Pp
1.12 aaron 231: A field is defined as a maximal sequence of characters other than the
1.6 pjanzen 232: field separator and record separator
233: .Pq newline by default .
234: Initial blank spaces are included in the field unless
235: .Fl b
236: has been specified;
237: the first blank space of a sequence of blank spaces acts as the field
238: separator and is included in the field (unless
239: .Fl t
240: is specified).
241: For example, by default all blank spaces at the beginning of a line are
242: considered to be part of the first field.
1.1 millert 243: .Pp
1.12 aaron 244: Fields are specified by the
1.23 jmc 245: .Sm off
246: .Fl k\ \& Ar field1 Op , Ar field2
247: .Sm on
1.8 aaron 248: argument.
249: A missing
1.1 millert 250: .Ar field2
251: argument defaults to the end of a line.
252: .Pp
253: The arguments
254: .Ar field1
255: and
256: .Ar field2
257: have the form
258: .Em m.n
1.6 pjanzen 259: .Em (m,n > 0)
260: and can be followed by one or more of the letters
261: .Cm b , d , f , i ,
1.10 aaron 262: .Cm n ,
1.6 pjanzen 263: and
264: .Cm r ,
265: which correspond to the options discussed above.
1.1 millert 266: A
267: .Ar field1
268: position specified by
269: .Em m.n
270: is interpreted as the
271: .Em n Ns th
1.6 pjanzen 272: character from the beginning of the
1.1 millert 273: .Em m Ns th
274: field.
275: A missing
276: .Em \&.n
277: in
278: .Ar field1
279: means
280: .Ql \&.1 ,
281: indicating the first character of the
282: .Em m Ns th
1.12 aaron 283: field; if the
1.1 millert 284: .Fl b
285: option is in effect,
286: .Em n
1.12 aaron 287: is counted from the first non-blank character in the
1.1 millert 288: .Em m Ns th
289: field;
290: .Em m Ns \&.1b
1.12 aaron 291: refers to the first non-blank character in the
1.1 millert 292: .Em m Ns th
293: field.
1.6 pjanzen 294: .No 1\&. Ns Em n
295: refers to the
296: .Em n Ns th
297: character from the beginning of the line;
298: if
299: .Em n
300: is greater than the length of the line, the field is taken to be empty.
1.1 millert 301: .Pp
302: A
303: .Ar field2
304: position specified by
305: .Em m.n
1.12 aaron 306: is interpreted as the
1.1 millert 307: .Em n Ns th
308: character (including separators) of the
309: .Em m Ns th
310: field.
311: A missing
312: .Em \&.n
1.5 aaron 313: indicates the last character of the
1.1 millert 314: .Em m Ns th
315: field;
1.5 aaron 316: .Em m
1.1 millert 317: = \&0
318: designates the end of a line.
319: Thus the option
320: .Fl k Ar v.x,w.y
321: is synonymous with the obsolescent option
322: .Cm \(pl Ns Ar v-\&1.x-\&1
323: .Fl Ns Ar w-\&1.y ;
324: when
325: .Em y
326: is omitted,
327: .Fl k Ar v.x,w
328: is synonymous with
1.5 aaron 329: .Cm \(pl Ns Ar v-\&1.x-\&1
1.19 tdeval 330: .Fl Ns Ar w\&.0 .
1.1 millert 331: The obsolescent
332: .Cm \(pl Ns Ar pos1
333: .Fl Ns Ar pos2
334: option is still supported, except for
1.3 aaron 335: .Fl Ns Ar w\&.0b ,
1.1 millert 336: which has no
337: .Fl k
338: equivalent.
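.Pp
As a minimal sketch of the key syntax above, assuming colon-separated
records of the invented form name:department:salary, a key can be limited
to a single field and given its own ordering letter, and several keys can
be chained:

```shell
# Hypothetical sample data: name:department:salary
data='carol:ops:300
alice:dev:1200
bob:dev:90'

# Key is field 3 only (field1=3, field2=3), compared numerically (n).
by_salary=$(printf '%s\n' "$data" | sort -t : -k 3,3n)

# Primary key is field 2; ties are broken by field 1.
by_dept_name=$(printf '%s\n' "$data" | sort -t : -k 2,2 -k 1,1)

printf '%s\n' "$by_salary"
printf '%s\n' "$by_dept_name"
```

Giving an explicit end position such as 3,3 keeps the key from running
on to the end of the line, which is the default when field2 is omitted.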
1.8 aaron 339: .Pp
340: The
341: .Nm
1.35 schwarze 342: utility exits with one of the following values:
1.8 aaron 343: .Pp
344: .Bl -tag -width flag -compact
345: .It 0
346: Normal behavior.
347: .It 1
1.35 schwarze 348: The input file is not sorted and
349: .Fl C
350: or
1.8 aaron 351: .Fl c
1.35 schwarze 352: was given, or there are duplicate keys and
353: .Fl Cu
354: or
355: .Fl cu
356: was given.
1.8 aaron 357: .It 2
358: An error occurred.
359: .El
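.Pp
The exit values listed above lend themselves to use in shell conditionals.
The helper name is_sorted below is hypothetical and only serves this sketch.

```shell
# Hypothetical helper: print "sorted" or "unsorted" based on the exit
# status of "sort -C", which produces no output of its own.
is_sorted() {
    if printf '%s\n' "$@" | sort -C; then
        echo sorted
    else
        echo unsorted
    fi
}

r1=$(is_sorted a b c)
r2=$(is_sorted b a c)

# With -u added, duplicate keys also count as a failed check (exit 1).
printf '%s\n' a a b | sort -C -u && dup_status=0 || dup_status=$?

echo "$r1 $r2 $dup_status"
```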
1.1 millert 360: .Sh ENVIRONMENT
361: .Bl -tag -width Fl
362: .It Ev TMPDIR
1.3 aaron 363: Path in which to store temporary files.
364: Note that
1.1 millert 365: .Ev TMPDIR
366: may be overridden by the
367: .Fl T
368: option.
1.11 aaron 369: .El
1.1 millert 370: .Sh FILES
371: .Bl -tag -width Pa -compact
372: .It Pa /var/tmp/sort.*
1.3 aaron 373: default temporary directories
1.36 ! jmc 374: .It Pa output Ns #PID
1.3 aaron 375: temporary name for
1.1 millert 376: .Ar output
377: if
378: .Ar output
1.3 aaron 379: already exists
1.1 millert 380: .El
381: .Sh SEE ALSO
382: .Xr comm 1 ,
1.3 aaron 383: .Xr join 1 ,
1.18 fgsch 384: .Xr uniq 1 ,
385: .Xr radixsort 3
1.27 dlg 386: .Sh STANDARDS
387: The
388: .Nm
1.28 jmc 389: utility is compliant with the
1.33 jmc 390: .St -p1003.1-2008
1.27 dlg 391: specification.
392: .Pp
393: The flags
1.32 jmc 394: .Op Fl HRsTz
1.28 jmc 395: are extensions to that specification.
1.1 millert 396: .Sh HISTORY
397: A
1.8 aaron 398: .Nm
1.1 millert 399: command appeared in
1.16 mickey 400: .At v3 .
1.1 millert 401: .Sh NOTES
1.14 ericj 402: .Nm
403: has no limits on input line length (other than those imposed by available
404: memory) or any restrictions on bytes allowed within lines.
405: .Pp
406: To protect data,
407: .Nm
408: .Fl o
409: calls
410: .Xr link 2
411: and
412: .Xr unlink 2 ,
413: and thus fails on protected directories.
414: .Pp
1.1 millert 415: The current sort command uses lexicographic radix sorting, which requires
1.12 aaron 416: that sort keys be kept in memory (as opposed to previous versions which
417: used quick and merge sorts and did not).
1.1 millert 418: Thus performance depends highly on efficient choice of sort keys, and the
419: .Fl b
420: option and the
421: .Ar field2
422: argument of the
423: .Fl k
424: option should be used whenever possible.
425: Similarly,
1.8 aaron 426: .Nm
1.1 millert 427: .Fl k1f
428: is equivalent to
1.8 aaron 429: .Nm
1.1 millert 430: .Fl f
431: and may take twice as long.
1.12 aaron 432: .Sh BUGS
433: To sort files larger than 60MB, use
434: .Nm
435: .Fl H ;
436: files larger than 704MB must be sorted in smaller pieces, then merged.