Annotation of src/usr.bin/sort/sort.1, Revision 1.23
1.22 dlg 1: .\" $OpenBSD: sort.1,v 1.21 2003/07/14 12:56:07 jmc Exp $
1.1 millert 2: .\"
3: .\" Copyright (c) 1991, 1993
4: .\" The Regents of the University of California. All rights reserved.
5: .\"
6: .\" This code is derived from software contributed to Berkeley by
7: .\" the Institute of Electrical and Electronics Engineers, Inc.
8: .\"
9: .\" Redistribution and use in source and binary forms, with or without
10: .\" modification, are permitted provided that the following conditions
11: .\" are met:
12: .\" 1. Redistributions of source code must retain the above copyright
13: .\" notice, this list of conditions and the following disclaimer.
14: .\" 2. Redistributions in binary form must reproduce the above copyright
15: .\" notice, this list of conditions and the following disclaimer in the
16: .\" documentation and/or other materials provided with the distribution.
1.20 millert 17: .\" 3. Neither the name of the University nor the names of its contributors
1.1 millert 18: .\" may be used to endorse or promote products derived from this software
19: .\" without specific prior written permission.
20: .\"
21: .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
22: .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
23: .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
24: .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
25: .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
26: .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
27: .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
28: .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
29: .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
30: .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
31: .\" SUCH DAMAGE.
32: .\"
33: .\" @(#)sort.1 8.1 (Berkeley) 6/6/93
34: .\"
35: .Dd June 6, 1993
36: .Dt SORT 1
37: .Os
38: .Sh NAME
39: .Nm sort
40: .Nd sort or merge text files
41: .Sh SYNOPSIS
42: .Nm sort
1.23 ! jmc 43: .Op Fl bcdfHimnruz
1.1 millert 44: .Oo
1.23 ! jmc 45: .Sm off
! 46: .Fl k
! 47: .Ar field1 Op , Ar field2
! 48: .Ar ...
! 49: .Sm on
1.1 millert 50: .Oc
1.23 ! jmc 51: .Op Fl o Ar output
! 52: .Op Fl R Ar char
! 53: .Bk -words
1.1 millert 54: .Op Fl T Ar dir
1.23 ! jmc 55: .Ek
! 56: .Op Fl t Ar char
! 57: .Op Ar file ...
1.1 millert 58: .Sh DESCRIPTION
59: The
1.8 aaron 60: .Nm
1.12 aaron 61: utility sorts text files by lines.
1.1 millert 62: Comparisons are based on one or more sort keys extracted
1.8 aaron 63: from each line of input, and are performed lexicographically.
64: By default, if keys are not given,
65: .Nm
1.1 millert 66: regards each input line as a single field.
67: .Pp
1.7 aaron 68: The options are as follows:
1.21 jmc 69: .Bl -tag -width Ds
1.1 millert 70: .It Fl c
71: Check that the single input file is sorted.
72: If the file is not sorted,
1.8 aaron 73: .Nm
1.12 aaron 74: produces the appropriate error messages and exits with code 1; otherwise,
1.8 aaron 75: .Nm
1.1 millert 76: returns 0.
1.8 aaron 77: .Nm
1.1 millert 78: .Fl c
1.6 pjanzen 79: produces no output, except the error messages on
80: .Em stderr .
1.1 millert 81: .It Fl m
82: Merge only; the input files are assumed to be pre-sorted.
83: .It Fl o Ar output
84: The argument given is the name of an
85: .Ar output
1.12 aaron 86: file to be used instead of the standard output.
87: This file can be the same as one of the input files.
1.1 millert 88: .It Fl T Ar dir
89: Use
90: .Ar dir
1.8 aaron 91: as the directory for temporary files.
92: The default is the contents of the environment variable
1.1 millert 93: .Ev TMPDIR
94: or
95: .Pa /var/tmp
96: if
97: .Ev TMPDIR
98: does not exist.
99: .It Fl u
1.12 aaron 100: Unique: suppress all but one in each set of lines having equal keys.
1.1 millert 101: If used with the
102: .Fl c
1.12 aaron 103: option, check that there are no lines with duplicate keys.
1.1 millert 104: .El
105: .Pp
106: The following options override the default ordering rules.
107: When ordering options appear independent of key field
108: specifications, the requested field ordering rules are
109: applied globally to all sort keys.
110: When attached to a specific key (see
111: .Fl k ) ,
112: the ordering options override
113: all global ordering options for that key.
114: .Bl -tag -width indent
115: .It Fl d
116: Only blank space and alphanumeric characters
117: .\" according
118: .\" to the current setting of LC_CTYPE
1.12 aaron 119: are used in making comparisons.
1.1 millert 120: .It Fl f
121: Considers all lowercase characters that have uppercase
1.12 aaron 122: equivalents to be the same for purposes of comparison.
1.23 ! jmc 123: .It Fl H
! 124: Use a merge sort instead of a radix sort.
! 125: This option should be used for files larger than 60MB.
1.1 millert 126: .It Fl i
127: Ignore all non-printable characters.
128: .It Fl n
1.12 aaron 129: An initial numeric string, consisting of optional blank space, optional
130: minus sign, and zero or more digits (including decimal point)
1.1 millert 131: .\" with
132: .\" optional radix character and thousands
133: .\" separator
134: .\" (as defined in the current locale),
135: is sorted by arithmetic value.
136: (The
137: .Fl n
1.12 aaron 138: option no longer implies the
1.1 millert 139: .Fl b
140: option.)
141: .It Fl r
142: Reverse the sense of comparisons.
143: .El
144: .Pp
1.12 aaron 145: The treatment of field separators can be altered using these options:
1.1 millert 146: .Bl -tag -width indent
147: .It Fl b
148: Ignores leading blank space when determining the start
149: and end of a restricted sort key.
150: A
151: .Fl b
152: option specified before the first
153: .Fl k
154: option applies globally to all
155: .Fl k
156: options.
157: Otherwise, the
158: .Fl b
1.12 aaron 159: option can be attached independently to each
1.1 millert 160: .Ar field
161: argument of the
162: .Fl k
163: option (see below).
164: Note that the
165: .Fl b
1.12 aaron 166: option has no effect unless key fields are specified.
1.23 ! jmc 167: .It Xo
! 168: .Sm off
! 169: .Fl k\ \& Ar field1 Op , Ar field2
! 170: .Sm on
! 171: .Xc
! 172: Designates the starting position,
! 173: .Ar field1 ,
! 174: and optional ending position,
! 175: .Ar field2 ,
! 176: of a key field.
! 177: The
! 178: .Fl k
! 179: option replaces the obsolescent options
! 180: .Cm \(pl Ns Ar pos1
! 181: and
! 182: .Fl Ns Ar pos2 .
! 183: .It Fl R Ar char
! 184: .Ar char
! 185: is used as the record separator character.
! 186: This should be used with discretion;
! 187: .Fl R Aq Ar alphanumeric
! 188: usually produces undesirable results.
! 189: The default record separator is newline.
1.1 millert 190: .It Fl t Ar char
1.3 aaron 191: .Ar char
1.8 aaron 192: is used as the field separator character.
193: The initial
1.1 millert 194: .Ar char
1.12 aaron 195: is not considered to be part of a field when determining key offsets.
1.1 millert 196: Each occurrence of
197: .Ar char
198: is significant (for example,
199: .Dq Ar charchar
200: delimits an empty field).
201: If
202: .Fl t
1.6 pjanzen 203: is not specified, the default field separator is a sequence of
204: blank-space characters, and consecutive blank spaces do
205: .Em not
206: delimit an empty field; further, the initial blank space
207: .Em is
208: considered part of a field when determining key offsets.
1.22 dlg 209: .It Fl z
210: Uses the nul character as the record separator.
1.1 millert 211: .El
212: .Pp
213: The following operands are available:
214: .Bl -tag -width indent
1.3 aaron 215: .It Ar file
216: The pathname of a file to be sorted, merged, or checked.
217: If no
1.1 millert 218: .Ar file
1.12 aaron 219: operands are specified, or if a
1.3 aaron 220: .Ar file
221: operand is
1.1 millert 222: .Fl ,
223: the standard input is used.
1.3 aaron 224: .El
1.1 millert 225: .Pp
1.12 aaron 226: A field is defined as a maximal sequence of characters other than the
1.6 pjanzen 227: field separator and record separator
228: .Pq newline by default .
229: Initial blank spaces are included in the field unless
230: .Fl b
231: has been specified;
232: the first blank space of a sequence of blank spaces acts as the field
233: separator and is included in the field (unless
234: .Fl t
235: is specified).
236: For example, by default all blank spaces at the beginning of a line are
237: considered to be part of the first field.
1.1 millert 238: .Pp
1.12 aaron 239: Fields are specified by the
1.23 ! jmc 240: .Sm off
! 241: .Fl k\ \& Ar field1 Op , Ar field2
! 242: .Sm on
1.8 aaron 243: argument.
244: A missing
1.1 millert 245: .Ar field2
246: argument defaults to the end of a line.
247: .Pp
248: The arguments
249: .Ar field1
250: and
251: .Ar field2
252: have the form
253: .Em m.n
1.6 pjanzen 254: .Em (m,n > 0)
255: and can be followed by one or more of the letters
256: .Cm b , d , f , i ,
1.10 aaron 257: .Cm n ,
1.6 pjanzen 258: and
259: .Cm r ,
260: which correspond to the options discussed above.
1.1 millert 261: A
262: .Ar field1
263: position specified by
264: .Em m.n
265: is interpreted as the
266: .Em n Ns th
1.6 pjanzen 267: character from the beginning of the
1.1 millert 268: .Em m Ns th
269: field.
270: A missing
271: .Em \&.n
272: in
273: .Ar field1
274: means
275: .Ql \&.1 ,
276: indicating the first character of the
277: .Em m Ns th
1.12 aaron 278: field; if the
1.1 millert 279: .Fl b
280: option is in effect,
281: .Em n
1.12 aaron 282: is counted from the first non-blank character in the
1.1 millert 283: .Em m Ns th
284: field;
285: .Em m Ns \&.1b
1.12 aaron 286: refers to the first non-blank character in the
1.1 millert 287: .Em m Ns th
288: field.
1.6 pjanzen 289: .No 1\&. Ns Em n
290: refers to the
291: .Em n Ns th
292: character from the beginning of the line;
293: if
294: .Em n
295: is greater than the length of the line, the field is taken to be empty.
1.1 millert 296: .Pp
297: A
298: .Ar field2
299: position specified by
300: .Em m.n
1.12 aaron 301: is interpreted as the
1.1 millert 302: .Em n Ns th
303: character (including separators) of the
304: .Em m Ns th
305: field.
306: A missing
307: .Em \&.n
1.5 aaron 308: indicates the last character of the
1.1 millert 309: .Em m Ns th
310: field;
1.5 aaron 311: .Em m
1.1 millert 312: = \&0
313: designates the end of a line.
314: Thus the option
315: .Fl k Ar v.x,w.y
316: is synonymous with the obsolescent option
317: .Cm \(pl Ns Ar v-\&1.x-\&1
318: .Fl Ns Ar w-\&1.y ;
319: when
320: .Em y
321: is omitted,
322: .Fl k Ar v.x,w
323: is synonymous with
1.5 aaron 324: .Cm \(pl Ns Ar v-\&1.x-\&1
1.19 tdeval 325: .Fl Ns Ar w\&.0 .
1.1 millert 326: The obsolescent
327: .Cm \(pl Ns Ar pos1
328: .Fl Ns Ar pos2
329: option is still supported, except for
1.3 aaron 330: .Fl Ns Ar w\&.0b ,
1.1 millert 331: which has no
332: .Fl k
333: equivalent.
1.8 aaron 334: .Pp
335: The
336: .Nm
337: utility exits with one of the following values:
338: .Pp
339: .Bl -tag -width flag -compact
340: .It 0
341: Normal behavior.
342: .It 1
343: On disorder (or non-uniqueness) with the
344: .Fl c
345: option.
346: .It 2
347: An error occurred.
348: .El
1.1 millert 349: .Sh ENVIRONMENT
350: .Bl -tag -width Fl
351: .It Ev TMPDIR
1.3 aaron 352: Path in which to store temporary files.
353: Note that
1.1 millert 354: .Ev TMPDIR
355: may be overridden by the
356: .Fl T
357: option.
1.11 aaron 358: .El
1.1 millert 359: .Sh FILES
360: .Bl -tag -width Pa -compact
361: .It Pa /var/tmp/sort.*
1.3 aaron 362: default temporary files
1.1 millert 363: .It Pa Ar output Ns #PID
1.3 aaron 364: temporary name for
1.1 millert 365: .Ar output
366: if
367: .Ar output
1.3 aaron 368: already exists
1.1 millert 369: .El
370: .Sh SEE ALSO
371: .Xr comm 1 ,
1.3 aaron 372: .Xr join 1 ,
1.18 fgsch 373: .Xr uniq 1 ,
374: .Xr radixsort 3
1.1 millert 375: .Sh HISTORY
376: A
1.8 aaron 377: .Nm
1.1 millert 378: command appeared in
1.16 mickey 379: .At v3 .
1.1 millert 380: .Sh NOTES
1.14 ericj 381: .Nm
382: has no limits on input line length (other than those imposed by available
383: memory) or any restrictions on bytes allowed within lines.
384: .Pp
385: To protect data,
386: .Nm
387: .Fl o
388: calls
389: .Xr link 2
390: and
391: .Xr unlink 2 ,
392: and thus fails on protected directories.
393: .Pp
1.1 millert 394: The current sort command uses lexicographic radix sorting, which requires
1.12 aaron 395: that sort keys be kept in memory (as opposed to previous versions which
396: used quick and merge sorts and did not).
1.1 millert 397: Thus performance depends highly on efficient choice of sort keys, and the
398: .Fl b
399: option and the
400: .Ar field2
401: argument of the
402: .Fl k
403: option should be used whenever possible.
404: Similarly,
1.8 aaron 405: .Nm
1.1 millert 406: .Fl k1f
407: is equivalent to
1.8 aaron 408: .Nm
1.1 millert 409: .Fl f
410: and may take twice as long.
1.12 aaron 411: .Sh BUGS
412: To sort files larger than 60MB, use
413: .Nm
414: .Fl H ;
415: files larger than 704MB must be sorted in smaller pieces, then merged.
415: files larger than 704Mb must be sorted in smaller pieces, then merged.