Annotation of src/usr.bin/sort/sort.1, Revision 1.14
1.14 ! ericj 1: .\" $OpenBSD: sort.1,v 1.13 2000/11/09 17:52:39 aaron Exp $
1.1 millert 2: .\"
3: .\" Copyright (c) 1991, 1993
4: .\" The Regents of the University of California. All rights reserved.
5: .\"
6: .\" This code is derived from software contributed to Berkeley by
7: .\" the Institute of Electrical and Electronics Engineers, Inc.
8: .\"
9: .\" Redistribution and use in source and binary forms, with or without
10: .\" modification, are permitted provided that the following conditions
11: .\" are met:
12: .\" 1. Redistributions of source code must retain the above copyright
13: .\" notice, this list of conditions and the following disclaimer.
14: .\" 2. Redistributions in binary form must reproduce the above copyright
15: .\" notice, this list of conditions and the following disclaimer in the
16: .\" documentation and/or other materials provided with the distribution.
17: .\" 3. All advertising materials mentioning features or use of this software
18: .\" must display the following acknowledgement:
19: .\" This product includes software developed by the University of
20: .\" California, Berkeley and its contributors.
21: .\" 4. Neither the name of the University nor the names of its contributors
22: .\" may be used to endorse or promote products derived from this software
23: .\" without specific prior written permission.
24: .\"
25: .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
26: .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
27: .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
28: .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
29: .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
30: .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
31: .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
32: .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
33: .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
34: .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
35: .\" SUCH DAMAGE.
36: .\"
37: .\" @(#)sort.1 8.1 (Berkeley) 6/6/93
38: .\"
39: .Dd June 6, 1993
40: .Dt SORT 1
41: .Os
42: .Sh NAME
43: .Nm sort
44: .Nd sort or merge text files
45: .Sh SYNOPSIS
46: .Nm sort
1.2 deraadt 47: .Op Fl cmubdfinrH
1.1 millert 48: .Op Fl t Ar char
49: .Op Fl R Ar char
50: .Oo
51: .Cm Fl k Ar field1[,field2]
52: .Oc
53: .Ar ...
54: .Op Fl T Ar dir
55: .Op Fl o Ar output
56: .Op Ar file
57: .Ar ...
58: .Sh DESCRIPTION
59: The
1.8 aaron 60: .Nm
1.12 aaron 61: utility sorts text files by lines.
1.1 millert 62: Comparisons are based on one or more sort keys extracted
1.8 aaron 63: from each line of input, and are performed lexicographically.
64: By default, if keys are not given,
65: .Nm
1.1 millert 66: regards each input line as a single field.
67: .Pp
1.7 aaron 68: The options are as follows:
1.13 aaron 69: .Bl -tag -width file Ds
1.1 millert 70: .It Fl c
71: Check that the single input file is sorted.
72: If the file is not sorted,
1.8 aaron 73: .Nm
1.12 aaron 74: produces the appropriate error messages and exits with code 1; otherwise,
1.8 aaron 75: .Nm
1.1 millert 76: returns 0.
1.8 aaron 77: .Nm
1.1 millert 78: .Fl c
1.6 pjanzen 79: produces no output, except the error messages on
80: .Em stderr .
1.1 millert 81: .It Fl m
82: Merge only; the input files are assumed to be pre-sorted.
83: .It Fl o Ar output
84: The argument given is the name of an
85: .Ar output
1.12 aaron 86: file to be used instead of the standard output.
87: This file can be the same as one of the input files.
1.1 millert 88: .It Fl T Ar dir
89: Use
90: .Ar dir
1.8 aaron 91: as the directory for temporary files.
92: The default is the contents of the environment variable
1.1 millert 93: .Ev TMPDIR
94: or
95: .Pa /var/tmp
96: if
97: .Ev TMPDIR
98: is not set.
99: .It Fl u
1.12 aaron 100: Unique: suppress all but one in each set of lines having equal keys.
1.1 millert 101: If used with the
102: .Fl c
1.12 aaron 103: option, check that there are no lines with duplicate keys.
1.1 millert 104: .El
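.Pp
As a brief illustration of the options above (the file name
.Pa names.txt
is assumed purely for the example), a file might be checked for order and,
if the check fails, sorted in place with duplicate lines removed:
.Bd -literal -offset indent
$ sort -c names.txt || sort -u -o names.txt names.txt
.Ed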
105: .Pp
106: The following options override the default ordering rules.
107: When ordering options appear independent of key field
108: specifications, the requested field ordering rules are
109: applied globally to all sort keys.
110: When attached to a specific key (see
111: .Fl k ) ,
112: the ordering options override
113: all global ordering options for that key.
114: .Bl -tag -width indent
115: .It Fl d
116: Only blank space and alphanumeric characters
117: .\" according
118: .\" to the current setting of LC_CTYPE
1.12 aaron 119: are used in making comparisons.
1.1 millert 120: .It Fl f
121: Considers all lowercase characters that have uppercase
1.12 aaron 122: equivalents to be the same for purposes of comparison.
1.1 millert 123: .It Fl i
124: Ignore all non-printable characters.
125: .It Fl n
1.12 aaron 126: An initial numeric string, consisting of optional blank space, optional
127: minus sign, and zero or more digits (including a decimal point)
1.1 millert 128: .\" with
129: .\" optional radix character and thousands
130: .\" separator
131: .\" (as defined in the current locale),
132: is sorted by arithmetic value.
133: (The
134: .Fl n
1.12 aaron 135: option no longer implies the
1.1 millert 136: .Fl b
137: option.)
138: .It Fl r
139: Reverse the sense of comparisons.
140: .It Fl H
1.8 aaron 141: Use a merge sort instead of a radix sort.
142: This option should be used for files larger than 60Mb.
1.1 millert 143: .El
144: .Pp
1.12 aaron 145: The treatment of field separators can be altered using these options:
1.1 millert 146: .Bl -tag -width indent
147: .It Fl b
148: Ignores leading blank space when determining the start
149: and end of a restricted sort key.
150: A
151: .Fl b
152: option specified before the first
153: .Fl k
154: option applies globally to all
155: .Fl k
156: options.
157: Otherwise, the
158: .Fl b
1.12 aaron 159: option can be attached independently to each
1.1 millert 160: .Ar field
161: argument of the
162: .Fl k
163: option (see below).
164: Note that the
165: .Fl b
1.12 aaron 166: option has no effect unless key fields are specified.
1.1 millert 167: .It Fl t Ar char
1.3 aaron 168: .Ar char
1.8 aaron 169: is used as the field separator character.
170: The initial
1.1 millert 171: .Ar char
1.12 aaron 172: is not considered to be part of a field when determining key offsets.
1.1 millert 173: Each occurrence of
174: .Ar char
175: is significant (for example,
176: .Dq Ar charchar
177: delimits an empty field).
178: If
179: .Fl t
1.6 pjanzen 180: is not specified, the default field separator is a sequence of
181: blank-space characters, and consecutive blank spaces do
182: .Em not
183: delimit an empty field; further, the initial blank space
184: .Em is
185: considered part of a field when determining key offsets.
1.1 millert 186: .It Fl R Ar char
1.3 aaron 187: .Ar char
1.1 millert 188: is used as the record separator character.
189: This should be used with discretion;
190: .Fl R Ar <alphanumeric>
191: usually produces undesirable results.
1.4 aaron 192: The default record separator is newline.
1.1 millert 193: .It Fl k Ar field1[,field2]
194: Designates the starting position,
195: .Ar field1 ,
1.5 aaron 196: and optional ending position,
1.1 millert 197: .Ar field2 ,
198: of a key field.
199: The
200: .Fl k
201: option replaces the obsolescent options
202: .Cm \(pl Ns Ar pos1
203: and
204: .Fl Ns Ar pos2 .
205: .El
206: .Pp
207: The following operands are available:
208: .Bl -tag -width indent
1.3 aaron 209: .It Ar file
210: The pathname of a file to be sorted, merged, or checked.
211: If no
1.1 millert 212: .Ar file
1.12 aaron 213: operands are specified, or if a
1.3 aaron 214: .Ar file
215: operand is
1.1 millert 216: .Fl ,
217: the standard input is used.
1.3 aaron 218: .El
1.1 millert 219: .Pp
1.12 aaron 220: A field is defined as a maximal sequence of characters other than the
1.6 pjanzen 221: field separator and record separator
222: .Pq newline by default .
223: Initial blank spaces are included in the field unless
224: .Fl b
225: has been specified;
226: the first blank space of a sequence of blank spaces acts as the field
227: separator and is included in the field (unless
228: .Fl t
229: is specified).
230: For example, by default all blank spaces at the beginning of a line are
231: considered to be part of the first field.
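.Pp
To illustrate (a sketch only; the input file
.Pa list.txt
is hypothetical), when some lines of a file are indented, the leading
blanks take part in the comparison of the first field unless
.Fl b
is attached to the key:
.Bd -literal -offset indent
$ sort -k 1b,1 list.txt
.Ed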
1.1 millert 232: .Pp
1.12 aaron 233: Fields are specified by the
1.1 millert 234: .Fl k Ar field1[,field2]
1.8 aaron 235: argument.
236: A missing
1.1 millert 237: .Ar field2
238: argument defaults to the end of a line.
239: .Pp
240: The arguments
241: .Ar field1
242: and
243: .Ar field2
244: have the form
245: .Em m.n
1.6 pjanzen 246: .Em (m,n > 0)
247: and can be followed by one or more of the letters
248: .Cm b , d , f , i ,
1.10 aaron 249: .Cm n ,
1.6 pjanzen 250: and
251: .Cm r ,
252: which correspond to the options discussed above.
1.1 millert 253: A
254: .Ar field1
255: position specified by
256: .Em m.n
257: is interpreted as the
258: .Em n Ns th
1.6 pjanzen 259: character from the beginning of the
1.1 millert 260: .Em m Ns th
261: field.
262: A missing
263: .Em \&.n
264: in
265: .Ar field1
266: means
267: .Ql \&.1 ,
268: indicating the first character of the
269: .Em m Ns th
1.12 aaron 270: field; if the
1.1 millert 271: .Fl b
272: option is in effect,
273: .Em n
1.12 aaron 274: is counted from the first non-blank character in the
1.1 millert 275: .Em m Ns th
276: field;
277: .Em m Ns \&.1b
1.12 aaron 278: refers to the first non-blank character in the
1.1 millert 279: .Em m Ns th
280: field.
1.6 pjanzen 281: .No 1\&. Ns Em n
282: refers to the
283: .Em n Ns th
284: character from the beginning of the line;
285: if
286: .Em n
287: is greater than the length of the line, the field is taken to be empty.
1.1 millert 288: .Pp
289: A
290: .Ar field2
291: position specified by
292: .Em m.n
1.12 aaron 293: is interpreted as the
1.1 millert 294: .Em n Ns th
295: character (including separators) of the
296: .Em m Ns th
297: field.
298: A missing
299: .Em \&.n
1.5 aaron 300: indicates the last character of the
1.1 millert 301: .Em m Ns th
302: field;
1.5 aaron 303: .Em m
1.1 millert 304: = \&0
305: designates the end of a line.
306: Thus the option
307: .Fl k Ar v.x,w.y
308: is synonymous with the obsolescent option
309: .Cm \(pl Ns Ar v-\&1.x-\&1
310: .Fl Ns Ar w-\&1.y ;
311: when
312: .Em y
313: is omitted,
314: .Fl k Ar v.x,w
315: is synonymous with
1.5 aaron 316: .Cm \(pl Ns Ar v-\&1.x-\&1
1.1 millert 317: .Fl Ns Ar w+1.0 .
318: The obsolescent
319: .Cm \(pl Ns Ar pos1
320: .Fl Ns Ar pos2
321: option is still supported, except for
1.3 aaron 322: .Fl Ns Ar w\&.0b ,
1.1 millert 323: which has no
324: .Fl k
325: equivalent.
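.Pp
As a sketch of the key syntax (the file
.Pa accounts.txt
and its colon-separated layout are assumed purely for illustration),
the following sorts on the third field alone, comparing it numerically:
.Bd -literal -offset indent
$ sort -t : -k 3,3n accounts.txt
.Ed
.Pp
Writing
.Fl k Ar 3
without the
.Ar field2
part would instead extend the key from the third field to the end of
the line.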
1.8 aaron 326: .Pp
327: The
328: .Nm
329: utility shall exit with one of the following values:
330: .Pp
331: .Bl -tag -width flag -compact
332: .It 0
333: Normal behavior.
334: .It 1
335: On disorder (or non-uniqueness) with the
336: .Fl c
337: option.
338: .It 2
339: An error occurred.
340: .El
1.1 millert 341: .Sh ENVIRONMENT
1.8 aaron 342: The following environment variables affect the execution of
1.3 aaron 343: .Nm sort :
1.1 millert 344: .Bl -tag -width Fl
345: .It Ev TMPDIR
1.3 aaron 346: Path in which to store temporary files.
347: Note that
1.1 millert 348: .Ev TMPDIR
349: may be overridden by the
350: .Fl T
351: option.
1.11 aaron 352: .El
1.1 millert 353: .Sh FILES
354: .Bl -tag -width Pa -compact
355: .It Pa /var/tmp/sort.*
1.3 aaron 356: default temporary files
1.1 millert 357: .It Pa Ar output Ns #PID
1.3 aaron 358: temporary name for
1.1 millert 359: .Ar output
360: if
361: .Ar output
1.3 aaron 362: already exists
1.1 millert 363: .El
364: .Sh SEE ALSO
365: .Xr comm 1 ,
1.3 aaron 366: .Xr join 1 ,
1.14 ! ericj 367: .Xr radixsort 3 ,
1.3 aaron 368: .Xr uniq 1
1.1 millert 369: .Sh HISTORY
370: A
1.8 aaron 371: .Nm
1.1 millert 372: command appeared in
1.9 aaron 373: .At v5 .
1.1 millert 374: .Sh NOTES
1.14 ! ericj 375: .Nm
! 376: has no limits on input line length (other than those imposed by available
! 377: memory) or any restrictions on bytes allowed within lines.
! 378: .Pp
! 379: To protect data,
! 380: .Nm
! 381: .Fl o
! 382: calls
! 383: .Xr link 2
! 384: and
! 385: .Xr unlink 2 ,
! 386: and thus fails on protected directories.
! 387: .Pp
1.1 millert 388: The current sort command uses lexicographic radix sorting, which requires
1.12 aaron 389: that sort keys be kept in memory (as opposed to previous versions which
390: used quick and merge sorts and did not).
1.1 millert 391: Thus performance depends highly on efficient choice of sort keys, and the
392: .Fl b
393: option and the
394: .Ar field2
395: argument of the
396: .Fl k
397: option should be used whenever possible.
398: Similarly,
1.8 aaron 399: .Nm
1.1 millert 400: .Fl k1f
401: is equivalent to
1.8 aaron 402: .Nm
1.1 millert 403: .Fl f
404: and may take twice as long.
1.12 aaron 405: .Sh BUGS
406: To sort files larger than 60Mb, use
407: .Nm
408: .Fl H ;
409: files larger than 704Mb must be sorted in smaller pieces, then merged.
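.Pp
One possible way to do this, sketched here with hypothetical file names
and an arbitrary piece size, is to divide the input with
.Xr split 1 ,
sort each piece in place, and combine the results with
.Fl m :
.Bd -literal -offset indent
$ split -l 1000000 big.txt piece.
$ for f in piece.*; do sort -o "$f" "$f"; done
$ sort -m -o big.sorted piece.*
.Ed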