Annotation of src/usr.bin/sort/sort.1, Revision 1.7
1.7 ! aaron 1: .\" $OpenBSD: sort.1,v 1.6 2000/01/05 07:40:43 pjanzen Exp $
1.1 millert 2: .\"
3: .\" Copyright (c) 1991, 1993
4: .\" The Regents of the University of California. All rights reserved.
5: .\"
6: .\" This code is derived from software contributed to Berkeley by
7: .\" the Institute of Electrical and Electronics Engineers, Inc.
8: .\"
9: .\" Redistribution and use in source and binary forms, with or without
10: .\" modification, are permitted provided that the following conditions
11: .\" are met:
12: .\" 1. Redistributions of source code must retain the above copyright
13: .\" notice, this list of conditions and the following disclaimer.
14: .\" 2. Redistributions in binary form must reproduce the above copyright
15: .\" notice, this list of conditions and the following disclaimer in the
16: .\" documentation and/or other materials provided with the distribution.
17: .\" 3. All advertising materials mentioning features or use of this software
18: .\" must display the following acknowledgement:
19: .\" This product includes software developed by the University of
20: .\" California, Berkeley and its contributors.
21: .\" 4. Neither the name of the University nor the names of its contributors
22: .\" may be used to endorse or promote products derived from this software
23: .\" without specific prior written permission.
24: .\"
25: .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
26: .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
27: .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
28: .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
29: .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
30: .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
31: .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
32: .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
33: .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
34: .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
35: .\" SUCH DAMAGE.
36: .\"
37: .\" @(#)sort.1 8.1 (Berkeley) 6/6/93
38: .\"
39: .Dd June 6, 1993
40: .Dt SORT 1
41: .Os
42: .Sh NAME
43: .Nm sort
44: .Nd sort or merge text files
45: .Sh SYNOPSIS
46: .Nm sort
1.2 deraadt 47: .Op Fl cmubdfinrH
1.1 millert 48: .Op Fl t Ar char
49: .Op Fl R Ar char
50: .Oo
51: .Cm Fl k Ar field1[,field2]
52: .Oc
53: .Ar ...
54: .Op Fl T Ar dir
55: .Op Fl o Ar output
56: .Op Ar file
57: .Ar ...
58: .Sh DESCRIPTION
59: The
60: .Nm sort
61: utility
62: sorts text files by lines.
63: Comparisons are based on one or more sort keys extracted
64: from each line of input, and are performed
65: lexicographically. By default, if keys are not given,
66: .Nm sort
67: regards each input line as a single field.
68: .Pp
1.7 ! aaron 69: The options are as follows:
1.3 aaron 70: .Bl -tag -width file indent
1.1 millert 71: .It Fl c
72: Check that the single input file is sorted.
73: If the file is not sorted,
74: .Nm sort
75: produces the appropriate error messages and exits with code 1;
76: otherwise,
77: .Nm sort
78: returns 0.
1.3 aaron 79: .Nm sort
1.1 millert 80: .Fl c
1.6 pjanzen 81: produces no output, except the error messages on
82: .Em stderr .
1.1 millert 83: .It Fl m
84: Merge only; the input files are assumed to be pre-sorted.
85: .It Fl o Ar output
86: The argument given is the name of an
87: .Ar output
88: file to
89: be used instead of the standard output.
90: This file
91: can be the same as one of the input files.
92: .It Fl T Ar dir
93: Use
94: .Ar dir
95: as the directory for temporary files. The default is the contents
96: of the environment variable
97: .Ev TMPDIR
98: or
99: .Pa /var/tmp
100: if
101: .Ev TMPDIR
102: is not set.
103: .It Fl u
104: Unique: suppress all but one in each set of lines
105: having equal keys.
106: If used with the
107: .Fl c
108: option,
109: check that there are no lines with duplicate keys.
110: .El
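The check performed by
.Fl c ,
with and without
.Fl u ,
can be illustrated with a small Python sketch.
This is not the code used by
.Nm sort ;
the function name is hypothetical, each whole line is treated as its own key,
and plain string comparison stands in for the lexicographic comparison
described above.

```python
def check_sorted(lines, unique=False):
    """Return 0 if lines are in order, 1 on disorder.

    With unique=True (the -c -u combination), two adjacent equal
    lines also count as a failure.
    """
    for prev, cur in zip(lines, lines[1:]):
        if prev > cur or (unique and prev == cur):
            return 1
    return 0
```

The return values mirror the exit codes 0 and 1 documented for
.Fl c .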
111: .Pp
112: The following options override the default ordering rules.
113: When ordering options appear independent of key field
114: specifications, the requested field ordering rules are
115: applied globally to all sort keys.
116: When attached to a specific key (see
117: .Fl k ) ,
118: the ordering options override
119: all global ordering options for that key.
120: .Bl -tag -width indent
121: .It Fl d
122: Only blank space and alphanumeric characters
123: .\" according
124: .\" to the current setting of LC_CTYPE
125: are used
126: in making comparisons.
127: .It Fl f
128: Consider all lowercase characters that have uppercase
129: equivalents to be the same for purposes of
130: comparison.
131: .It Fl i
132: Ignore all non-printable characters.
133: .It Fl n
134: An initial numeric string, consisting of optional
135: blank space, optional minus sign, and zero or more
136: digits (including decimal point)
137: .\" with
138: .\" optional radix character and thousands
139: .\" separator
140: .\" (as defined in the current locale),
141: is sorted by arithmetic value.
142: (The
143: .Fl n
144: option no longer implies
145: the
146: .Fl b
147: option.)
148: .It Fl r
149: Reverse the sense of comparisons.
150: .It Fl H
151: Use a merge sort instead of a radix sort. This option should be
152: used for files larger than 60MB.
153: .El
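As an illustration of the initial numeric string used by
.Fl n ,
the following Python sketch extracts optional blank space, an optional minus
sign, and digits with an optional decimal point, then converts the result to
a number.
The function name is hypothetical, and treating a key with no digits as zero
is an assumption of this sketch, not something stated above.

```python
import re

# Optional blanks, optional minus sign, digits with at most one decimal point.
_NUMERIC_PREFIX = re.compile(r"[ \t]*(-?[0-9]*\.?[0-9]*)")


def numeric_key(text):
    """Return the arithmetic value of the initial numeric string of text."""
    prefix = _NUMERIC_PREFIX.match(text).group(1)
    # Assumption: a prefix without any digit ("", "-", ".") counts as zero.
    if not any(ch.isdigit() for ch in prefix):
        return 0.0
    return float(prefix)
```

Sorting the strings 10, 9, and -1 with this key orders them by value rather
than character by character.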
154: .Pp
1.3 aaron 155: The treatment of field separators can be altered using these
1.1 millert 156: options:
157: .Bl -tag -width indent
158: .It Fl b
159: Ignore leading blank space when determining the start
160: and end of a restricted sort key.
161: A
162: .Fl b
163: option specified before the first
164: .Fl k
165: option applies globally to all
166: .Fl k
167: options.
168: Otherwise, the
169: .Fl b
170: option can be
171: attached independently to each
172: .Ar field
173: argument of the
174: .Fl k
175: option (see below).
176: Note that the
177: .Fl b
178: option
179: has no effect unless key fields are specified.
180: .It Fl t Ar char
1.3 aaron 181: .Ar char
1.1 millert 182: is used as the field separator character. The initial
183: .Ar char
184: is not considered to be part of a field when determining
1.6 pjanzen 185: key offsets.
1.1 millert 186: Each occurrence of
187: .Ar char
188: is significant (for example,
189: .Dq Ar charchar
190: delimits an empty field).
191: If
192: .Fl t
1.6 pjanzen 193: is not specified, the default field separator is a sequence of
194: blank-space characters, and consecutive blank spaces do
195: .Em not
196: delimit an empty field; further, the initial blank space
197: .Em is
198: considered part of a field when determining key offsets.
1.1 millert 199: .It Fl R Ar char
1.3 aaron 200: .Ar char
1.1 millert 201: is used as the record separator character.
202: This should be used with discretion;
203: .Fl R Ar <alphanumeric>
204: usually produces undesirable results.
1.4 aaron 205: The default record separator is newline.
1.1 millert 206: .It Fl k Ar field1[,field2]
207: Designates the starting position,
208: .Ar field1 ,
1.5 aaron 209: and optional ending position,
1.1 millert 210: .Ar field2 ,
211: of a key field.
212: The
213: .Fl k
214: option replaces the obsolescent options
215: .Cm \(pl Ns Ar pos1
216: and
217: .Fl Ns Ar pos2 .
218: .El
219: .Pp
220: The following operands are available:
221: .Bl -tag -width indent
1.3 aaron 222: .It Ar file
223: The pathname of a file to be sorted, merged, or checked.
224: If no
1.1 millert 225: .Ar file
226: operands are specified, or if
1.3 aaron 227: a
228: .Ar file
229: operand is
1.1 millert 230: .Fl ,
231: the standard input is used.
1.3 aaron 232: .El
1.1 millert 233: .Pp
234: A field is
1.6 pjanzen 235: defined as a maximal sequence of characters other than the
236: field separator and record separator
237: .Pq newline by default .
238: Initial blank spaces are included in the field unless
239: .Fl b
240: has been specified;
241: the first blank space of a sequence of blank spaces acts as the field
242: separator and is included in the field (unless
243: .Fl t
244: is specified).
245: For example, by default all blank spaces at the beginning of a line are
246: considered to be part of the first field.
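.Pp
The two splitting rules can be sketched in Python as follows.
This is an illustration of the rules as described here, not the code used by
.Nm sort ;
the function name is hypothetical, the line is assumed to have had its record
separator removed, and trailing blank space is simply dropped in the default
case.

```python
import re


def split_fields(line, sep=None):
    """Split one line into fields.

    With sep (the -t case) every occurrence of sep is significant, so two
    adjacent separators delimit an empty field.  Without sep, each field
    keeps the blank space that precedes it.
    """
    if sep is not None:
        return line.split(sep)
    return re.findall(r"[ \t]*[^ \t]+", line)
```

For the line made of two spaces, a, two spaces, b, one space, and c, the
default rule yields three fields, each beginning with its own blank space;
with a colon as separator, a::b yields a, an empty field, and b.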
1.1 millert 247: .Pp
248: Fields are specified
249: by the
250: .Fl k Ar field1[,field2]
251: argument. A missing
252: .Ar field2
253: argument defaults to the end of a line.
254: .Pp
255: The arguments
256: .Ar field1
257: and
258: .Ar field2
259: have the form
260: .Em m.n
1.6 pjanzen 261: .Em (m,n > 0)
262: and can be followed by one or more of the letters
263: .Cm b , d , f , i ,
264: .Cm n ,
265: and
266: .Cm r ,
267: which correspond to the options discussed above.
1.1 millert 268: A
269: .Ar field1
270: position specified by
271: .Em m.n
272: is interpreted as the
273: .Em n Ns th
1.6 pjanzen 274: character from the beginning of the
1.1 millert 275: .Em m Ns th
276: field.
277: A missing
278: .Em \&.n
279: in
280: .Ar field1
281: means
282: .Ql \&.1 ,
283: indicating the first character of the
284: .Em m Ns th
285: field;
1.3 aaron 286: if the
1.1 millert 287: .Fl b
288: option is in effect,
289: .Em n
290: is counted from the first
291: non-blank character in the
292: .Em m Ns th
293: field;
294: .Em m Ns \&.1b
295: refers to the first
296: non-blank character in the
297: .Em m Ns th
298: field.
1.6 pjanzen 299: .No 1\&. Ns Em n
300: refers to the
301: .Em n Ns th
302: character from the beginning of the line;
303: if
304: .Em n
305: is greater than the length of the line, the field is taken to be empty.
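.Pp
A minimal Python sketch of how a
.Ar field1
position selects the start of a key, for the default field separator only
and with no
.Ar field2
(so the key runs to the end of the line).
The function name and parameters are hypothetical, and letting
.Em n
count past the end of the
.Em m Ns th
field into the following text is an assumption of the sketch.

```python
import re


def key_from(line, m, n=1, skip_blanks=False):
    """Return the key starting at position m.n, running to the end of line.

    skip_blanks models the b modifier: n is then counted from the first
    non-blank character of the m-th field.
    """
    # Default separator: each field keeps its leading blank space.
    fields = re.findall(r"[ \t]*[^ \t]+", line)
    if m > len(fields):
        return ""
    rest = "".join(fields[m - 1:])
    if skip_blanks:
        rest = rest.lstrip(" \t")
    # n is 1-based; a value past the end gives an empty key.
    return rest[n - 1:]
```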
1.1 millert 306: .Pp
307: A
308: .Ar field2
309: position specified by
310: .Em m.n
311: is interpreted as
312: the
313: .Em n Ns th
314: character (including separators) of the
315: .Em m Ns th
316: field.
317: A missing
318: .Em \&.n
1.5 aaron 319: indicates the last character of the
1.1 millert 320: .Em m Ns th
321: field;
1.5 aaron 322: .Em m
1.1 millert 323: = \&0
324: designates the end of a line.
325: Thus the option
326: .Fl k Ar v.x,w.y
327: is synonymous with the obsolescent option
328: .Cm \(pl Ns Ar v-\&1.x-\&1
329: .Fl Ns Ar w-\&1.y ;
330: when
331: .Em y
332: is omitted,
333: .Fl k Ar v.x,w
334: is synonymous with
1.5 aaron 335: .Cm \(pl Ns Ar v-\&1.x-\&1
1.1 millert 336: .Fl Ns Ar w+1.0 .
337: The obsolescent
338: .Cm \(pl Ns Ar pos1
339: .Fl Ns Ar pos2
340: option is still supported, except for
1.3 aaron 341: .Fl Ns Ar w\&.0b ,
1.1 millert 342: which has no
343: .Fl k
344: equivalent.
345: .Sh ENVIRONMENT
346: If the following environment variable is set, it is used by
1.3 aaron 347: .Nm sort :
1.1 millert 348: .Bl -tag -width Fl
349: .It Ev TMPDIR
1.3 aaron 350: Path in which to store temporary files.
351: Note that
1.1 millert 352: .Ev TMPDIR
353: may be overridden by the
354: .Fl T
355: option.
.El
356: .Sh FILES
357: .Bl -tag -width Pa -compact
358: .It Pa /var/tmp/sort.*
1.3 aaron 359: default temporary directories
1.1 millert 360: .It Pa Ar output Ns #PID
1.3 aaron 361: temporary name for
1.1 millert 362: .Ar output
363: if
364: .Ar output
1.3 aaron 365: already exists
1.1 millert 366: .El
367: .Sh SEE ALSO
368: .Xr comm 1 ,
1.3 aaron 369: .Xr join 1 ,
370: .Xr uniq 1
1.1 millert 371: .Sh RETURN VALUES
1.3 aaron 372: .Nm sort
373: exits with one of the following values:
374: .Pp
1.1 millert 375: .Bl -tag -width flag -compact
1.3 aaron 376: .It 0
377: Normal behavior.
378: .It 1
379: On disorder (or non-uniqueness) with the
1.1 millert 380: .Fl c
1.3 aaron 381: option.
382: .It 2
383: An error occurred.
.El
1.1 millert 384: .Sh BUGS
385: Lines longer than 65522 characters are discarded and processing continues.
386: To sort files larger than 60MB, use
387: .Nm sort
388: .Fl H ;
389: files larger than 704MB must be sorted in smaller pieces, then merged.
390: To protect data,
391: .Nm sort
392: .Fl o
393: calls link and unlink, and thus fails in protected directories.
394: .Sh HISTORY
395: A
396: .Nm sort
397: command appeared in
398: .At v6 .
399: .Sh NOTES
400: The current sort command uses lexicographic radix sorting, which requires
401: that sort keys be kept in memory (as opposed to previous versions, which used quicksort
1.3 aaron 402: and merge sort and did not).
1.1 millert 403: Thus performance depends highly on efficient choice of sort keys, and the
404: .Fl b
405: option and the
406: .Ar field2
407: argument of the
408: .Fl k
409: option should be used whenever possible.
410: Similarly,
411: .Nm sort
412: .Fl k1f
413: is equivalent to
414: .Nm sort
415: .Fl f
416: and may take twice as long.