Annotation of src/usr.bin/sort/sort.1, Revision 1.4
1.4 ! aaron 1: .\" $OpenBSD: sort.1,v 1.3 1998/09/27 16:57:54 aaron Exp $
1.1 millert 2: .\"
3: .\" Copyright (c) 1991, 1993
4: .\" The Regents of the University of California. All rights reserved.
5: .\"
6: .\" This code is derived from software contributed to Berkeley by
7: .\" the Institute of Electrical and Electronics Engineers, Inc.
8: .\"
9: .\" Redistribution and use in source and binary forms, with or without
10: .\" modification, are permitted provided that the following conditions
11: .\" are met:
12: .\" 1. Redistributions of source code must retain the above copyright
13: .\" notice, this list of conditions and the following disclaimer.
14: .\" 2. Redistributions in binary form must reproduce the above copyright
15: .\" notice, this list of conditions and the following disclaimer in the
16: .\" documentation and/or other materials provided with the distribution.
17: .\" 3. All advertising materials mentioning features or use of this software
18: .\" must display the following acknowledgement:
19: .\" This product includes software developed by the University of
20: .\" California, Berkeley and its contributors.
21: .\" 4. Neither the name of the University nor the names of its contributors
22: .\" may be used to endorse or promote products derived from this software
23: .\" without specific prior written permission.
24: .\"
25: .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
26: .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
27: .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
28: .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
29: .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
30: .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
31: .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
32: .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
33: .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
34: .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
35: .\" SUCH DAMAGE.
36: .\"
37: .\" @(#)sort.1 8.1 (Berkeley) 6/6/93
38: .\"
39: .Dd June 6, 1993
40: .Dt SORT 1
41: .Os
42: .Sh NAME
43: .Nm sort
44: .Nd sort or merge text files
45: .Sh SYNOPSIS
46: .Nm sort
1.2 deraadt 47: .Op Fl cmubdfinrH
1.1 millert 48: .Op Fl t Ar char
49: .Op Fl R Ar char
50: .Oo
51: .Cm Fl k Ar field1[,field2]
52: .Oc
53: .Ar ...
54: .Op Fl T Ar dir
55: .Op Fl o Ar output
56: .Op Ar file
57: .Ar ...
58: .Sh DESCRIPTION
59: The
60: .Nm sort
61: utility
62: sorts text files by lines.
63: Comparisons are based on one or more sort keys extracted
64: from each line of input, and are performed
65: lexicographically. By default, if keys are not given,
66: .Nm sort
67: regards each input line as a single field.
68: .Pp
69: The following options are available:
1.3 aaron 70: .Bl -tag -width file indent
1.1 millert 71: .It Fl c
72: Check that the single input file is sorted.
73: If the file is not sorted,
74: .Nm sort
75: produces the appropriate error messages and exits with code 1;
76: otherwise,
77: .Nm sort
78: returns 0.
1.3 aaron 79: .Nm sort
1.1 millert 80: .Fl c
81: produces no output.
82: .It Fl m
83: Merge only; the input files are assumed to be pre-sorted.
84: .It Fl o Ar output
85: The argument given is the name of an
86: .Ar output
87: file to
88: be used instead of the standard output.
89: This file
90: can be the same as one of the input files.
91: .It Fl T Ar dir
92: Use
93: .Ar dir
94: as the directory for temporary files. The default is the contents
95: of the environment variable
96: .Ev TMPDIR
97: or
98: .Pa /var/tmp
99: if
100: .Ev TMPDIR
101: does not exist.
102: .It Fl u
103: Unique: suppress all but one in each set of lines
104: having equal keys.
105: If used with the
106: .Fl c
107: option,
108: check that there are no lines with duplicate keys.
109: .El
110: .Pp
111: The following options override the default ordering rules.
112: When ordering options appear independent of key field
113: specifications, the requested field ordering rules are
114: applied globally to all sort keys.
115: When attached to a specific key (see
116: .Fl k ) ,
117: the ordering options override
118: all global ordering options for that key.
119: .Bl -tag -width indent
120: .It Fl d
121: Only blank space and alphanumeric characters
122: .\" according
123: .\" to the current setting of LC_CTYPE
124: are used
125: in making comparisons.
126: .It Fl f
127: Consider all lowercase characters that have uppercase
128: equivalents to be the same for purposes of
129: comparison.
130: .It Fl i
131: Ignore all non-printable characters.
132: .It Fl n
133: An initial numeric string, consisting of optional
134: blank space, optional minus sign, and zero or more
135: digits (including decimal point)
136: .\" with
137: .\" optional radix character and thousands
138: .\" separator
139: .\" (as defined in the current locale),
140: is sorted by arithmetic value.
141: (The
142: .Fl n
143: option no longer implies
144: the
145: .Fl b
146: option.)
147: .It Fl r
148: Reverse the sense of comparisons.
149: .It Fl H
150: Use a merge sort instead of a radix sort. This option should be
151: used for files larger than 60MB.
152: .El
153: .Pp
1.3 aaron 154: The treatment of field separators can be altered using these
1.1 millert 155: options:
156: .Bl -tag -width indent
157: .It Fl b
158: Ignore leading blank space when determining the start
159: and end of a restricted sort key.
160: A
161: .Fl b
162: option specified before the first
163: .Fl k
164: option applies globally to all
165: .Fl k
166: options.
167: Otherwise, the
168: .Fl b
169: option can be
170: attached independently to each
171: .Ar field
172: argument of the
173: .Fl k
174: option (see below).
175: Note that the
176: .Fl b
177: option
178: has no effect unless key fields are specified.
179: .It Fl t Ar char
1.3 aaron 180: .Ar char
1.1 millert 181: is used as the field separator character. The initial
182: .Ar char
183: is not considered to be part of a field when determining
184: key offsets (see below).
185: Each occurrence of
186: .Ar char
187: is significant (for example,
188: .Dq Ar charchar
189: delimits an empty field).
190: If
191: .Fl t
192: is not specified,
193: blank space characters are used as default field
194: separators.
195: .It Fl R Ar char
1.3 aaron 196: .Ar char
1.1 millert 197: is used as the record separator character.
198: This should be used with discretion;
199: .Fl R Ar <alphanumeric>
200: usually produces undesirable results.
1.4 ! aaron 201: The default record separator is newline.
1.1 millert 202: .It Fl k Ar field1[,field2]
203: Designates the starting position,
204: .Ar field1 ,
205: and optional ending position,
206: .Ar field2 ,
207: of a key field.
208: The
209: .Fl k
210: option replaces the obsolescent options
211: .Cm \(pl Ns Ar pos1
212: and
213: .Fl Ns Ar pos2 .
214: .El
215: .Pp
216: The following operands are available:
217: .Bl -tag -width indent
1.3 aaron 218: .It Ar file
219: The pathname of a file to be sorted, merged, or checked.
220: If no
1.1 millert 221: .Ar file
222: operands are specified, or if
1.3 aaron 223: a
224: .Ar file
225: operand is
1.1 millert 226: .Fl ,
227: the standard input is used.
1.3 aaron 228: .El
1.1 millert 229: .Pp
230: A field is
231: defined as a minimal sequence of characters followed by a
232: field separator or a newline character.
233: By default, the first
234: blank space of a sequence of blank spaces acts as the field separator.
235: All blank spaces in a sequence of blank spaces are considered
236: as part of the next field; for example, all blank spaces at
237: the beginning of a line are considered to be part of the
238: first field.
239: .Pp
240: Fields are specified
241: by the
242: .Fl k Ar field1[,field2]
243: argument. A missing
244: .Ar field2
245: argument defaults to the end of a line.
246: .Pp
247: The arguments
248: .Ar field1
249: and
250: .Ar field2
251: have the form
252: .Em m.n
253: followed by one or more of the options
254: .Fl b , d , f , i ,
255: .Fl n , r .
256: A
257: .Ar field1
258: position specified by
259: .Em m.n
260: .Em (m,n > 0)
261: is interpreted as the
262: .Em n Ns th
263: character in the
264: .Em m Ns th
265: field.
266: A missing
267: .Em \&.n
268: in
269: .Ar field1
270: means
271: .Ql \&.1 ,
272: indicating the first character of the
273: .Em m Ns th
274: field;
1.3 aaron 275: if the
1.1 millert 276: .Fl b
277: option is in effect,
278: .Em n
279: is counted from the first
280: non-blank character in the
281: .Em m Ns th
282: field;
283: .Em m Ns \&.1b
284: refers to the first
285: non-blank character in the
286: .Em m Ns th
287: field.
288: .Pp
289: A
290: .Ar field2
291: position specified by
292: .Em m.n
293: is interpreted as
294: the
295: .Em n Ns th
296: character (including separators) of the
297: .Em m Ns th
298: field.
299: A missing
300: .Em \&.n
301: indicates the last character of the
302: .Em m Ns th
303: field;
304: .Em m
305: = \&0
306: designates the end of a line.
307: Thus the option
308: .Fl k Ar v.x,w.y
309: is synonymous with the obsolescent option
310: .Cm \(pl Ns Ar v-\&1.x-\&1
311: .Fl Ns Ar w-\&1.y ;
312: when
313: .Em y
314: is omitted,
315: .Fl k Ar v.x,w
316: is synonymous with
317: .Cm \(pl Ns Ar v-\&1.x-\&1
318: .Fl Ns Ar w+1.0 .
319: The obsolescent
320: .Cm \(pl Ns Ar pos1
321: .Fl Ns Ar pos2
322: option is still supported, except for
1.3 aaron 323: .Fl Ns Ar w\&.0b ,
1.1 millert 324: which has no
325: .Fl k
326: equivalent.
327: .Sh ENVIRONMENT
328: If the following environment variable exists, it is utilized by
1.3 aaron 329: .Nm sort :
1.1 millert 330: .Bl -tag -width Fl
331: .It Ev TMPDIR
1.3 aaron 332: Path in which to store temporary files.
333: Note that
1.1 millert 334: .Ev TMPDIR
335: may be overridden by the
336: .Fl T
337: option.
.El
338: .Sh FILES
339: .Bl -tag -width Pa -compact
340: .It Pa /var/tmp/sort.*
1.3 aaron 341: default temporary directories
1.1 millert 342: .It Pa Ar output Ns #PID
1.3 aaron 343: temporary name for
1.1 millert 344: .Ar output
345: if
346: .Ar output
1.3 aaron 347: already exists
1.1 millert 348: .El
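.Sh EXAMPLES
.\" Illustrative sketch, not part of the original page: the file names
.\" names.txt, a.sorted, b.sorted and merged are hypothetical.
The following invocations are a sketch of how the options described above
can be combined; the file names
.Pa names.txt ,
.Pa a.sorted ,
.Pa b.sorted ,
and
.Pa merged
are hypothetical.
.Pp
Sort a colon-separated file numerically on its third field only:
.Pp
.Dl $ sort -t : -k 3,3n /etc/passwd
.Pp
Sort on the second through fourth characters of the first field,
ignoring case:
.Pp
.Dl $ sort -k 1.2f,1.4 names.txt
.Pp
Check that a file is sorted and has no lines with duplicate keys;
the result is reported through the exit status:
.Pp
.Dl $ sort -c -u names.txt
.Pp
Merge two files that are already sorted into a single output file:
.Pp
.Dl $ sort -m -o merged a.sorted b.sorted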
349: .Sh SEE ALSO
350: .Xr comm 1 ,
1.3 aaron 351: .Xr join 1 ,
352: .Xr uniq 1
1.1 millert 353: .Sh RETURN VALUES
1.3 aaron 354: .Nm sort
355: exits with one of the following values:
356: .Pp
1.1 millert 357: .Bl -tag -width flag -compact
1.3 aaron 358: .It 0
359: Normal behavior.
360: .It 1
361: On disorder (or non-uniqueness) with the
1.1 millert 362: .Fl c
1.3 aaron 363: option.
364: .It 2
365: An error occurred.
.El
1.1 millert 366: .Sh BUGS
367: Lines longer than 65522 characters are discarded and processing continues.
368: To sort files larger than 60MB, use
369: .Nm sort
370: .Fl H ;
371: files larger than 704MB must be sorted in smaller pieces, then merged.
372: To protect data,
373: .Nm sort
374: .Fl o
375: calls link and unlink, and thus fails in protected directories.
376: .Sh HISTORY
377: A
378: .Nm sort
379: command appeared in
380: .At v6 .
381: .Sh NOTES
382: The current sort command uses lexicographic radix sorting, which requires
383: that sort keys be kept in memory (as opposed to previous versions which used quick
1.3 aaron 384: and merge sorts and did not).
1.1 millert 385: Thus performance depends heavily on an efficient choice of sort keys, and the
386: .Fl b
387: option and the
388: .Ar field2
389: argument of the
390: .Fl k
391: option should be used whenever possible.
392: Similarly,
393: .Nm sort
394: .Fl k1f
395: is equivalent to
396: .Nm sort
397: .Fl f
398: and may take twice as long.