Annotation of src/usr.bin/sort/sort.1, Revision 1.60
1.60 ! schwarze 1: .\" $OpenBSD: sort.1,v 1.59 2019/05/13 17:00:12 schwarze Exp $
1.1 millert 2: .\"
3: .\" Copyright (c) 1991, 1993
4: .\" The Regents of the University of California. All rights reserved.
5: .\"
6: .\" This code is derived from software contributed to Berkeley by
7: .\" the Institute of Electrical and Electronics Engineers, Inc.
8: .\"
9: .\" Redistribution and use in source and binary forms, with or without
10: .\" modification, are permitted provided that the following conditions
11: .\" are met:
12: .\" 1. Redistributions of source code must retain the above copyright
13: .\" notice, this list of conditions and the following disclaimer.
14: .\" 2. Redistributions in binary form must reproduce the above copyright
15: .\" notice, this list of conditions and the following disclaimer in the
16: .\" documentation and/or other materials provided with the distribution.
1.20 millert 17: .\" 3. Neither the name of the University nor the names of its contributors
1.1 millert 18: .\" may be used to endorse or promote products derived from this software
19: .\" without specific prior written permission.
20: .\"
21: .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
22: .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
23: .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
24: .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
25: .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
26: .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
27: .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
28: .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
29: .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
30: .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
31: .\" SUCH DAMAGE.
32: .\"
33: .\" @(#)sort.1 8.1 (Berkeley) 6/6/93
34: .\"
1.60 ! schwarze 35: .Dd $Mdocdate: May 13 2019 $
1.1 millert 36: .Dt SORT 1
37: .Os
38: .Sh NAME
39: .Nm sort
1.41 millert 40: .Nd sort, merge, or sequence check text and binary files
1.1 millert 41: .Sh SYNOPSIS
42: .Nm sort
1.43 jmc 43: .Op Fl bCcdfgHhiMmnRrsuVz
1.42 jmc 44: .Op Fl k Ar field1 Ns Op , Ns Ar field2
1.23 jmc 45: .Op Fl o Ar output
1.42 jmc 46: .Op Fl S Ar size
1.1 millert 47: .Op Fl T Ar dir
1.23 jmc 48: .Op Fl t Ar char
1.34 sobrado 49: .Op Ar
1.1 millert 50: .Sh DESCRIPTION
51: The
1.8 aaron 52: .Nm
1.41 millert 53: utility sorts text and binary files by lines.
54: A line is a record separated from the subsequent record by a
1.55 schwarze 55: newline (default) or NUL \'\\0\' character
56: .Po
57: .Fl z
58: option
59: .Pc .
1.41 millert 60: A record can contain any printable or unprintable characters.
61: Comparisons are based on one or more sort keys extracted from
62: each line of input, and are performed lexicographically,
1.60 ! schwarze 63: according to the specified command-line options
! 64: that can tune the actual sorting behavior.
1.8 aaron 65: By default, if keys are not given,
66: .Nm
1.41 millert 67: uses entire lines for comparison.
1.1 millert 68: .Pp
1.49 jmc 69: If no
70: .Ar file
71: is specified, or if
72: .Ar file
73: is
74: .Sq - ,
75: the standard input is used.
76: .Pp
1.7 aaron 77: The options are as follows:
1.21 jmc 78: .Bl -tag -width Ds
1.57 schwarze 79: .It Fl C , Fl Fl check Ns = Ns Cm silent Ns | Ns Cm quiet
1.35 schwarze 80: Check that the single input file is sorted.
81: If it is, exit 0; if it's not, exit 1.
82: In either case, produce no output.
1.57 schwarze 83: .It Fl c , Fl Fl check
1.35 schwarze 84: Like
85: .Fl C ,
1.37 jmc 86: but additionally write a message to
1.35 schwarze 87: .Em stderr
88: if the input file is not sorted.
1.41 millert 89: .It Fl m , Fl Fl merge
1.1 millert 90: Merge only; the input files are assumed to be pre-sorted.
1.41 millert 91: If they are not sorted, the output order is undefined.
92: .It Fl o Ar output , Fl Fl output Ns = Ns Ar output
93: Write the output to the
1.1 millert 94: .Ar output
1.41 millert 95: file instead of the standard output.
1.12 aaron 96: This file can be the same as one of the input files.
1.42 jmc 97: .It Fl S Ar size , Fl Fl buffer-size Ns = Ns Ar size
1.41 millert 98: Use a memory buffer no larger than
99: .Ar size .
100: The modifiers %, b, K, M, G, T, P, E, Z, and Y can be used.
101: If no memory limit is specified,
102: .Nm
103: may use up to about 90% of available memory.
104: If the input is too big to fit into the memory buffer,
105: temporary files are used.
1.42 jmc 106: .It Fl s
107: Stable sort; maintains the original record order of records that have
1.50 jmc 108: an equal key.
1.42 jmc 109: This is a non-standard feature, but it is widely accepted and used.
1.41 millert 110: .It Fl T Ar dir , Fl Fl temporary-directory Ns = Ns Ar dir
111: Store temporary files in the directory
112: .Ar dir .
113: The default path is the value of the environment variable
1.1 millert 114: .Ev TMPDIR
115: or
1.56 lteo 116: .Pa /tmp
1.1 millert 117: if
118: .Ev TMPDIR
1.41 millert 119: is not defined.
120: .It Fl u , Fl Fl unique
1.12 aaron 121: Unique: suppress all but one in each set of lines having equal keys.
1.41 millert 122: This option implies a stable sort (see above).
123: If used with
1.35 schwarze 124: .Fl C
125: or
1.41 millert 126: .Fl c ,
127: .Nm
128: also checks that there are no lines with duplicate keys.
1.38 jmc 129: .El
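.Pp
For example, assuming a hypothetical file named
.Pa names.txt ,
the following sketch rewrites the file in sorted order only if
.Fl C
reports that it is not already sorted.
The second command also runs if the check itself fails with an error:
.Bd -literal -offset indent
$ sort -C names.txt || sort -o names.txt names.txt
.Ed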
130: .Pp
1.1 millert 131: The following options override the default ordering rules.
1.37 jmc 132: If ordering options appear before the first
133: .Fl k
134: option, they apply globally to all sort keys.
1.1 millert 135: When attached to a specific key (see
136: .Fl k ) ,
1.41 millert 137: the ordering options override all global ordering options for that key.
1.37 jmc 138: Note that the ordering options intended to apply globally should not
139: appear after
140: .Fl k
141: or results may be unexpected.
1.1 millert 142: .Bl -tag -width indent
1.41 millert 143: .It Fl d , Fl Fl dictionary-order
144: Consider only blank spaces and alphanumeric characters in comparisons.
145: .It Fl f , Fl Fl ignore-case
146: Consider all lowercase characters that have uppercase
1.12 aaron 147: equivalents to be the same for purposes of comparison.
1.57 schwarze 148: .It Fl g , Fl Fl general-numeric-sort , Fl Fl sort Ns = Ns Cm general-numeric
1.41 millert 149: Sort by general numerical value.
150: As opposed to
151: .Fl n ,
1.50 jmc 152: this option handles general floating-point numbers.
153: It has a more
154: permissive format than that allowed by
155: .Fl n
1.41 millert 156: but it has a significant performance drawback.
1.57 schwarze 157: .It Fl h , Fl Fl human-numeric-sort , Fl Fl sort Ns = Ns Cm human-numeric
1.41 millert 158: Sort by numerical value, but take into account the SI suffix,
159: if present.
160: Sorts first by numeric sign (negative, zero, or
161: positive); then by SI suffix (either empty, or `k' or `K', or one
162: of `MGTPEZY', in that order); and finally by numeric value.
163: The SI suffix must immediately follow the number.
164: For example, '12345K' sorts before '1M', because M is "larger" than K.
165: This sort option is useful for sorting the output of a single invocation
166: of the 'df' command with
167: .Fl h
168: or
169: .Fl H
170: options (human-readable).
171: .It Fl i , Fl Fl ignore-nonprinting
1.1 millert 172: Ignore all non-printable characters.
1.57 schwarze 173: .It Fl M , Fl Fl month-sort , Fl Fl sort Ns = Ns Cm month
1.41 millert 174: Sort by month abbreviations.
175: Unknown strings are considered smaller than valid month names.
1.57 schwarze 176: .It Fl n , Fl Fl numeric-sort , Fl Fl sort Ns = Ns Cm numeric
1.12 aaron 177: An initial numeric string, consisting of optional blank space, optional
178: minus sign, and zero or more digits (including decimal point)
1.1 millert 179: is sorted by arithmetic value.
1.41 millert 180: Leading blank characters are ignored.
1.57 schwarze 181: .It Fl R , Fl Fl random-sort , Fl Fl sort Ns = Ns Cm random
1.41 millert 182: Sort lines in random order.
183: This is a random permutation of the inputs with the exception that
184: equal keys sort together.
185: It is implemented by hashing the input keys and sorting the hash values.
186: The hash function is randomized with data from
1.47 jmc 187: .Xr arc4random_buf 3 ,
1.41 millert 188: or by file content if one is specified via
189: .Fl Fl random-source .
190: If multiple sort fields are specified,
191: the same random hash function is used for all of them.
192: .It Fl r , Fl Fl reverse
193: Sort in reverse order.
1.57 schwarze 194: .It Fl V , Fl Fl version-sort
1.41 millert 195: Sort version numbers.
196: The input lines are treated as file names in the form
197: PREFIX VERSION SUFFIX, where SUFFIX matches the regular expression
198: "(\.([A-Za-z~][A-Za-z0-9~]*)?)*".
199: The files are compared by their prefixes and versions (leading
200: zeros are ignored in version numbers, see example below).
201: If an input string does not match the pattern, then it is compared
202: using the byte compare function.
1.44 jmc 203: .Pp
204: For example:
205: .Bd -literal -offset indent
206: $ ls sort* | sort -V
207: sort-1.022.tgz
208: sort-1.23.tgz
209: sort-1.23.1.tgz
210: sort-1.024.tgz
211: sort-1.024.003.
212: sort-1.024.003.tgz
213: sort-1.024.07.tgz
214: sort-1.024.009.tgz
215: .Ed
1.1 millert 216: .El
217: .Pp
1.12 aaron 218: The treatment of field separators can be altered using these options:
1.1 millert 219: .Bl -tag -width indent
1.41 millert 220: .It Fl b , Fl Fl ignore-leading-blanks
221: Ignore leading blank space when determining the start
222: and end of a restricted sort key (see
223: .Fl k ) .
224: If
1.1 millert 225: .Fl b
1.41 millert 226: is specified before the first
1.1 millert 227: .Fl k
1.41 millert 228: option, it applies globally to all key specifications.
229: Otherwise,
1.1 millert 230: .Fl b
1.41 millert 231: can be attached independently to each
1.1 millert 232: .Ar field
1.41 millert 233: argument of the key specifications.
1.53 millert 234: Note that
235: .Fl b
236: should not appear after
237: .Fl k ,
238: and that it has no effect unless key fields are specified.
1.41 millert 239: .It Xo
1.42 jmc 240: .Fl k Ar field1 Ns Op , Ns Ar field2 ,
241: .Fl Fl key Ns = Ns Ar field1 Ns Op , Ns Ar field2
1.41 millert 242: .Xc
243: Define a restricted sort key that has the starting position
244: .Ar field1 ,
245: and optional ending position
246: .Ar field2
247: of a key field.
248: The
249: .Fl k
250: option may be specified multiple times,
251: in which case subsequent keys are compared after earlier keys compare equal.
252: The
1.1 millert 253: .Fl k
1.41 millert 254: option replaces the obsolete options
255: .Cm \(pl Ns Ar pos1
256: and
257: .Fl Ns Ar pos2 ,
258: but the old notation is also supported.
259: .It Fl t Ar char , Fl Fl field-separator Ns = Ns Ar char
260: Use
1.3 aaron 261: .Ar char
1.41 millert 262: as the field separator character.
1.8 aaron 263: The initial
1.1 millert 264: .Ar char
1.12 aaron 265: is not considered to be part of a field when determining key offsets.
1.1 millert 266: Each occurrence of
267: .Ar char
268: is significant (for example,
269: .Dq Ar charchar
270: delimits an empty field).
271: If
272: .Fl t
1.6 pjanzen 273: is not specified, the default field separator is a sequence of
274: blank-space characters, and consecutive blank spaces do
275: .Em not
276: delimit an empty field; further, the initial blank space
277: .Em is
278: considered part of a field when determining key offsets.
1.41 millert 279: To use NUL as the field separator, use
280: .Fl t
281: \'\\0\'.
282: .It Fl z , Fl Fl zero-terminated
283: Use NUL as the record separator.
284: By default, records in the files are expected to be separated by
285: newline characters.
286: With this option, NUL (\'\\0\') is used as the record separator character.
1.37 jmc 287: .El
288: .Pp
1.41 millert 289: Other options:
1.37 jmc 290: .Bl -tag -width indent
1.41 millert 291: .It Fl Fl batch-size Ns = Ns Ar num
292: Specify the maximum number of files that can be opened by
293: .Nm
294: at once.
295: This option affects behavior when there are many input files or when
296: temporary files are used.
1.51 millert 297: The minimum value is 2.
1.41 millert 298: The default value is 16.
299: .It Fl Fl compress-program Ns = Ns Ar program
300: Use
301: .Ar program
302: to compress temporary files.
303: When invoked with no arguments,
304: .Ar program
305: must compress standard input to standard output.
306: When called with the
307: .Fl d
308: option, it must decompress standard input to standard output.
309: If
310: .Ar program
311: fails,
312: .Nm
313: will exit with an error.
1.37 jmc 314: The
1.41 millert 315: .Xr compress 1
316: and
317: .Xr gzip 1
318: utilities meet these requirements.
319: .It Fl Fl debug
320: Print some extra information about the sorting process to the
321: standard output.
322: .It Fl Fl files0-from Ns = Ns Ar filename
323: Take the input file list from the file
1.44 jmc 324: .Ar filename .
1.41 millert 325: The file names must be separated by NUL
326: (like the output produced by the command
327: .Dq find ... -print0 ) .
1.49 jmc 328: .It Fl Fl heapsort
329: Try to use heap sort, if the sort specifications allow.
330: This sort algorithm cannot be used with
331: .Fl u
332: and
333: .Fl s .
334: .It Fl Fl help
335: Print the help text and exit.
1.58 anton 336: .It Fl H , Fl Fl mergesort
1.41 millert 337: Use mergesort.
338: This is a universal algorithm that can always be used,
339: but it is not always the fastest.
1.49 jmc 340: .It Fl Fl mmap
341: Try to use the file memory mapping system call.
342: It may increase speed in some cases.
1.41 millert 343: .It Fl Fl qsort
344: Try to use quick sort, if the sort specifications allow.
345: This sort algorithm cannot be used with
346: .Fl u
347: and
348: .Fl s .
1.49 jmc 349: .It Fl Fl radixsort
350: Try to use radix sort, if the sort specifications allow.
351: The radix sort can only be used for trivial locales (C and POSIX),
352: and it cannot be used for numeric or month sort.
353: Radix sort is very fast and stable.
354: .It Fl Fl random-source Ns = Ns Ar filename
355: For random sort, the contents of
356: .Ar filename
357: are used as the source of the
358: .Sq seed
359: data for the hash function.
360: Two invocations of random sort with the same seed data will
361: produce the same result if the input is also identical.
362: By default, the
363: .Xr arc4random_buf 3
364: function is used instead.
365: .It Fl Fl version
366: Print the version and exit.
1.3 aaron 367: .El
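.Pp
As an illustration of NUL-separated records, the following sketch
prints checksums of regular files in sorted name order,
even if some names contain newlines.
The
.Fl print0
primary of
.Xr find 1
and the
.Fl 0
option of
.Xr xargs 1
are assumed to be available:
.Bd -literal -offset indent
$ find . -type f -print0 | sort -z | xargs -0 cksum
.Ed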
1.1 millert 368: .Pp
1.12 aaron 369: A field is defined as a maximal sequence of characters other than the
1.6 pjanzen 370: field separator and record separator
371: .Pq newline by default .
372: Initial blank spaces are included in the field unless
373: .Fl b
374: has been specified;
375: the first blank space of a sequence of blank spaces acts as the field
376: separator and is included in the field (unless
377: .Fl t
378: is specified).
379: For example, by default all blank spaces at the beginning of a line are
380: considered to be part of the first field.
1.1 millert 381: .Pp
1.12 aaron 382: Fields are specified by the
1.45 jmc 383: .Fl k Ar field1 Ns Op , Ns Ar field2
1.41 millert 384: option.
385: If
1.1 millert 386: .Ar field2
1.41 millert 387: is missing, the end of the key defaults to the end of the line.
1.1 millert 388: .Pp
389: The arguments
390: .Ar field1
391: and
392: .Ar field2
393: have the form
394: .Em m.n
1.6 pjanzen 395: .Em (m,n > 0)
1.41 millert 396: and can be followed by one or more of the modifiers
1.6 pjanzen 397: .Cm b , d , f , i ,
1.41 millert 398: .Cm n , g , M
1.6 pjanzen 399: and
400: .Cm r ,
401: which correspond to the options discussed above.
1.41 millert 402: When
403: .Cm b
404: is specified it applies only to
405: .Ar field1
406: or
407: .Ar field2
408: where it is specified while the rest of the modifiers
409: apply to the whole key field regardless of whether they are
410: specified only with
411: .Ar field1
412: or
413: .Ar field2
414: or both.
1.1 millert 415: A
416: .Ar field1
417: position specified by
418: .Em m.n
419: is interpreted as the
420: .Em n Ns th
1.6 pjanzen 421: character from the beginning of the
1.1 millert 422: .Em m Ns th
423: field.
424: A missing
425: .Em \&.n
426: in
427: .Ar field1
428: means
429: .Ql \&.1 ,
430: indicating the first character of the
431: .Em m Ns th
1.12 aaron 432: field; if the
1.1 millert 433: .Fl b
434: option is in effect,
435: .Em n
1.12 aaron 436: is counted from the first non-blank character in the
1.1 millert 437: .Em m Ns th
438: field;
439: .Em m Ns \&.1b
1.12 aaron 440: refers to the first non-blank character in the
1.1 millert 441: .Em m Ns th
442: field.
1.6 pjanzen 443: .No 1\&. Ns Em n
444: refers to the
445: .Em n Ns th
446: character from the beginning of the line;
447: if
448: .Em n
449: is greater than the length of the line, the field is taken to be empty.
1.1 millert 450: .Pp
1.41 millert 451: .Em n Ns th
452: positions are always counted from the field beginning, even if the field
453: is shorter than the number of specified positions.
454: Thus, the key can really start from a position in a subsequent field.
455: .Pp
1.1 millert 456: A
457: .Ar field2
458: position specified by
459: .Em m.n
1.12 aaron 460: is interpreted as the
1.1 millert 461: .Em n Ns th
1.41 millert 462: character (including separators) from the beginning of the
1.1 millert 463: .Em m Ns th
464: field.
465: A missing
466: .Em \&.n
1.5 aaron 467: indicates the last character of the
1.1 millert 468: .Em m Ns th
469: field;
1.5 aaron 470: .Em m
1.1 millert 471: = \&0
472: designates the end of a line.
473: Thus the option
474: .Fl k Ar v.x,w.y
1.41 millert 475: is synonymous with the obsolete option
1.1 millert 476: .Cm \(pl Ns Ar v-\&1.x-\&1
477: .Fl Ns Ar w-\&1.y ;
478: when
479: .Em y
480: is omitted,
481: .Fl k Ar v.x,w
482: is synonymous with
1.5 aaron 483: .Cm \(pl Ns Ar v-\&1.x-\&1
1.19 tdeval 484: .Fl Ns Ar w\&.0 .
1.41 millert 485: The obsolete
1.1 millert 486: .Cm \(pl Ns Ar pos1
487: .Fl Ns Ar pos2
488: option is still supported, except for
1.3 aaron 489: .Fl Ns Ar w\&.0b ,
1.1 millert 490: which has no
491: .Fl k
492: equivalent.
493: .Sh ENVIRONMENT
494: .Bl -tag -width Fl
495: .It Ev TMPDIR
1.41 millert 496: Path to the directory in which temporary files will be stored.
1.3 aaron 497: Note that
1.1 millert 498: .Ev TMPDIR
499: may be overridden by the
500: .Fl T
501: option.
1.11 aaron 502: .El
1.1 millert 503: .Sh FILES
504: .Bl -tag -width Pa -compact
1.56 lteo 505: .It Pa /tmp/.bsdsort.PID.*
1.41 millert 506: Temporary files.
1.39 jmc 507: .El
508: .Sh EXIT STATUS
509: The
510: .Nm
511: utility exits with one of the following values:
512: .Pp
513: .Bl -tag -width Ds -offset indent -compact
514: .It 0
1.41 millert 515: Successfully sorted the input files or, if used with
516: .Fl C
517: or
518: .Fl c ,
519: the input file already met the sorting criteria.
1.39 jmc 520: .It 1
1.41 millert 521: On disorder (or non-uniqueness) with the
1.39 jmc 522: .Fl C
523: or
524: .Fl c
1.41 millert 525: options.
1.39 jmc 526: .It 2
527: An error occurred.
1.1 millert 528: .El
529: .Sh SEE ALSO
530: .Xr comm 1 ,
1.3 aaron 531: .Xr join 1 ,
1.47 jmc 532: .Xr uniq 1
1.27 dlg 533: .Sh STANDARDS
534: The
535: .Nm
1.28 jmc 536: utility is compliant with the
1.33 jmc 537: .St -p1003.1-2008
1.60 ! schwarze 538: specification, except that it ignores the user's
! 539: .Xr locale 1
! 540: and always assumes
! 541: .Ev LC_ALL Ns =C.
1.27 dlg 542: .Pp
543: The flags
1.43 jmc 544: .Op Fl gHhiMRSsTVz
1.28 jmc 545: are extensions to that specification.
1.41 millert 546: .Pp
547: All long options are extensions to the specification.
548: Some are provided for compatibility with GNU
549: .Nm ,
550: others are specific to this implementation.
1.54 millert 551: .Pp
552: Some implementations of
553: .Nm
554: honor the
555: .Fl b
556: option even when no key fields are specified.
557: This implementation follows historic practice and
558: .St -p1003.1-2008
559: in only honoring
560: .Fl b
561: when it precedes a key field.
1.52 millert 562: .Pp
563: The historic practice of allowing the
564: .Fl o
565: option to appear after the
566: .Ar file
567: is supported for compatibility with older versions of
568: .Nm .
1.41 millert 569: .Pp
570: The historic key notations
571: .Cm \(pl Ns Ar pos1
572: and
573: .Fl Ns Ar pos2
574: are supported for compatibility with older versions of
575: .Nm
576: but their use is highly discouraged.
1.1 millert 577: .Sh HISTORY
578: A
1.8 aaron 579: .Nm
1.1 millert 580: command appeared in
1.16 mickey 581: .At v3 .
1.41 millert 582: .Sh AUTHORS
1.44 jmc 583: .An Gabor Kovesdan Aq Mt gabor@FreeBSD.org
584: .An Oleg Moskalenko Aq Mt mom040267@gmail.com
1.45 jmc 585: .Sh CAVEATS
1.41 millert 586: This implementation of
1.14 ericj 587: .Nm
588: has no limits on input line length (other than those imposed by available
589: memory) or any restrictions on bytes allowed within lines.
590: .Pp
1.60 ! schwarze 591: The performance depends highly on
1.41 millert 592: efficient choice of sort keys and key complexity.
1.60 ! schwarze 593: The fastest sort is on whole lines, with option
1.41 millert 594: .Fl s .
595: When sort keys are specified, the simpler the lines are to process,
596: the faster the sort will be.
1.14 ericj 597: .Pp
1.41 millert 598: When sorting by arithmetic value, using
599: .Fl n
600: results in much better performance than
601: .Fl g
602: so its use is encouraged whenever possible.