Annotation of src/usr.bin/sort/sort.1, Revision 1.64
1.64 ! schwarze 1: .\" $OpenBSD: sort.1,v 1.63 2020/01/16 16:46:47 schwarze Exp $
1.1 millert 2: .\"
3: .\" Copyright (c) 1991, 1993
4: .\" The Regents of the University of California. All rights reserved.
5: .\"
6: .\" This code is derived from software contributed to Berkeley by
7: .\" the Institute of Electrical and Electronics Engineers, Inc.
8: .\"
9: .\" Redistribution and use in source and binary forms, with or without
10: .\" modification, are permitted provided that the following conditions
11: .\" are met:
12: .\" 1. Redistributions of source code must retain the above copyright
13: .\" notice, this list of conditions and the following disclaimer.
14: .\" 2. Redistributions in binary form must reproduce the above copyright
15: .\" notice, this list of conditions and the following disclaimer in the
16: .\" documentation and/or other materials provided with the distribution.
1.20 millert 17: .\" 3. Neither the name of the University nor the names of its contributors
1.1 millert 18: .\" may be used to endorse or promote products derived from this software
19: .\" without specific prior written permission.
20: .\"
21: .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
22: .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
23: .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
24: .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
25: .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
26: .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
27: .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
28: .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
29: .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
30: .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
31: .\" SUCH DAMAGE.
32: .\"
33: .\" @(#)sort.1 8.1 (Berkeley) 6/6/93
34: .\"
1.64 ! schwarze 35: .Dd $Mdocdate: January 16 2020 $
1.1 millert 36: .Dt SORT 1
37: .Os
38: .Sh NAME
39: .Nm sort
1.41 millert 40: .Nd sort, merge, or sequence check text and binary files
1.1 millert 41: .Sh SYNOPSIS
42: .Nm sort
1.43 jmc 43: .Op Fl bCcdfgHhiMmnRrsuVz
1.42 jmc 44: .Op Fl k Ar field1 Ns Op , Ns Ar field2
1.23 jmc 45: .Op Fl o Ar output
1.42 jmc 46: .Op Fl S Ar size
1.1 millert 47: .Op Fl T Ar dir
1.23 jmc 48: .Op Fl t Ar char
1.34 sobrado 49: .Op Ar
1.1 millert 50: .Sh DESCRIPTION
51: The
1.8 aaron 52: .Nm
1.41 millert 53: utility sorts text and binary files by lines.
54: A line is a record separated from the subsequent record by a
1.61 bentley 55: newline (default) or NUL
56: .Ql \e0
57: character
1.55 schwarze 58: .Po
59: .Fl z
60: option
61: .Pc .
1.41 millert 62: A record can contain any printable or unprintable characters.
63: Comparisons are based on one or more sort keys extracted from
64: each line of input, and are performed lexicographically,
1.60 schwarze 65: according to the specified command-line options
66: that can tune the actual sorting behavior.
1.8 aaron 67: By default, if keys are not given,
68: .Nm
1.41 millert 69: uses entire lines for comparison.
1.1 millert 70: .Pp
1.49 jmc 71: If no
72: .Ar file
73: is specified, or if
74: .Ar file
75: is
76: .Sq - ,
77: the standard input is used.
78: .Pp
1.7 aaron 79: The options are as follows:
1.21 jmc 80: .Bl -tag -width Ds
1.57 schwarze 81: .It Fl C , Fl Fl check Ns = Ns Cm silent Ns | Ns Cm quiet
1.35 schwarze 82: Check that the single input file is sorted.
83: If it is, exit 0; if it's not, exit 1.
84: In either case, produce no output.
1.57 schwarze 85: .It Fl c , Fl Fl check
1.35 schwarze 86: Like
87: .Fl C ,
1.37 jmc 88: but additionally write a message to
1.35 schwarze 89: .Em stderr
90: if the input file is not sorted.
1.41 millert 91: .It Fl m , Fl Fl merge
1.1 millert 92: Merge only; the input files are assumed to be pre-sorted.
1.41 millert 93: If they are not sorted, the output order is undefined.
94: .It Fl o Ar output , Fl Fl output Ns = Ns Ar output
95: Write the output to the
1.1 millert 96: .Ar output
1.41 millert 97: file instead of the standard output.
1.12 aaron 98: This file can be the same as one of the input files.
1.42 jmc 99: .It Fl S Ar size , Fl Fl buffer-size Ns = Ns Ar size
1.41 millert 100: Use a memory buffer no larger than
101: .Ar size .
102: The modifiers %, b, K, M, G, T, P, E, Z, and Y can be used.
103: If no memory limit is specified,
104: .Nm
105: may use up to about 90% of available memory.
106: If the input is too big to fit into the memory buffer,
107: temporary files are used.
1.42 jmc 108: .It Fl s
109: Stable sort; maintains the original record order of records that have
1.50 jmc 110: an equal key.
1.42 jmc 111: This is a non-standard feature, but it is widely accepted and used.
1.41 millert 112: .It Fl T Ar dir , Fl Fl temporary-directory Ns = Ns Ar dir
113: Store temporary files in the directory
114: .Ar dir .
115: The default path is the value of the environment variable
1.1 millert 116: .Ev TMPDIR
117: or
1.56 lteo 118: .Pa /tmp
1.1 millert 119: if
120: .Ev TMPDIR
1.41 millert 121: is not defined.
122: .It Fl u , Fl Fl unique
1.12 aaron 123: Unique: suppress all but one in each set of lines having equal keys.
1.41 millert 124: This option implies a stable sort (see above).
125: If used with
1.35 schwarze 126: .Fl C
127: or
1.41 millert 128: .Fl c ,
129: .Nm
130: also checks that there are no lines with duplicate keys.
1.38 jmc 131: .El
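.Pp
As a minimal sketch of the
.Fl s
and
.Fl u
options described above, the following commands sort hypothetical
two-column sample data by its first field only.
The variable names and the data are illustrative assumptions.
.Bd -literal -offset indent
```shell
# Hypothetical sample data: "name score" pairs, deliberately unsorted.
data='b 2
a 9
a 1
b 1'

# -s: lines whose first fields compare equal keep their input order.
stable=$(echo "$data" | LC_ALL=C sort -s -k 1,1)
echo "$stable"

# -u: only one line survives for each distinct first field.
unique=$(echo "$data" | LC_ALL=C sort -u -k 1,1)
echo "$unique"
```
.Ed
With
.Fl s ,
the two lines keyed
.Sq a
stay in input order (a 9 before a 1), as do the two lines keyed
.Sq b .
With
.Fl u ,
the output holds one line per distinct first field.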
132: .Pp
1.1 millert 133: The following options override the default ordering rules.
1.37 jmc 134: If ordering options appear before the first
135: .Fl k
136: option, they apply globally to all sort keys.
1.1 millert 137: When attached to a specific key (see
138: .Fl k ) ,
1.41 millert 139: the ordering options override all global ordering options for that key.
1.37 jmc 140: Note that the ordering options intended to apply globally should not
141: appear after
142: .Fl k
143: or results may be unexpected.
1.1 millert 144: .Bl -tag -width indent
1.41 millert 145: .It Fl d , Fl Fl dictionary-order
146: Consider only blank spaces and alphanumeric characters in comparisons.
147: .It Fl f , Fl Fl ignore-case
148: Consider all lowercase characters that have uppercase
1.12 aaron 149: equivalents to be the same for purposes of comparison.
1.57 schwarze 150: .It Fl g , Fl Fl general-numeric-sort , Fl Fl sort Ns = Ns Cm general-numeric
1.41 millert 151: Sort by general numerical value.
152: As opposed to
153: .Fl n ,
1.50 jmc 154: this option handles general floating-point numbers.
155: It has a more
156: permissive format than that allowed by
157: .Fl n
1.41 millert 158: but it has a significant performance drawback.
1.57 schwarze 159: .It Fl h , Fl Fl human-numeric-sort , Fl Fl sort Ns = Ns Cm human-numeric
1.41 millert 160: Sort by numerical value, but take into account the SI suffix,
161: if present.
162: Sorts first by numeric sign (negative, zero, or
163: positive); then by SI suffix (either empty, or `k' or `K', or one
164: of `MGTPEZY', in that order); and finally by numeric value.
165: The SI suffix must immediately follow the number.
166: For example, '12345K' sorts before '1M', because M is "larger" than K.
167: This sort option is useful for sorting the output of a single invocation
168: of the 'df' command with the
169: .Fl h
170: or
171: .Fl H
172: options (human-readable).
173: .It Fl i , Fl Fl ignore-nonprinting
1.1 millert 174: Ignore all non-printable characters.
1.57 schwarze 175: .It Fl M , Fl Fl month-sort , Fl Fl sort Ns = Ns Cm month
1.41 millert 176: Sort by month abbreviations.
177: Unknown strings are considered smaller than valid month names.
1.57 schwarze 178: .It Fl n , Fl Fl numeric-sort , Fl Fl sort Ns = Ns Cm numeric
1.12 aaron 179: An initial numeric string, consisting of optional blank space, optional
180: minus sign, and zero or more digits (including decimal point)
1.1 millert 181: is sorted by arithmetic value.
1.41 millert 182: Leading blank characters are ignored.
1.57 schwarze 183: .It Fl R , Fl Fl random-sort , Fl Fl sort Ns = Ns Cm random
1.41 millert 184: Sort lines in random order.
185: This is a random permutation of the inputs with the exception that
186: equal keys sort together.
187: It is implemented by hashing the input keys and sorting the hash values.
188: The hash function is randomized with data from
1.47 jmc 189: .Xr arc4random_buf 3 ,
1.41 millert 190: or by file content if one is specified via
191: .Fl Fl random-source .
192: If multiple sort fields are specified,
193: the same random hash function is used for all of them.
194: .It Fl r , Fl Fl reverse
195: Sort in reverse order.
1.57 schwarze 196: .It Fl V , Fl Fl version-sort
1.41 millert 197: Sort version numbers.
198: The input lines are treated as file names in the form
199: PREFIX VERSION SUFFIX, where SUFFIX matches the regular expression
200: "(\.([A-Za-z~][A-Za-z0-9~]*)?)*".
201: The files are compared by their prefixes and versions (leading
202: zeros are ignored in version numbers, see example below).
203: If an input string does not match the pattern, then it is compared
204: using the byte compare function.
1.44 jmc 205: .Pp
206: For example:
207: .Bd -literal -offset indent
208: $ ls sort* | sort -V
209: sort-1.022.tgz
210: sort-1.23.tgz
211: sort-1.23.1.tgz
212: sort-1.024.tgz
213: sort-1.024.003.
214: sort-1.024.003.tgz
215: sort-1.024.07.tgz
216: sort-1.024.009.tgz
217: .Ed
1.1 millert 218: .El
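.Pp
The difference between the numeric ordering options above can be
sketched as follows.
The input values are hypothetical and chosen only so that
.Fl n ,
.Fl g ,
and
.Fl h
each produce a different, easily checked order.
.Bd -literal -offset indent
```shell
# Hypothetical values: one uses exponent notation, which -n does not parse.
nums='1e3
20
3'

# -n reads only the leading "1" of "1e3", so that line sorts first.
by_n=$(echo "$nums" | LC_ALL=C sort -n)
echo "$by_n"

# -g parses "1e3" as the floating-point value 1000, so it sorts last.
by_g=$(echo "$nums" | LC_ALL=C sort -g)
echo "$by_g"

# Hypothetical sizes with SI suffixes, as printed by human-readable tools.
sizes='1M
12345K
512
2G'

# -h orders by suffix before value: no suffix, then K, then M, then G.
by_h=$(echo "$sizes" | LC_ALL=C sort -h)
echo "$by_h"
```
.Ed
Note that 12345K sorts before 1M under
.Fl h
because the suffix is compared before the numeric value.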
219: .Pp
1.12 aaron 220: The treatment of field separators can be altered using these options:
1.1 millert 221: .Bl -tag -width indent
1.41 millert 222: .It Fl b , Fl Fl ignore-leading-blanks
223: Ignore leading blank space when determining the start
224: and end of a restricted sort key (see
225: .Fl k ) .
226: If
1.1 millert 227: .Fl b
1.41 millert 228: is specified before the first
1.1 millert 229: .Fl k
1.41 millert 230: option, it applies globally to all key specifications.
231: Otherwise,
1.1 millert 232: .Fl b
1.41 millert 233: can be attached independently to each
1.1 millert 234: .Ar field
1.41 millert 235: argument of the key specifications.
1.53 millert 236: Note that
237: .Fl b
238: should not appear after
239: .Fl k ,
240: and that it has no effect unless key fields are specified.
1.41 millert 241: .It Xo
1.42 jmc 242: .Fl k Ar field1 Ns Op , Ns Ar field2 ,
243: .Fl Fl key Ns = Ns Ar field1 Ns Op , Ns Ar field2
1.41 millert 244: .Xc
245: Define a restricted sort key that has the starting position
246: .Ar field1 ,
247: and optional ending position
248: .Ar field2
249: of a key field.
250: The
251: .Fl k
252: option may be specified multiple times,
253: in which case subsequent keys are compared after earlier keys compare equal.
254: The
1.1 millert 255: .Fl k
1.41 millert 256: option replaces the obsolete options
257: .Cm \(pl Ns Ar pos1
258: and
259: .Fl Ns Ar pos2 ,
260: but the old notation is also supported.
261: .It Fl t Ar char , Fl Fl field-separator Ns = Ns Ar char
262: Use
1.3 aaron 263: .Ar char
1.41 millert 264: as the field separator character.
1.8 aaron 265: The initial
1.1 millert 266: .Ar char
1.12 aaron 267: is not considered to be part of a field when determining key offsets.
1.1 millert 268: Each occurrence of
269: .Ar char
270: is significant (for example,
271: .Dq Ar charchar
272: delimits an empty field).
273: If
274: .Fl t
1.6 pjanzen 275: is not specified, the default field separator is a sequence of
276: blank-space characters, and consecutive blank spaces do
277: .Em not
278: delimit an empty field; further, the initial blank space
279: .Em is
280: considered part of a field when determining key offsets.
1.41 millert 281: To use NUL as field separator, use
282: .Fl t
1.61 bentley 283: \(aq\e0\(aq.
1.41 millert 284: .It Fl z , Fl Fl zero-terminated
285: Use NUL as the record separator.
286: By default, records in the files are expected to be separated by
287: newline characters.
1.61 bentley 288: With this option, NUL
289: .Pq Ql \e0
290: is used as the record separator character.
1.37 jmc 291: .El
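.Pp
A short sketch of
.Fl t
combined with
.Fl k
follows.
The colon-separated records are hypothetical sample data,
and the variable names are illustrative only.
.Bd -literal -offset indent
```shell
# Hypothetical "name:flag:port" records.
services='web:x:80
dns:x:53
ssh:x:22
proxy:x:8080
ntp:x:123'

# Use ":" as the field separator and compare field 3 numerically.
by_port=$(echo "$services" | LC_ALL=C sort -t : -k 3,3n)
echo "$by_port"

# Hypothetical "group:value" records for a two-key sort.
records='b:2
a:1
a:10
b:30'

# First key: field 1, bytewise.  Second key: field 2, numeric, reversed.
# The second key is consulted only when the first key compares equal.
by_group=$(echo "$records" | LC_ALL=C sort -t : -k 1,1 -k 2,2nr)
echo "$by_group"
```
.Ed
Restricting each key with an explicit end field, as in
.Fl k Ar 3,3n ,
keeps the comparison from running on to the end of the line.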
292: .Pp
1.41 millert 293: Other options:
1.37 jmc 294: .Bl -tag -width indent
1.41 millert 295: .It Fl Fl batch-size Ns = Ns Ar num
296: Specify the maximum number of files that can be opened by
297: .Nm
298: at once.
299: This option affects behavior when there are many input files or when
300: temporary files are used.
1.51 millert 301: The minimum value is 2.
1.41 millert 302: The default value is 16.
303: .It Fl Fl compress-program Ns = Ns Ar program
304: Use
305: .Ar program
306: to compress temporary files.
307: When invoked with no arguments,
308: .Ar program
309: must compress standard input to standard output.
310: When called with the
311: .Fl d
312: option, it must decompress standard input to standard output.
313: If
314: .Ar program
315: fails,
316: .Nm
317: will exit with an error.
1.37 jmc 318: The
1.41 millert 319: .Xr compress 1
320: and
321: .Xr gzip 1
322: utilities meet these requirements.
323: .It Fl Fl debug
324: Print some extra information about the sorting process to the
325: standard output.
326: .It Fl Fl files0-from Ns = Ns Ar filename
327: Take the input file list from the file
1.44 jmc 328: .Ar filename .
1.41 millert 329: The file names must be separated by NUL
330: (like the output produced by the command
331: .Dq find ... -print0 ) .
1.49 jmc 332: .It Fl Fl heapsort
333: Try to use heap sort, if the sort specifications allow.
334: This sort algorithm cannot be used with
335: .Fl u
336: or
337: .Fl s .
338: .It Fl Fl help
339: Print the help text and exit.
1.58 anton 340: .It Fl H , Fl Fl mergesort
1.41 millert 341: Use mergesort.
342: This is a universal algorithm that can always be used,
343: but it is not always the fastest.
1.49 jmc 344: .It Fl Fl mmap
345: Try to use the file memory mapping system call.
346: It may increase speed in some cases.
1.41 millert 347: .It Fl Fl qsort
348: Try to use quick sort, if the sort specifications allow.
349: This sort algorithm cannot be used with
350: .Fl u
351: or
352: .Fl s .
1.49 jmc 353: .It Fl Fl radixsort
354: Try to use radix sort, if the sort specifications allow.
355: The radix sort can only be used for trivial locales (C and POSIX),
356: and it cannot be used for numeric or month sort.
357: Radix sort is very fast and stable.
358: .It Fl Fl random-source Ns = Ns Ar filename
359: For random sort, the contents of
360: .Ar filename
361: are used as the source of the
362: .Sq seed
363: data for the hash function.
1.64 ! schwarze 364: Two invocations of random sort with the same seed data
1.49 jmc 365: produce the same result if the input is also identical.
366: By default, the
367: .Xr arc4random_buf 3
368: function is used instead.
369: .It Fl Fl version
370: Print the version and exit.
1.3 aaron 371: .El
1.1 millert 372: .Pp
1.12 aaron 373: A field is defined as a maximal sequence of characters other than the
1.6 pjanzen 374: field separator and record separator
375: .Pq newline by default .
376: Initial blank spaces are included in the field unless
377: .Fl b
378: has been specified;
379: the first blank space of a sequence of blank spaces acts as the field
380: separator and is included in the field (unless
381: .Fl t
382: is specified).
383: For example, by default all blank spaces at the beginning of a line are
384: considered to be part of the first field.
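.Pp
The effect of leading blanks on a key can be sketched with two
hypothetical input lines that differ in the number of blanks
before the second field.
The data and variable names are assumptions made for illustration.
.Bd -literal -offset indent
```shell
# Line 1 has two blanks before its second field, line 2 has one.
lines='a  z
b y'

# Without b, the blanks are part of field 2: "  z" is compared with " y",
# and a blank sorts before "y", so line 1 comes first.
with_blanks=$(echo "$lines" | LC_ALL=C sort -k 2)
echo "$with_blanks"

# With the b modifier, the key starts at the first non-blank character:
# "y" is compared with "z", so line 2 comes first.
without_blanks=$(echo "$lines" | LC_ALL=C sort -k 2b)
echo "$without_blanks"
```
.Ed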
1.1 millert 385: .Pp
1.12 aaron 386: Fields are specified by the
1.45 jmc 387: .Fl k Ar field1 Ns Op , Ns Ar field2
1.41 millert 388: option.
389: If
1.1 millert 390: .Ar field2
1.41 millert 391: is missing, the end of the key defaults to the end of the line.
1.1 millert 392: .Pp
393: The arguments
394: .Ar field1
395: and
396: .Ar field2
397: have the form
398: .Em m.n
1.6 pjanzen 399: .Em (m,n > 0)
1.41 millert 400: and can be followed by one or more of the modifiers
1.6 pjanzen 401: .Cm b , d , f , i ,
1.41 millert 402: .Cm n , g , M
1.6 pjanzen 403: and
404: .Cm r ,
405: which correspond to the options discussed above.
1.41 millert 406: When
407: .Cm b
408: is specified it applies only to
409: .Ar field1
410: or
411: .Ar field2
412: where it is specified while the rest of the modifiers
413: apply to the whole key field regardless of whether they are
414: specified only with
415: .Ar field1
416: or
417: .Ar field2
418: or both.
1.1 millert 419: A
420: .Ar field1
421: position specified by
422: .Em m.n
423: is interpreted as the
424: .Em n Ns th
1.6 pjanzen 425: character from the beginning of the
1.1 millert 426: .Em m Ns th
427: field.
428: A missing
429: .Em \&.n
430: in
431: .Ar field1
432: means
433: .Ql \&.1 ,
434: indicating the first character of the
435: .Em m Ns th
1.12 aaron 436: field; if the
1.1 millert 437: .Fl b
438: option is in effect,
439: .Em n
1.12 aaron 440: is counted from the first non-blank character in the
1.1 millert 441: .Em m Ns th
442: field;
443: .Em m Ns \&.1b
1.12 aaron 444: refers to the first non-blank character in the
1.1 millert 445: .Em m Ns th
446: field.
1.6 pjanzen 447: .No 1\&. Ns Em n
448: refers to the
449: .Em n Ns th
450: character from the beginning of the line;
451: if
452: .Em n
453: is greater than the length of the line, the field is taken to be empty.
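.Pp
As a small sketch of the
.Em m.n
notation, the following command sorts hypothetical four-character
codes by their third and fourth characters only.
The sample data is an assumption made for illustration.
.Bd -literal -offset indent
```shell
# Hypothetical codes: two letters followed by two digits.
codes='ab20
cd10
ef15'

# Key starts at character 3 of field 1 and ends at character 4 of field 1,
# so only the digit pairs "20", "10" and "15" are compared.
by_digits=$(echo "$codes" | LC_ALL=C sort -k 1.3,1.4)
echo "$by_digits"
```
.Ed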
1.1 millert 454: .Pp
1.41 millert 455: .Em n Ns th
456: positions are always counted from the field beginning, even if the field
457: is shorter than the number of specified positions.
458: Thus, the key can actually start at a position in a subsequent field.
459: .Pp
1.1 millert 460: A
461: .Ar field2
462: position specified by
463: .Em m.n
1.12 aaron 464: is interpreted as the
1.1 millert 465: .Em n Ns th
1.41 millert 466: character (including separators) from the beginning of the
1.1 millert 467: .Em m Ns th
468: field.
469: A missing
470: .Em \&.n
1.5 aaron 471: indicates the last character of the
1.1 millert 472: .Em m Ns th
473: field;
1.5 aaron 474: .Em m
1.1 millert 475: = \&0
476: designates the end of a line.
477: Thus the option
478: .Fl k Ar v.x,w.y
1.41 millert 479: is synonymous with the obsolete option
1.1 millert 480: .Cm \(pl Ns Ar v-\&1.x-\&1
481: .Fl Ns Ar w-\&1.y ;
482: when
483: .Em y
484: is omitted,
485: .Fl k Ar v.x,w
486: is synonymous with
1.5 aaron 487: .Cm \(pl Ns Ar v-\&1.x-\&1
1.19 tdeval 488: .Fl Ns Ar w\&.0 .
1.41 millert 489: The obsolete
1.1 millert 490: .Cm \(pl Ns Ar pos1
491: .Fl Ns Ar pos2
492: option is still supported, except for
1.3 aaron 493: .Fl Ns Ar w\&.0b ,
1.1 millert 494: which has no
495: .Fl k
496: equivalent.
497: .Sh ENVIRONMENT
1.63 schwarze 498: .Bl -tag -width Ds
1.1 millert 499: .It Ev TMPDIR
1.41 millert 500: Path to the directory in which temporary files will be stored.
1.3 aaron 501: Note that
1.1 millert 502: .Ev TMPDIR
503: may be overridden by the
504: .Fl T
505: option.
1.11 aaron 506: .El
1.1 millert 507: .Sh FILES
508: .Bl -tag -width Pa -compact
1.56 lteo 509: .It Pa /tmp/.bsdsort.PID.*
1.41 millert 510: Temporary files.
1.39 jmc 511: .El
512: .Sh EXIT STATUS
513: The
514: .Nm
515: utility exits with one of the following values:
516: .Pp
517: .Bl -tag -width Ds -offset indent -compact
518: .It 0
1.41 millert 519: Successfully sorted the input files or if used with
520: .Fl C
521: or
522: .Fl c ,
523: the input file already met the sorting criteria.
1.39 jmc 524: .It 1
1.41 millert 525: On disorder (or non-uniqueness) with the
1.39 jmc 526: .Fl C
527: or
528: .Fl c
1.41 millert 529: options.
1.39 jmc 530: .It 2
531: An error occurred.
1.1 millert 532: .El
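.Pp
The exit values listed above can be observed with a sketch like
the following.
The input data, the variable names, and the path
.Pa /nonexistent/input
are hypothetical; the path is assumed not to exist.
.Bd -literal -offset indent
```shell
ordered='a
b
c'
disordered='b
a'
duplicated='a
a'

# Each status defaults to 0 and is overwritten only if sort fails.
ordered_status=0
echo "$ordered" | LC_ALL=C sort -C || ordered_status=$?

disordered_status=0
echo "$disordered" | LC_ALL=C sort -C || disordered_status=$?

# With -u, lines with duplicate keys also count as a failed check.
duplicate_status=0
echo "$duplicated" | LC_ALL=C sort -C -u || duplicate_status=$?

# An unreadable input file is an error rather than a disorder.
missing_status=0
LC_ALL=C sort -C /nonexistent/input 2>/dev/null || missing_status=$?

echo "$ordered_status $disordered_status $duplicate_status $missing_status"
```
.Ed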
533: .Sh SEE ALSO
534: .Xr comm 1 ,
1.3 aaron 535: .Xr join 1 ,
1.47 jmc 536: .Xr uniq 1
1.27 dlg 537: .Sh STANDARDS
538: The
539: .Nm
1.28 jmc 540: utility is compliant with the
1.33 jmc 541: .St -p1003.1-2008
1.60 schwarze 542: specification, except that it ignores the user's
543: .Xr locale 1
544: and always assumes
545: .Ev LC_ALL Ns =C.
1.27 dlg 546: .Pp
547: The flags
1.43 jmc 548: .Op Fl gHhiMRSsTVz
1.28 jmc 549: are extensions to that specification.
1.41 millert 550: .Pp
551: All long options are extensions to the specification.
552: Some are provided for compatibility with GNU
553: .Nm ,
554: others are specific to this implementation.
1.54 millert 555: .Pp
556: Some implementations of
557: .Nm
558: honor the
559: .Fl b
560: option even when no key fields are specified.
561: This implementation follows historic practice and
562: .St -p1003.1-2008
563: in only honoring
564: .Fl b
565: when it precedes a key field.
1.52 millert 566: .Pp
567: The historic practice of allowing the
568: .Fl o
569: option to appear after the
570: .Ar file
571: is supported for compatibility with older versions of
572: .Nm .
1.41 millert 573: .Pp
574: The historic key notations
575: .Cm \(pl Ns Ar pos1
576: and
577: .Fl Ns Ar pos2
578: are supported for compatibility with older versions of
579: .Nm
580: but their use is highly discouraged.
1.1 millert 581: .Sh HISTORY
582: A
1.8 aaron 583: .Nm
1.1 millert 584: command appeared in
1.62 schwarze 585: .At v1 .
1.41 millert 586: .Sh AUTHORS
1.44 jmc 587: .An Gabor Kovesdan Aq Mt gabor@FreeBSD.org
588: .An Oleg Moskalenko Aq Mt mom040267@gmail.com
1.45 jmc 589: .Sh CAVEATS
1.41 millert 590: This implementation of
1.14 ericj 591: .Nm
592: has no limits on input line length (other than those imposed by available
593: memory) or any restrictions on bytes allowed within lines.
594: .Pp
1.60 schwarze 595: Performance depends heavily on the
1.41 millert 596: choice of sort keys and on key complexity.
1.60 schwarze 597: The fastest sort is on whole lines, with option
1.41 millert 598: .Fl s .
599: For key specifications, the simpler the lines are to process,
600: the faster the sort will be.
1.14 ericj 601: .Pp
1.41 millert 602: When sorting by arithmetic value, using
603: .Fl n
604: results in much better performance than
605: .Fl g
606: so its use is encouraged whenever possible.