Annotation of src/usr.bin/sort/sort.1, Revision 1.39

1.39    ! jmc         1: .\"    $OpenBSD: sort.1,v 1.38 2010/06/28 15:28:52 jmc Exp $
1.1       millert     2: .\"
                      3: .\" Copyright (c) 1991, 1993
                      4: .\"    The Regents of the University of California.  All rights reserved.
                      5: .\"
                      6: .\" This code is derived from software contributed to Berkeley by
                      7: .\" the Institute of Electrical and Electronics Engineers, Inc.
                      8: .\"
                      9: .\" Redistribution and use in source and binary forms, with or without
                     10: .\" modification, are permitted provided that the following conditions
                     11: .\" are met:
                     12: .\" 1. Redistributions of source code must retain the above copyright
                     13: .\"    notice, this list of conditions and the following disclaimer.
                     14: .\" 2. Redistributions in binary form must reproduce the above copyright
                     15: .\"    notice, this list of conditions and the following disclaimer in the
                     16: .\"    documentation and/or other materials provided with the distribution.
1.20      millert    17: .\" 3. Neither the name of the University nor the names of its contributors
1.1       millert    18: .\"    may be used to endorse or promote products derived from this software
                     19: .\"    without specific prior written permission.
                     20: .\"
                     21: .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
                     22: .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
                     23: .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
                     24: .\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
                     25: .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
                     26: .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
                     27: .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
                     28: .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
                     29: .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
                     30: .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
                     31: .\" SUCH DAMAGE.
                     32: .\"
                     33: .\"     @(#)sort.1     8.1 (Berkeley) 6/6/93
                     34: .\"
1.39    ! jmc        35: .Dd $Mdocdate: June 28 2010 $
1.1       millert    36: .Dt SORT 1
                     37: .Os
                     38: .Sh NAME
                     39: .Nm sort
1.37      jmc        40: .Nd sort, merge, or sequence check text files
1.1       millert    41: .Sh SYNOPSIS
                     42: .Nm sort
1.35      schwarze   43: .Op Fl bCcdfHimnrsuz
1.23      jmc        44: .Sm off
1.24      jmc        45: .Op Fl k\ \& Ar field1 Op , Ar field2
1.23      jmc        46: .Sm on
                     47: .Op Fl o Ar output
                     48: .Op Fl R Ar char
                     49: .Bk -words
1.1       millert    50: .Op Fl T Ar dir
1.23      jmc        51: .Ek
                     52: .Op Fl t Ar char
1.34      sobrado    53: .Op Ar
1.1       millert    54: .Sh DESCRIPTION
                     55: The
1.8       aaron      56: .Nm
1.37      jmc        57: utility sorts text files by lines,
                     58: operating in one of three modes: sort, merge, or check.
                     59: In sort mode, the specified files are combined and sorted
                     60: by line.
                     61: Merge mode is the same as sort mode except that the input
                     62: files are assumed to be pre-sorted.
                     63: In check mode, a single input file is checked to ensure that
                     64: it is correctly sorted.
                     65: .Pp
1.1       millert    66: Comparisons are based on one or more sort keys extracted
1.8       aaron      67: from each line of input, and are performed lexicographically.
                     68: By default, if keys are not given,
                     69: .Nm
1.1       millert    70: regards each input line as a single field.
                     71: .Pp
1.7       aaron      72: The options are as follows:
1.21      jmc        73: .Bl -tag -width Ds
1.35      schwarze   74: .It Fl C
                     75: Check that the single input file is sorted.
                     76: If it is, exit 0; if it's not, exit 1.
                     77: In either case, produce no output.
1.1       millert    78: .It Fl c
1.35      schwarze   79: Like
                     80: .Fl C ,
1.37      jmc        81: but additionally write a message to
1.35      schwarze   82: .Em stderr
                     83: if the input file is not sorted.
1.1       millert    84: .It Fl m
                     85: Merge only; the input files are assumed to be pre-sorted.
1.37      jmc        86: This option is overridden by the
                     87: .Fl C
                     88: or
                     89: .Fl c
                     90: options,
                     91: if they are also present.
1.1       millert    92: .It Fl o Ar output
                     93: The argument given is the name of an
                     94: .Ar output
1.12      aaron      95: file to be used instead of the standard output.
                     96: This file can be the same as one of the input files.
1.1       millert    97: .It Fl T Ar dir
                     98: Use
                     99: .Ar dir
1.8       aaron     100: as the directory for temporary files.
                    101: The default is the contents of the environment variable
1.1       millert   102: .Ev TMPDIR
                    103: or
                    104: .Pa /var/tmp
                    105: if
                    106: .Ev TMPDIR
                    107: is not set.
                    108: .It Fl u
1.12      aaron     109: Unique: suppress all but one in each set of lines having equal keys.
1.1       millert   110: If used with the
1.35      schwarze  111: .Fl C
                    112: or
1.1       millert   113: .Fl c
1.35      schwarze  114: options, also check that there are no lines with duplicate keys.
1.1       millert   115: .El
                    116: .Pp
1.38      jmc       117: The following options override the default ordering rules globally:
                    118: .Bl -tag -width indent
                    119: .It Fl H
                    120: Use a merge sort instead of a radix sort.
                    121: This option should be used for files larger than 60MB.
                    122: .It Fl s
                    123: Enable stable sort.
                    124: Uses additional resources (see
                    125: .Xr sradixsort 3 ) .
                    126: .El
                    127: .Pp
1.1       millert   128: The following options override the default ordering rules.
1.37      jmc       129: If ordering options appear before the first
                    130: .Fl k
                    131: option, they apply globally to all sort keys.
1.1       millert   132: When attached to a specific key (see
                    133: .Fl k ) ,
                    134: the ordering options override
                    135: all global ordering options for that key.
1.37      jmc       136: Note that the ordering options intended to apply globally should not
                    137: appear after
                    138: .Fl k
                    139: or results may be unexpected.
1.1       millert   140: .Bl -tag -width indent
                    141: .It Fl d
                    142: Only blank space and alphanumeric characters
                    143: .\" according
                    144: .\" to the current setting of LC_CTYPE
1.12      aaron     145: are used in making comparisons.
1.1       millert   146: .It Fl f
                    147: Considers all lowercase characters that have uppercase
1.12      aaron     148: equivalents to be the same as those equivalents for purposes of comparison.
1.1       millert   149: .It Fl i
                    150: Ignore all non-printable characters.
                    151: .It Fl n
1.12      aaron     152: An initial numeric string, consisting of optional blank space, optional
                    153: minus sign, and zero or more digits (including decimal point)
1.1       millert   154: .\" with
                    155: .\" optional radix character and thousands
                    156: .\" separator
                    157: .\" (as defined in the current locale),
                    158: is sorted by arithmetic value.
                    159: (The
                    160: .Fl n
1.12      aaron     161: option no longer implies the
1.1       millert   162: .Fl b
                    163: option.)
                    164: .It Fl r
                    165: Reverse the sense of comparisons.
                    166: .El
                    167: .Pp
1.12      aaron     168: The treatment of field separators can be altered using these options:
1.1       millert   169: .Bl -tag -width indent
                    170: .It Fl b
                    171: Ignores leading blank space when determining the start
                    172: and end of a restricted sort key.
                    173: A
                    174: .Fl b
                    175: option specified before the first
                    176: .Fl k
                    177: option applies globally to all
                    178: .Fl k
                    179: options.
                    180: Otherwise, the
                    181: .Fl b
1.12      aaron     182: option can be attached independently to each
1.1       millert   183: .Ar field
                    184: argument of the
                    185: .Fl k
                    186: option (see below).
1.37      jmc       187: Note that
1.1       millert   188: .Fl b
1.37      jmc       189: should not appear after
                    190: .Fl k ,
                    191: and that it has no effect unless key fields are specified.
1.23      jmc       192: .It Fl R Ar char
                    193: .Ar char
                    194: is used as the record separator character.
                    195: This should be used with discretion;
                    196: .Fl R Aq Ar alphanumeric
                    197: usually produces undesirable results.
                    198: The default record separator is newline.
1.1       millert   199: .It Fl t Ar char
1.3       aaron     200: .Ar char
1.8       aaron     201: is used as the field separator character.
                    202: The initial
1.1       millert   203: .Ar char
1.12      aaron     204: is not considered to be part of a field when determining key offsets.
1.1       millert   205: Each occurrence of
                    206: .Ar char
                    207: is significant (for example,
                    208: .Dq Ar charchar
                    209: delimits an empty field).
                    210: If
                    211: .Fl t
1.6       pjanzen   212: is not specified, the default field separator is a sequence of
                    213: blank-space characters, and consecutive blank spaces do
                    214: .Em not
                    215: delimit an empty field; further, the initial blank space
                    216: .Em is
                    217: considered part of a field when determining key offsets.
1.22      dlg       218: .It Fl z
                    219: Uses the NUL character as the record separator.
1.37      jmc       220: .El
                    221: .Pp
                    222: Sort keys are specified with:
                    223: .Bl -tag -width indent
                    224: .It Xo
                    225: .Sm off
                    226: .Fl k\ \& Ar field1 Op , Ar field2
                    227: .Sm on
                    228: .Xc
                    229: Designates the starting position,
                    230: .Ar field1 ,
                    231: and optional ending position,
                    232: .Ar field2 ,
                    233: of a key field.
                    234: The
                    235: .Fl k
                    236: option may be specified multiple times,
                    237: in which case subsequent keys are compared after earlier keys compare equal.
                    238: The
                    239: .Fl k
                    240: option replaces the obsolescent options
                    241: .Cm \(pl Ns Ar pos1
                    242: and
                    243: .Fl Ns Ar pos2 .
1.1       millert   244: .El
                    245: .Pp
                    246: The following operands are available:
                    247: .Bl -tag -width indent
1.3       aaron     248: .It Ar file
                    249: The pathname of a file to be sorted, merged, or checked.
                    250: If no
1.1       millert   251: .Ar file
1.12      aaron     252: operands are specified, or if a
1.3       aaron     253: .Ar file
                    254: operand is
1.1       millert   255: .Fl ,
                    256: the standard input is used.
1.3       aaron     257: .El
1.1       millert   258: .Pp
1.12      aaron     259: A field is defined as a maximal sequence of characters other than the
1.6       pjanzen   260: field separator and record separator
                    261: .Pq newline by default .
                    262: Initial blank spaces are included in the field unless
                    263: .Fl b
                    264: has been specified;
                    265: the first blank space of a sequence of blank spaces acts as the field
                    266: separator and is included in the field (unless
                    267: .Fl t
                    268: is specified).
                    269: For example, by default all blank spaces at the beginning of a line are
                    270: considered to be part of the first field.
1.1       millert   271: .Pp
1.12      aaron     272: Fields are specified by the
1.23      jmc       273: .Sm off
                    274: .Fl k\ \& Ar field1 Op , Ar field2
                    275: .Sm on
1.8       aaron     276: argument.
                    277: A missing
1.1       millert   278: .Ar field2
                    279: argument defaults to the end of a line.
                    280: .Pp
                    281: The arguments
                    282: .Ar field1
                    283: and
                    284: .Ar field2
                    285: have the form
                    286: .Em m.n
1.6       pjanzen   287: .Em (m,n > 0)
                    288: and can be followed by one or more of the letters
                    289: .Cm b , d , f , i ,
1.10      aaron     290: .Cm n ,
1.6       pjanzen   291: and
                    292: .Cm r ,
                    293: which correspond to the options discussed above.
1.1       millert   294: A
                    295: .Ar field1
                    296: position specified by
                    297: .Em m.n
                    298: is interpreted as the
                    299: .Em n Ns th
1.6       pjanzen   300: character from the beginning of the
1.1       millert   301: .Em m Ns th
                    302: field.
                    303: A missing
                    304: .Em \&.n
                    305: in
                    306: .Ar field1
                    307: means
                    308: .Ql \&.1 ,
                    309: indicating the first character of the
                    310: .Em m Ns th
1.12      aaron     311: field; if the
1.1       millert   312: .Fl b
                    313: option is in effect,
                    314: .Em n
1.12      aaron     315: is counted from the first non-blank character in the
1.1       millert   316: .Em m Ns th
                    317: field;
                    318: .Em m Ns \&.1b
1.12      aaron     319: refers to the first non-blank character in the
1.1       millert   320: .Em m Ns th
                    321: field.
1.6       pjanzen   322: .No 1\&. Ns Em n
                    323: refers to the
                    324: .Em n Ns th
                    325: character from the beginning of the line;
                    326: if
                    327: .Em n
                    328: is greater than the length of the line, the field is taken to be empty.
1.1       millert   329: .Pp
                    330: A
                    331: .Ar field2
                    332: position specified by
                    333: .Em m.n
1.12      aaron     334: is interpreted as the
1.1       millert   335: .Em n Ns th
                    336: character (including separators) of the
                    337: .Em m Ns th
                    338: field.
                    339: A missing
                    340: .Em \&.n
1.5       aaron     341: indicates the last character of the
1.1       millert   342: .Em m Ns th
                    343: field;
1.5       aaron     344: .Em m
1.1       millert   345: = \&0
                    346: designates the end of a line.
                    347: Thus the option
                    348: .Fl k Ar v.x,w.y
                    349: is synonymous with the obsolescent option
                    350: .Cm \(pl Ns Ar v-\&1.x-\&1
                    351: .Fl Ns Ar w-\&1.y ;
                    352: when
                    353: .Em y
                    354: is omitted,
                    355: .Fl k Ar v.x,w
                    356: is synonymous with
1.5       aaron     357: .Cm \(pl Ns Ar v-\&1.x-\&1
1.19      tdeval    358: .Fl Ns Ar w\&.0 .
1.1       millert   359: The obsolescent
                    360: .Cm \(pl Ns Ar pos1
                    361: .Fl Ns Ar pos2
                    362: option is still supported, except for
1.3       aaron     363: .Fl Ns Ar w\&.0b ,
1.1       millert   364: which has no
                    365: .Fl k
                    366: equivalent.
                    367: .Sh ENVIRONMENT
                    368: .Bl -tag -width Fl
                    369: .It Ev TMPDIR
1.3       aaron     370: Path in which to store temporary files.
                    371: Note that
1.1       millert   372: .Ev TMPDIR
                    373: may be overridden by the
                    374: .Fl T
                    375: option.
1.11      aaron     376: .El
1.1       millert   377: .Sh FILES
                    378: .Bl -tag -width Pa -compact
                    379: .It Pa /var/tmp/sort.*
1.3       aaron     380: default temporary files
1.36      jmc       381: .It Pa output Ns #PID
1.3       aaron     382: temporary name for
1.1       millert   383: .Ar output
                    384: if
                    385: .Ar output
1.3       aaron     386: already exists
1.39    ! jmc       387: .El
        !           388: .Sh EXIT STATUS
        !           389: The
        !           390: .Nm
        !           391: utility exits with one of the following values:
        !           392: .Pp
        !           393: .Bl -tag -width Ds -offset indent -compact
        !           394: .It 0
        !           395: Normal behavior.
        !           396: .It 1
        !           397: The input file is not sorted and
        !           398: .Fl C
        !           399: or
        !           400: .Fl c
        !           401: was given, or there are duplicate keys and
        !           402: .Fl Cu
        !           403: or
        !           404: .Fl cu
        !           405: was given.
        !           406: .It 2
        !           407: An error occurred.
1.1       millert   408: .El
                    409: .Sh SEE ALSO
                    410: .Xr comm 1 ,
1.3       aaron     411: .Xr join 1 ,
1.18      fgsch     412: .Xr uniq 1 ,
                    413: .Xr radixsort 3
1.27      dlg       414: .Sh STANDARDS
                    415: The
                    416: .Nm
1.28      jmc       417: utility is compliant with the
1.33      jmc       418: .St -p1003.1-2008
1.27      dlg       419: specification.
                    420: .Pp
                    421: The flags
1.32      jmc       422: .Op Fl HRsTz
1.28      jmc       423: are extensions to that specification.
1.1       millert   424: .Sh HISTORY
                    425: A
1.8       aaron     426: .Nm
1.1       millert   427: command appeared in
1.16      mickey    428: .At v3 .
1.1       millert   429: .Sh NOTES
1.14      ericj     430: .Nm
                    431: has no limits on input line length (other than imposed by available
                    432: memory) or any restrictions on bytes allowed within lines.
                    433: .Pp
                    434: To protect data
                    435: .Nm
                    436: .Fl o
                    437: calls
                    438: .Xr link 2
                    439: and
                    440: .Xr unlink 2 ,
                    441: and thus fails on protected directories.
                    442: .Pp
1.1       millert   443: The current sort command uses lexicographic radix sorting, which requires
1.12      aaron     444: that sort keys be kept in memory (as opposed to previous versions which
                    445: used quick and merge sorts and did not).
1.1       millert   446: Thus performance depends highly on efficient choice of sort keys, and the
                    447: .Fl b
                    448: option and the
                    449: .Ar field2
                    450: argument of the
                    451: .Fl k
                    452: option should be used whenever possible.
                    453: Similarly,
1.8       aaron     454: .Nm
1.1       millert   455: .Fl k1f
                    456: is equivalent to
1.8       aaron     457: .Nm
1.1       millert   458: .Fl f
                    459: and may take twice as long.
1.12      aaron     460: .Sh BUGS
                    461: To sort files larger than 60MB, use
                    462: .Nm
                    463: .Fl H ;
                    464: files larger than 704MB must be sorted in smaller pieces, then merged.
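
As a rough illustration of the key-field and check-mode options described in the manual above, the following shell sketch may help. It is not part of sort.1: the sample records and the variable names (data, by_dept_salary, sorted_by_name) are made up for this example. It uses only the -t, -k and -C options documented above.

```shell
# Hypothetical sample data: colon-separated "name:dept:salary" records.
data='carol:ops:300
alice:dev:200
bob:dev:1000'

# Sort on the second field only (-k 2,2); break ties on the third field,
# compared by arithmetic value (the n modifier attached to that key).
by_dept_salary=$(printf '%s\n' "$data" | sort -t : -k 2,2 -k 3,3n)
printf '%s\n' "$by_dept_salary"

# Check mode: -C produces no output and reports only through its exit status
# (0 if the input is sorted on the given key, 1 if it is not).
if printf '%s\n' "$data" | sort -C -t : -k 1,1; then
    sorted_by_name=yes
else
    sorted_by_name=no
fi
echo "sorted by name: $sorted_by_name"
```

Without the n modifier on the third key, the salaries would be compared as strings, so 1000 would sort before 200. Giving both field1 and field2 (as in -k 2,2) restricts each key to a single field, which the NOTES section recommends for performance.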