Annotation of src/usr.bin/sort/sort.1, Revision 1.33

.\"    $OpenBSD: sort.1,v 1.32 2008/10/01 06:39:18 jmc Exp $
.\"
.\" Copyright (c) 1991, 1993
.\"    The Regents of the University of California.  All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" the Institute of Electrical and Electronics Engineers, Inc.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\"     @(#)sort.1     8.1 (Berkeley) 6/6/93
.\"
.Dd $Mdocdate: October 1 2008 $
.Dt SORT 1
.Os
.Sh NAME
.Nm sort
.Nd sort or merge text files
.Sh SYNOPSIS
.Nm sort
.Op Fl bcdfHimnrsuz
.Sm off
.Op Fl k\ \& Ar field1 Op , Ar field2
.Sm on
.Op Fl o Ar output
.Op Fl R Ar char
.Bk -words
.Op Fl T Ar dir
.Ek
.Op Fl t Ar char
.Op Ar file ...
.Sh DESCRIPTION
The
.Nm
utility sorts text files by lines.
Comparisons are based on one or more sort keys extracted
from each line of input, and are performed lexicographically.
By default, if keys are not given,
.Nm
regards each input line as a single field.
.Pp
The options are as follows:
.Bl -tag -width Ds
.It Fl c
Check that the single input file is sorted.
If the file is not sorted,
.Nm
produces the appropriate error messages and exits with code 1; otherwise,
.Nm
returns 0.
.Nm
.Fl c
1.6       pjanzen    75: produces no output, except the error messages on
                     76: .Em stderr .
1.1       millert    77: .It Fl m
                     78: Merge only; the input files are assumed to be pre-sorted.
                     79: .It Fl o Ar output
                     80: The argument given is the name of an
                     81: .Ar output
1.12      aaron      82: file to be used instead of the standard output.
                     83: This file can be the same as one of the input files.
1.1       millert    84: .It Fl T Ar dir
                     85: Use
                     86: .Ar dir
1.8       aaron      87: as the directory for temporary files.
                     88: The default is the contents of the environment variable
1.1       millert    89: .Ev TMPDIR
                     90: or
                     91: .Pa /var/tmp
                     92: if
                     93: .Ev TMPDIR
                     94: does not exist.
                     95: .It Fl u
1.12      aaron      96: Unique: suppress all but one in each set of lines having equal keys.
1.1       millert    97: If used with the
                     98: .Fl c
1.26      jmc        99: option, also check that there are no lines with duplicate keys.
1.1       millert   100: .El
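.Pp
The commands below are a minimal sketch for a POSIX shell;
the file name
.Pa names.txt
and the directory
.Pa /scratch
are hypothetical placeholders.
They sort a file in place, direct temporary files to a chosen directory,
and show that
.Fl c
only checks its input:
.Bd -literal -offset indent
# names.txt and /scratch are placeholder names
# sort names.txt in place, keeping one line per key
sort -u -o names.txt names.txt

# put temporary files under /scratch instead of $TMPDIR
sort -T /scratch -o names.txt names.txt

# input is out of order: a diagnostic on stderr, exit status 1
printf '%s\en' b a | sort -c
.Ed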
.Pp
The following options override the default ordering rules.
When ordering options appear independent of key field
specifications, the requested field ordering rules are
applied globally to all sort keys.
When attached to a specific key (see
.Fl k ) ,
the ordering options override
all global ordering options for that key.
.Bl -tag -width indent
.It Fl d
Only blank space and alphanumeric characters
.\" according
.\" to the current setting of LC_CTYPE
are used in making comparisons.
.It Fl f
Consider all lowercase characters that have uppercase
equivalents to be the same for purposes of comparison.
.It Fl H
Use a merge sort instead of a radix sort.
This option should be used for files larger than 60MB.
.It Fl i
Ignore all non-printable characters.
.It Fl n
An initial numeric string, consisting of optional blank space, optional
minus sign, and zero or more digits with an optional decimal point,
.\" with
.\" optional radix character and thousands
.\" separator
.\" (as defined in the current locale),
is sorted by arithmetic value.
(The
.Fl n
option no longer implies the
.Fl b
option.)
.It Fl r
Reverse the sense of comparisons.
.It Fl s
Enable stable sort: lines whose keys compare equal are output
in the order in which they were read.
This uses additional resources (see
.Xr sradixsort 3 ) .
.El
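.Pp
The effect of the ordering options can be seen on short inputs.
This is a sketch for a POSIX shell that assumes plain byte-wise comparison
(as in the C locale);
the orders shown in the comments may differ where an implementation
collates according to the locale.
.Bd -literal -offset indent
# default lexicographic order: 10, 100, 9
printf '%s\en' 100 9 10 | sort

# -n compares arithmetic values: -5, 9, 10, 100
printf '%s\en' 100 9 10 -5 | sort -n

# -r reverses the comparison: 100, 10, 9, -5
printf '%s\en' 100 9 10 -5 | sort -nr

# -f folds case: a, b, C rather than C, a, b
printf '%s\en' b C a | sort -f
.Ed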
.Pp
The treatment of field separators can be altered using these options:
.Bl -tag -width indent
.It Fl b
Ignore leading blank space when determining the start
and end of a restricted sort key.
A
.Fl b
option specified before the first
.Fl k
option applies globally to all
.Fl k
options.
Otherwise, the
.Fl b
option can be attached independently to each
.Ar field
argument of the
.Fl k
option (see below).
Note that the
.Fl b
option has no effect unless key fields are specified.
.It Xo
.Sm off
.Fl k\ \& Ar field1 Op , Ar field2
.Sm on
.Xc
Designates the starting position,
.Ar field1 ,
and optional ending position,
.Ar field2 ,
of a key field.
The
.Fl k
option may be specified multiple times,
in which case a later key is compared only when all earlier keys compare equal.
The
.Fl k
option replaces the obsolescent options
.Cm \(pl Ns Ar pos1
and
.Fl Ns Ar pos2 .
.It Fl R Ar char
.Ar char
is used as the record separator character.
This should be used with discretion;
.Fl R Aq Ar alphanumeric
usually produces undesirable results.
The default record separator is newline.
.It Fl t Ar char
.Ar char
is used as the field separator character.
The initial
.Ar char
is not considered to be part of a field when determining key offsets.
Each occurrence of
.Ar char
is significant (for example,
.Dq Ar charchar
delimits an empty field).
If
.Fl t
is not specified, the default field separator is a sequence of
blank-space characters, and consecutive blank spaces do
.Em not
delimit an empty field; further, the initial blank space
.Em is
considered part of a field when determining key offsets.
.It Fl z
Use the NUL character as the record separator.
.El
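.Pp
A minimal sketch of the field-separator options for a POSIX shell,
again assuming plain byte-wise comparison (as in the C locale):
.Bd -literal -offset indent
# colon-separated records, numeric key on field 3 only;
# prints a:x:4, c:x:30, b:x:200
printf '%s\en' c:x:30 a:x:4 b:x:200 | sort -t : -k 3,3n

# without -t the leading blanks belong to field 2, so "x  b"
# sorts first; the b modifier makes the key skip them
printf '%s\en' 'x  b' 'x a' | sort -k 2
printf '%s\en' 'x  b' 'x a' | sort -k 2b

# NUL-terminated records, turned back into lines for display
printf 'b\e0a\e0' | sort -z | tr '\e0' '\en'
.Ed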
.Pp
The following operands are available:
.Bl -tag -width indent
.It Ar file
The pathname of a file to be sorted, merged, or checked.
If no
.Ar file
operands are specified, or if a
.Ar file
operand is
.Fl ,
the standard input is used.
.El
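.Pp
As a sketch for a POSIX shell, the
.Fl
operand lets standard input take part in a merge;
.Pa sorted.txt
stands for a hypothetical file that is already sorted:
.Bd -literal -offset indent
# sorted.txt is a placeholder for an already sorted file
# merge the sorted lines b and d with the contents of sorted.txt
printf '%s\en' b d | sort -m - sorted.txt
.Ed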
.Pp
A field is defined as a maximal sequence of characters other than the
field separator and record separator
.Pq newline by default .
Initial blank spaces are included in the field unless
.Fl b
has been specified;
the first blank space of a sequence of blank spaces acts as the field
separator and is included in the field (unless
.Fl t
is specified).
For example, by default all blank spaces at the beginning of a line are
considered to be part of the first field.
.Pp
Fields are specified by the
.Sm off
.Fl k\ \& Ar field1 Op , Ar field2
.Sm on
argument.
A missing
.Ar field2
argument defaults to the end of a line.
.Pp
The arguments
.Ar field1
and
.Ar field2
have the form
.Em m.n
.Em (m,n > 0)
and can be followed by one or more of the letters
.Cm b , d , f , i ,
.Cm n ,
and
.Cm r ,
which correspond to the options discussed above.
A
.Ar field1
position specified by
.Em m.n
is interpreted as the
.Em n Ns th
character from the beginning of the
.Em m Ns th
field.
A missing
.Em \&.n
in
.Ar field1
means
.Ql \&.1 ,
indicating the first character of the
.Em m Ns th
field; if the
.Fl b
option is in effect,
.Em n
is counted from the first non-blank character in the
.Em m Ns th
field;
.Em m Ns \&.1b
refers to the first non-blank character in the
.Em m Ns th
field.
.No 1\&. Ns Em n
refers to the
.Em n Ns th
character from the beginning of the line;
if
.Em n
is greater than the length of the line, the field is taken to be empty.
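.Pp
For example, assuming dates written as YYYY-MM-DD (no blanks, so each line
is a single field), the month begins at the sixth character, and
.Fl k Ar 1.6
sorts by month and day; as
.Ar field2
is omitted, the key extends to the end of the line.
A sketch for a POSIX shell with byte-wise comparison:
.Bd -literal -offset indent
# key starts at character 6 of field 1 (the month)
# prints 2007-03-15, 2009-07-04, 2008-10-01
printf '%s\en' 2008-10-01 2007-03-15 2009-07-04 | sort -k 1.6
.Ed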
.Pp
A
.Ar field2
position specified by
.Em m.n
is interpreted as the
.Em n Ns th
character (including separators) of the
.Em m Ns th
field.
A missing
.Em \&.n
indicates the last character of the
.Em m Ns th
field;
.Em m
= \&0
designates the end of a line.
Thus the option
.Fl k Ar v.x,w.y
is synonymous with the obsolescent option
.Cm \(pl Ns Ar v-\&1.x-\&1
.Fl Ns Ar w-\&1.y ;
when
.Em y
is omitted,
.Fl k Ar v.x,w
is synonymous with
.Cm \(pl Ns Ar v-\&1.x-\&1
.Fl Ns Ar w\&.0 .
The obsolescent
.Cm \(pl Ns Ar pos1
.Fl Ns Ar pos2
option is still supported, except for
.Fl Ns Ar w\&.0b ,
which has no
.Fl k
equivalent.
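.Pp
For example, with
.Em v
= 2,
.Em x
= 1 (a missing
.Em \&.n
means
.Ql \&.1 )
and
.Em w
= 2, the rule above maps
.Fl k Ar 2,2 ,
a key consisting of the second field only, to
.Cm \(pl Ns Ar 1.0
.Fl Ns Ar 2.0 ,
which is usually written
.Cm \(pl Ns Ar 1
.Fl Ns Ar 2 .
A sketch for a POSIX shell, assuming blank-separated input;
other implementations of
.Nm
may accept only the
.Fl k
form:
.Bd -literal -offset indent
# both commands print y 1 c, x 2 b, z 3 a
printf '%s\en' 'x 2 b' 'y 1 c' 'z 3 a' | sort -k 2,2
printf '%s\en' 'x 2 b' 'y 1 c' 'z 3 a' | sort +1 -2
.Ed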
.Pp
The
.Nm
utility exits with one of the following values:
.Pp
.Bl -tag -width flag -compact
.It 0
Normal behavior.
.It 1
The
.Fl c
option was given and the input was out of order
(or, with
.Fl u ,
contained duplicate keys).
.It 2
An error occurred.
.El
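.Pp
A shell script can tell the three values apart.
The following is a minimal sketch for a POSIX shell;
.Pa data.txt
is a hypothetical file name:
.Bd -literal -offset indent
# data.txt is a placeholder file name
sort -c data.txt 2>/dev/null
case $? in
0)      echo "data.txt is sorted" ;;
1)      echo "data.txt is not sorted" ;;
*)      echo "sort failed" ;;
esac
.Ed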
.Sh ENVIRONMENT
.Bl -tag -width Fl
.It Ev TMPDIR
Path in which to store temporary files.
Note that
.Ev TMPDIR
may be overridden by the
.Fl T
option.
.El
.Sh FILES
.Bl -tag -width Pa -compact
.It Pa /var/tmp/sort.*
default temporary directories
.It Pa Ar output Ns #PID
temporary name for
.Ar output
if
.Ar output
already exists
.El
.Sh SEE ALSO
.Xr comm 1 ,
.Xr join 1 ,
.Xr uniq 1 ,
.Xr radixsort 3
.Sh STANDARDS
The
.Nm
utility is compliant with the
.St -p1003.1-2008
specification.
.Pp
The flags
.Op Fl HRsTz
are extensions to that specification.
.Sh HISTORY
A
.Nm
command appeared in
.At v3 .
.Sh NOTES
.Nm
has no limits on input line length (other than those imposed by available
memory) or any restrictions on bytes allowed within lines.
.Pp
To protect data,
.Nm
.Fl o
calls
.Xr link 2
and
.Xr unlink 2 ,
and thus fails in protected directories.
.Pp
The current sort command uses lexicographic radix sorting, which requires
that sort keys be kept in memory (unlike previous versions, which
used quicksort and merge sort and did not).
Thus performance depends heavily on an efficient choice of sort keys, and the
.Fl b
option and the
.Ar field2
argument of the
.Fl k
option should be used whenever possible.
Similarly,
.Nm
.Fl k1f
is equivalent to
.Nm
.Fl f
and may take twice as long.
.Sh BUGS
To sort files larger than 60MB, use
.Nm
.Fl H ;
files larger than 704MB must be sorted in smaller pieces, which are then
merged with
.Nm
.Fl m .
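.Pp
The procedure can be sketched for a POSIX shell with
.Xr split 1 ;
the file names
.Pa big.txt ,
.Pa piece.*
and
.Pa big.sorted
are hypothetical, and the piece size of one million lines is an
assumption to be chosen so that each piece stays within the limits above:
.Bd -literal -offset indent
# big.txt, piece.* and big.sorted are placeholder names
split -l 1000000 big.txt piece.
for f in piece.*; do
        sort -o "$f" "$f"       # sort each piece in place
done
sort -m -o big.sorted piece.*   # merge the sorted pieces
rm piece.*
.Ed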