Annotation of src/usr.bin/sort/sort.1, Revision 1.7

1.7     ! aaron       1: .\"    $OpenBSD: sort.1,v 1.6 2000/01/05 07:40:43 pjanzen Exp $
1.1       millert     2: .\"
                      3: .\" Copyright (c) 1991, 1993
                      4: .\"    The Regents of the University of California.  All rights reserved.
                      5: .\"
                      6: .\" This code is derived from software contributed to Berkeley by
                      7: .\" the Institute of Electrical and Electronics Engineers, Inc.
                      8: .\"
                      9: .\" Redistribution and use in source and binary forms, with or without
                     10: .\" modification, are permitted provided that the following conditions
                     11: .\" are met:
                     12: .\" 1. Redistributions of source code must retain the above copyright
                     13: .\"    notice, this list of conditions and the following disclaimer.
                     14: .\" 2. Redistributions in binary form must reproduce the above copyright
                     15: .\"    notice, this list of conditions and the following disclaimer in the
                     16: .\"    documentation and/or other materials provided with the distribution.
                     17: .\" 3. All advertising materials mentioning features or use of this software
                     18: .\"    must display the following acknowledgement:
                     19: .\"    This product includes software developed by the University of
                     20: .\"    California, Berkeley and its contributors.
                     21: .\" 4. Neither the name of the University nor the names of its contributors
                     22: .\"    may be used to endorse or promote products derived from this software
                     23: .\"    without specific prior written permission.
                     24: .\"
                     25: .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
                     26: .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
                     27: .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
                     28: .\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
                     29: .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
                     30: .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
                     31: .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
                     32: .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
                     33: .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
                     34: .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
                     35: .\" SUCH DAMAGE.
                     36: .\"
                     37: .\"     @(#)sort.1     8.1 (Berkeley) 6/6/93
                     38: .\"
                     39: .Dd June 6, 1993
                     40: .Dt SORT 1
                     41: .Os
                     42: .Sh NAME
                     43: .Nm sort
                     44: .Nd sort or merge text files
                     45: .Sh SYNOPSIS
                     46: .Nm sort
1.2       deraadt    47: .Op Fl cmubdfinrH
1.1       millert    48: .Op Fl t Ar char
                     49: .Op Fl R Ar char
                     50: .Oo
                     51: .Cm Fl k Ar field1[,field2]
                     52: .Oc
                     53: .Ar ...
                     54: .Op Fl T Ar dir
                     55: .Op Fl o Ar output
                     56: .Op Ar file
                     57: .Ar ...
                     58: .Sh DESCRIPTION
                     59: The
                     60: .Nm sort
                     61: utility
                     62: sorts text files by lines.
                     63: Comparisons are based on one or more sort keys extracted
                     64: from each line of input, and are performed
                     65: lexicographically. By default, if keys are not given,
                     66: .Nm sort
                     67: regards each input line as a single field.
                     68: .Pp
1.7     ! aaron      69: The options are as follows:
1.3       aaron      70: .Bl -tag -width indent
1.1       millert    71: .It Fl c
                     72: Check that the single input file is sorted.
                     73: If the file is not sorted,
                     74: .Nm sort
                     75: produces the appropriate error messages and exits with code 1;
                     76: otherwise,
                     77: .Nm sort
                     78: returns 0.
1.3       aaron      79: .Nm sort
1.1       millert    80: .Fl c
1.6       pjanzen    81: produces no output, except the error messages on
                     82: .Em stderr .
1.1       millert    83: .It Fl m
                     84: Merge only; the input files are assumed to be pre-sorted.
                     85: .It Fl o Ar output
                     86: The argument given is the name of an
                     87: .Ar output
                     88: file to
                     89: be used instead of the standard output.
                     90: This file
                     91: can be the same as one of the input files.
                     92: .It Fl T Ar dir
                     93: Use
                     94: .Ar dir
                     95: as the directory for temporary files.
                     96: The default is the value of the environment variable
                     97: .Ev TMPDIR ,
                     98: or
                     99: .Pa /var/tmp
                    100: if
                    101: .Ev TMPDIR
                    102: is not set.
                    103: .It Fl u
                    104: Unique: suppress all but one in each set of lines
                    105: having equal keys.
                    106: If used with the
                    107: .Fl c
                    108: option,
                    109: check that there are no lines with duplicate keys.
                    110: .El
                    111: .Pp
                    112: The following options override the default ordering rules.
                    113: When ordering options appear independent of key field
                    114: specifications, the requested field ordering rules are
                    115: applied globally to all sort keys.
                    116: When attached to a specific key (see
                    117: .Fl k ) ,
                    118: the ordering options override
                    119: all global ordering options for that key.
                    120: .Bl -tag -width indent
                    121: .It Fl d
                    122: Only blank space and alphanumeric characters
                    123: .\" according
                    124: .\" to the current setting of LC_CTYPE
                    125: are used
                    126: in making comparisons.
                    127: .It Fl f
                    128: Consider all lowercase characters that have uppercase
                    129: equivalents to be the same for purposes of
                    130: comparison.
                    131: .It Fl i
                    132: Ignore all non-printable characters.
                    133: .It Fl n
                    134: An initial numeric string, consisting of optional
                    135: blank space, optional minus sign, and zero or more
                    136: digits (including decimal point)
                    137: .\" with
                    138: .\" optional radix character and thousands
                    139: .\" separator
                    140: .\" (as defined in the current locale),
                    141: is sorted by arithmetic value.
                    142: (The
                    143: .Fl n
                    144: option no longer implies
                    145: the
                    146: .Fl b
                    147: option.)
                    148: .It Fl r
                    149: Reverse the sense of comparisons.
                    150: .It Fl H
                    151: Use a merge sort instead of a radix sort.  This option should be
                    152: used for files larger than 60MB.
                    153: .El
                    154: .Pp
1.3       aaron     155: The treatment of field separators can be altered using these
1.1       millert   156: options:
                    157: .Bl -tag -width indent
                    158: .It Fl b
                    159: Ignore leading blank space when determining the start
                    160: and end of a restricted sort key.
                    161: A
                    162: .Fl b
                    163: option specified before the first
                    164: .Fl k
                    165: option applies globally to all
                    166: .Fl k
                    167: options.
                    168: Otherwise, the
                    169: .Fl b
                    170: option can be
                    171: attached independently to each
                    172: .Ar field
                    173: argument of the
                    174: .Fl k
                    175: option (see below).
                    176: Note that the
                    177: .Fl b
                    178: option
                    179: has no effect unless key fields are specified.
                    180: .It Fl t Ar char
1.3       aaron     181: .Ar char
1.1       millert   182: is used as the field separator character. The initial
                    183: .Ar char
                    184: is not considered to be part of a field when determining
1.6       pjanzen   185: key offsets.
1.1       millert   186: Each occurrence of
                    187: .Ar char
                    188: is significant (for example,
                    189: .Dq Ar charchar
                    190: delimits an empty field).
                    191: If
                    192: .Fl t
1.6       pjanzen   193: is not specified, the default field separator is a sequence of
                    194: blank-space characters, and consecutive blank spaces do
                    195: .Em not
                    196: delimit an empty field; further, the initial blank space
                    197: .Em is
                    198: considered part of a field when determining key offsets.
1.1       millert   199: .It Fl R Ar char
1.3       aaron     200: .Ar char
1.1       millert   201: is used as the record separator character.
                    202: This should be used with discretion;
                    203: .Fl R Ar <alphanumeric>
                    204: usually produces undesirable results.
1.4       aaron     205: The default record separator is newline.
1.1       millert   206: .It Fl k Ar field1[,field2]
                    207: Designates the starting position,
                    208: .Ar field1 ,
1.5       aaron     209: and optional ending position,
1.1       millert   210: .Ar field2 ,
                    211: of a key field.
                    212: The
                    213: .Fl k
                    214: option replaces the obsolescent options
                    215: .Cm \(pl Ns Ar pos1
                    216: and
                    217: .Fl Ns Ar pos2 .
                    218: .El
                    219: .Pp
                    220: The following operands are available:
                    221: .Bl -tag -width indent
1.3       aaron     222: .It Ar file
                    223: The pathname of a file to be sorted, merged, or checked.
                    224: If no
1.1       millert   225: .Ar file
                    226: operands are specified, or if
1.3       aaron     227: a
                    228: .Ar file
                    229: operand is
1.1       millert   230: .Fl ,
                    231: the standard input is used.
1.3       aaron     232: .El
1.1       millert   233: .Pp
                    234: A field is
1.6       pjanzen   235: defined as a maximal sequence of characters other than the
                    236: field separator and record separator
                    237: .Pq newline by default .
                    238: Initial blank spaces are included in the field unless
                    239: .Fl b
                    240: has been specified;
                    241: the first blank space of a sequence of blank spaces acts as the field
                    242: separator and is included in the field (unless
                    243: .Fl t
                    244: is specified).
                    245: For example, by default all blank spaces at the beginning of a line are
                    246: considered to be part of the first field.
1.1       millert   247: .Pp
                    248: Fields are specified
                    249: by the
                    250: .Fl k Ar field1[,field2]
                    251: argument. A missing
                    252: .Ar field2
                    253: argument defaults to the end of a line.
                    254: .Pp
                    255: The arguments
                    256: .Ar field1
                    257: and
                    258: .Ar field2
                    259: have the form
                    260: .Em m.n
1.6       pjanzen   261: .Em (m,n > 0)
                    262: and can be followed by one or more of the letters
                    263: .Cm b , d , f , i ,
                    264: .Cm n ,
                    265: and
                    266: .Cm r ,
                    267: which correspond to the options discussed above.
1.1       millert   268: A
                    269: .Ar field1
                    270: position specified by
                    271: .Em m.n
                    272: is interpreted as the
                    273: .Em n Ns th
1.6       pjanzen   274: character from the beginning of the
1.1       millert   275: .Em m Ns th
                    276: field.
                    277: A missing
                    278: .Em \&.n
                    279: in
                    280: .Ar field1
                    281: means
                    282: .Ql \&.1 ,
                    283: indicating the first character of the
                    284: .Em m Ns th
                    285: field;
1.3       aaron     286: if the
1.1       millert   287: .Fl b
                    288: option is in effect,
                    289: .Em n
                    290: is counted from the first
                    291: non-blank character in the
                    292: .Em m Ns th
                    293: field;
                    294: .Em m Ns \&.1b
                    295: refers to the first
                    296: non-blank character in the
                    297: .Em m Ns th
                    298: field.
1.6       pjanzen   299: .No 1\&. Ns Em n
                    300: refers to the
                    301: .Em n Ns th
                    302: character from the beginning of the line;
                    303: if
                    304: .Em n
                    305: is greater than the length of the line, the field is taken to be empty.
1.1       millert   306: .Pp
                    307: A
                    308: .Ar field2
                    309: position specified by
                    310: .Em m.n
                    311: is interpreted as
                    312: the
                    313: .Em n Ns th
                    314: character (including separators) of the
                    315: .Em m Ns th
                    316: field.
                    317: A missing
                    318: .Em \&.n
1.5       aaron     319: indicates the last character of the
1.1       millert   320: .Em m Ns th
                    321: field;
1.5       aaron     322: .Em m
1.1       millert   323: = \&0
                    324: designates the end of a line.
                    325: Thus the option
                    326: .Fl k Ar v.x,w.y
                    327: is synonymous with the obsolescent option
                    328: .Cm \(pl Ns Ar v-\&1.x-\&1
                    329: .Fl Ns Ar w-\&1.y ;
                    330: when
                    331: .Em y
                    332: is omitted,
                    333: .Fl k Ar v.x,w
                    334: is synonymous with
1.5       aaron     335: .Cm \(pl Ns Ar v-\&1.x-\&1
1.1       millert   336: .Fl Ns Ar w+1.0 .
                    337: The obsolescent
                    338: .Cm \(pl Ns Ar pos1
                    339: .Fl Ns Ar pos2
                    340: option is still supported, except for
1.3       aaron     341: .Fl Ns Ar w\&.0b ,
1.1       millert   342: which has no
                    343: .Fl k
                    344: equivalent.
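                         .Pp
                         As a sketch of the notation above (the file name
                         .Pa data.txt
                         is hypothetical), the following command selects a key that runs from
                         the third character of the second field through the fifth character
                         of the fourth field:
                         .Bd -literal -offset indent
                         $ sort -k 2.3,4.5 data.txt
                         .Ed
                         .Pp
                         Substituting v=2, x=3, w=4, and y=5 into the first equivalence above
                         should give the obsolescent form of the same key:
                         .Bd -literal -offset indent
                         $ sort +1.2 -3.5 data.txt
                         .Ed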
                    345: .Sh ENVIRONMENT
                    346: If the following environment variable exists, it is utilized by
1.3       aaron     347: .Nm sort :
1.1       millert   348: .Bl -tag -width Fl
                    349: .It Ev TMPDIR
1.3       aaron     350: Path in which to store temporary files.
                    351: Note that
1.1       millert   352: .Ev TMPDIR
                    353: may be overridden by the
                    354: .Fl T
                    355: option.
                         .El
                    356: .Sh FILES
                    357: .Bl -tag -width Pa -compact
                    358: .It Pa /var/tmp/sort.*
1.3       aaron     359: default temporary directories
1.1       millert   360: .It Pa Ar output Ns #PID
1.3       aaron     361: temporary name for
1.1       millert   362: .Ar output
                    363: if
                    364: .Ar output
1.3       aaron     365: already exists
1.1       millert   366: .El
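                         .Sh EXAMPLES
                         The following commands are minimal sketches of the options described
                         above; the file names
                         .Pa names.txt ,
                         .Pa a.sorted ,
                         .Pa b.sorted ,
                         and
                         .Pa all.txt
                         are hypothetical.
                         .Pp
                         Sort the password file numerically on its third colon-separated field:
                         .Bd -literal -offset indent
                         $ sort -t : -k 3n,3 /etc/passwd
                         .Ed
                         .Pp
                         Check that a file is sorted and has no lines with duplicate keys;
                         per the description of
                         .Fl c ,
                         the exit status should be 0 on success and 1 otherwise:
                         .Bd -literal -offset indent
                         $ sort -c -u names.txt
                         $ echo $?
                         .Ed
                         .Pp
                         Merge two files that are assumed to be sorted already, writing the
                         result to a named output file:
                         .Bd -literal -offset indent
                         $ sort -m -o all.txt a.sorted b.sorted
                         .Ed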
                    367: .Sh SEE ALSO
                    368: .Xr comm 1 ,
1.3       aaron     369: .Xr join 1 ,
                    370: .Xr uniq 1
1.1       millert   371: .Sh RETURN VALUES
1.3       aaron     372: .Nm sort
                    373: exits with one of the following values:
                    374: .Pp
1.1       millert   375: .Bl -tag -width flag -compact
1.3       aaron     376: .It 0
                    377: Normal behavior.
                    378: .It 1
                    379: On disorder (or non-uniqueness) with the
1.1       millert   380: .Fl c
1.3       aaron     381: option.
                    382: .It 2
                    383: An error occurred.
                         .El
1.1       millert   384: .Sh BUGS
                    385: Lines longer than 65522 characters are discarded and processing continues.
                    386: To sort files larger than 60MB, use
                    387: .Nm sort
                    388: .Fl H ;
                    389: files larger than 704MB must be sorted in smaller pieces, then merged.
                    390: To protect data,
                    391: .Nm sort
                    392: .Fl o
                    393: calls link and unlink, and thus fails in protected directories.
                    394: .Sh HISTORY
                    395: A
                    396: .Nm sort
                    397: command appeared in
                    398: .At v6 .
                    399: .Sh NOTES
                    400: The current sort command uses lexicographic radix sorting, which requires
                    401: that sort keys be kept in memory (unlike previous versions, which used quick
1.3       aaron     402: and merge sorts and did not).
1.1       millert   403: Thus performance depends highly on efficient choice of sort keys, and the
                    404: .Fl b
                    405: option and the
                    406: .Ar field2
                    407: argument of the
                    408: .Fl k
                    409: option should be used whenever possible.
                    410: Similarly,
                    411: .Nm sort
                    412: .Fl k1f
                    413: is equivalent to
                    414: .Nm sort
                    415: .Fl f
                    416: and may take twice as long.