
Annotation of src/usr.bin/sort/sort.1, Revision 1.14

1.14    ! ericj       1: .\"    $OpenBSD: sort.1,v 1.13 2000/11/09 17:52:39 aaron Exp $
1.1       millert     2: .\"
                      3: .\" Copyright (c) 1991, 1993
                      4: .\"    The Regents of the University of California.  All rights reserved.
                      5: .\"
                      6: .\" This code is derived from software contributed to Berkeley by
                      7: .\" the Institute of Electrical and Electronics Engineers, Inc.
                      8: .\"
                      9: .\" Redistribution and use in source and binary forms, with or without
                     10: .\" modification, are permitted provided that the following conditions
                     11: .\" are met:
                     12: .\" 1. Redistributions of source code must retain the above copyright
                     13: .\"    notice, this list of conditions and the following disclaimer.
                     14: .\" 2. Redistributions in binary form must reproduce the above copyright
                     15: .\"    notice, this list of conditions and the following disclaimer in the
                     16: .\"    documentation and/or other materials provided with the distribution.
                     17: .\" 3. All advertising materials mentioning features or use of this software
                     18: .\"    must display the following acknowledgement:
                     19: .\"    This product includes software developed by the University of
                     20: .\"    California, Berkeley and its contributors.
                     21: .\" 4. Neither the name of the University nor the names of its contributors
                     22: .\"    may be used to endorse or promote products derived from this software
                     23: .\"    without specific prior written permission.
                     24: .\"
                     25: .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
                     26: .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
                     27: .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
                     28: .\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
                     29: .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
                     30: .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
                     31: .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
                     32: .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
                     33: .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
                     34: .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
                     35: .\" SUCH DAMAGE.
                     36: .\"
                     37: .\"     @(#)sort.1     8.1 (Berkeley) 6/6/93
                     38: .\"
                     39: .Dd June 6, 1993
                     40: .Dt SORT 1
                     41: .Os
                     42: .Sh NAME
                     43: .Nm sort
                     44: .Nd sort or merge text files
                     45: .Sh SYNOPSIS
                     46: .Nm sort
1.2       deraadt    47: .Op Fl cmubdfinrH
1.1       millert    48: .Op Fl t Ar char
                     49: .Op Fl R Ar char
                     50: .Oo
                     51: .Cm Fl k Ar field1[,field2]
                     52: .Oc
                     53: .Ar ...
                     54: .Op Fl T Ar dir
                     55: .Op Fl o Ar output
                     56: .Op Ar file
                     57: .Ar ...
                     58: .Sh DESCRIPTION
                     59: The
1.8       aaron      60: .Nm
1.12      aaron      61: utility sorts text files by lines.
1.1       millert    62: Comparisons are based on one or more sort keys extracted
1.8       aaron      63: from each line of input, and are performed lexicographically.
                     64: By default, if keys are not given,
                     65: .Nm
1.1       millert    66: regards each input line as a single field.
                     67: .Pp
1.7       aaron      68: The options are as follows:
1.13      aaron      69: .Bl -tag -width file Ds
1.1       millert    70: .It Fl c
                     71: Check that the single input file is sorted.
                     72: If the file is not sorted,
1.8       aaron      73: .Nm
1.12      aaron      74: produces the appropriate error messages and exits with code 1; otherwise,
1.8       aaron      75: .Nm
1.1       millert    76: returns 0.
1.8       aaron      77: .Nm
1.1       millert    78: .Fl c
1.6       pjanzen    79: produces no output, except the error messages on
                     80: .Em stderr .
1.1       millert    81: .It Fl m
                     82: Merge only; the input files are assumed to be pre-sorted.
                     83: .It Fl o Ar output
                     84: The argument given is the name of an
                     85: .Ar output
1.12      aaron      86: file to be used instead of the standard output.
                     87: This file can be the same as one of the input files.
1.1       millert    88: .It Fl T Ar dir
                     89: Use
                     90: .Ar dir
1.8       aaron      91: as the directory for temporary files.
                     92: The default is the contents of the environment variable
1.1       millert    93: .Ev TMPDIR
                     94: or
                     95: .Pa /var/tmp
                     96: if
                     97: .Ev TMPDIR
                     98: is not set.
                     99: .It Fl u
1.12      aaron     100: Unique: suppress all but one in each set of lines having equal keys.
1.1       millert   101: If used with the
                    102: .Fl c
1.12      aaron     103: option, check that there are no lines with duplicate keys.
1.1       millert   104: .El
                    105: .Pp
                    106: The following options override the default ordering rules.
                    107: When ordering options appear independent of key field
                    108: specifications, the requested field ordering rules are
                    109: applied globally to all sort keys.
                    110: When attached to a specific key (see
                    111: .Fl k ) ,
                    112: the ordering options override
                    113: all global ordering options for that key.
                    114: .Bl -tag -width indent
                    115: .It Fl d
                    116: Only blank space and alphanumeric characters
                    117: .\" according
                    118: .\" to the current setting of LC_CTYPE
1.12      aaron     119: are used in making comparisons.
1.1       millert   120: .It Fl f
                    121: Consider all lowercase characters that have uppercase
1.12      aaron     122: equivalents to be the same for purposes of comparison.
1.1       millert   123: .It Fl i
                    124: Ignore all non-printable characters.
                    125: .It Fl n
1.12      aaron     126: An initial numeric string, consisting of optional blank space, optional
                    127: minus sign, and zero or more digits (optionally including a decimal point)
1.1       millert   128: .\" with
                    129: .\" optional radix character and thousands
                    130: .\" separator
                    131: .\" (as defined in the current locale),
                    132: is sorted by arithmetic value.
                    133: (The
                    134: .Fl n
1.12      aaron     135: option no longer implies the
1.1       millert   136: .Fl b
                    137: option.)
                    138: .It Fl r
                    139: Reverse the sense of comparisons.
                    140: .It Fl H
1.8       aaron     141: Use a merge sort instead of a radix sort.
                    142: This option should be used for files larger than 60Mb.
1.1       millert   143: .El
                    144: .Pp
1.12      aaron     145: The treatment of field separators can be altered using these options:
1.1       millert   146: .Bl -tag -width indent
                    147: .It Fl b
                    148: Ignore leading blank space when determining the start
                    149: and end of a restricted sort key.
                    150: A
                    151: .Fl b
                    152: option specified before the first
                    153: .Fl k
                    154: option applies globally to all
                    155: .Fl k
                    156: options.
                    157: Otherwise, the
                    158: .Fl b
1.12      aaron     159: option can be attached independently to each
1.1       millert   160: .Ar field
                    161: argument of the
                    162: .Fl k
                    163: option (see below).
                    164: Note that the
                    165: .Fl b
1.12      aaron     166: option has no effect unless key fields are specified.
1.1       millert   167: .It Fl t Ar char
1.3       aaron     168: .Ar char
1.8       aaron     169: is used as the field separator character.
                    170: The initial
1.1       millert   171: .Ar char
1.12      aaron     172: is not considered to be part of a field when determining key offsets.
1.1       millert   173: Each occurrence of
                    174: .Ar char
                    175: is significant (for example,
                    176: .Dq Ar charchar
                    177: delimits an empty field).
                    178: If
                    179: .Fl t
1.6       pjanzen   180: is not specified, the default field separator is a sequence of
                    181: blank-space characters, and consecutive blank spaces do
                    182: .Em not
                    183: delimit an empty field; further, the initial blank space
                    184: .Em is
                    185: considered part of a field when determining key offsets.
1.1       millert   186: .It Fl R Ar char
1.3       aaron     187: .Ar char
1.1       millert   188: is used as the record separator character.
                    189: This should be used with discretion;
                    190: .Fl R Ar <alphanumeric>
                    191: usually produces undesirable results.
1.4       aaron     192: The default record separator is newline.
1.1       millert   193: .It Fl k Ar field1[,field2]
                    194: Designates the starting position,
                    195: .Ar field1 ,
1.5       aaron     196: and optional ending position,
1.1       millert   197: .Ar field2 ,
                    198: of a key field.
                    199: The
                    200: .Fl k
                    201: option replaces the obsolescent options
                    202: .Cm \(pl Ns Ar pos1
                    203: and
                    204: .Fl Ns Ar pos2 .
                    205: .El
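
As a rough illustration of the separator rules above, the following shell sketch contrasts an explicit -t separator, where every occurrence delimits a field, with the default blank separation, where leading blanks belong to the field unless b is attached to the key. The sample data is invented for this example, and LC_ALL=C is an assumption made so that plain byte order decides the comparisons.

```shell
# Sample data is invented for illustration; LC_ALL=C forces byte-order comparison.
export LC_ALL=C

# With -t:, each ':' is significant, so "a::y" has an empty second field,
# which sorts ahead of "1" and "2".
by_colon=$(printf 'b:2:x\na::y\nc:1:z\n' | sort -t: -k2,2)

# Default separator: the blanks before "2" count as part of field 2,
# so "   2" compares lower than " 1" (a space sorts before '1').
with_blanks=$(printf 'x   2\ny 1\n' | sort -k2,2)

# Attaching b to the key skips those leading blanks, so "1" sorts before "2".
without_blanks=$(printf 'x   2\ny 1\n' | sort -k2b,2)

printf '%s\n---\n%s\n---\n%s\n' "$by_colon" "$with_blanks" "$without_blanks"
```

The second and third commands differ only in the b modifier, which is usually what is wanted when columns are padded with a varying number of blanks.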
                    206: .Pp
                    207: The following operands are available:
                    208: .Bl -tag -width indent
1.3       aaron     209: .It Ar file
                    210: The pathname of a file to be sorted, merged, or checked.
                    211: If no
1.1       millert   212: .Ar file
1.12      aaron     213: operands are specified, or if a
1.3       aaron     214: .Ar file
                    215: operand is
1.1       millert   216: .Fl ,
                    217: the standard input is used.
1.3       aaron     218: .El
1.1       millert   219: .Pp
1.12      aaron     220: A field is defined as a maximal sequence of characters other than the
1.6       pjanzen   221: field separator and record separator
                    222: .Pq newline by default .
                    223: Initial blank spaces are included in the field unless
                    224: .Fl b
                    225: has been specified;
                    226: the first blank space of a sequence of blank spaces acts as the field
                    227: separator and is included in the field (unless
                    228: .Fl t
                    229: is specified).
                    230: For example, by default all blank spaces at the beginning of a line are
                    231: considered to be part of the first field.
1.1       millert   232: .Pp
1.12      aaron     233: Fields are specified by the
1.1       millert   234: .Fl k Ar field1[,field2]
1.8       aaron     235: argument.
                    236: A missing
1.1       millert   237: .Ar field2
                    238: argument defaults to the end of a line.
                    239: .Pp
                    240: The arguments
                    241: .Ar field1
                    242: and
                    243: .Ar field2
                    244: have the form
                    245: .Em m.n
1.6       pjanzen   246: .Em (m,n > 0)
                    247: and can be followed by one or more of the letters
                    248: .Cm b , d , f , i ,
1.10      aaron     249: .Cm n ,
1.6       pjanzen   250: and
                    251: .Cm r ,
                    252: which correspond to the options discussed above.
1.1       millert   253: A
                    254: .Ar field1
                    255: position specified by
                    256: .Em m.n
                    257: is interpreted as the
                    258: .Em n Ns th
1.6       pjanzen   259: character from the beginning of the
1.1       millert   260: .Em m Ns th
                    261: field.
                    262: A missing
                    263: .Em \&.n
                    264: in
                    265: .Ar field1
                    266: means
                    267: .Ql \&.1 ,
                    268: indicating the first character of the
                    269: .Em m Ns th
1.12      aaron     270: field; if the
1.1       millert   271: .Fl b
                    272: option is in effect,
                    273: .Em n
1.12      aaron     274: is counted from the first non-blank character in the
1.1       millert   275: .Em m Ns th
                    276: field;
                    277: .Em m Ns \&.1b
1.12      aaron     278: refers to the first non-blank character in the
1.1       millert   279: .Em m Ns th
                    280: field.
1.6       pjanzen   281: .No 1\&. Ns Em n
                    282: refers to the
                    283: .Em n Ns th
                    284: character from the beginning of the line;
                    285: if
                    286: .Em n
                    287: is greater than the length of the line, the field is taken to be empty.
1.1       millert   288: .Pp
                    289: A
                    290: .Ar field2
                    291: position specified by
                    292: .Em m.n
1.12      aaron     293: is interpreted as the
1.1       millert   294: .Em n Ns th
                    295: character (including separators) of the
                    296: .Em m Ns th
                    297: field.
                    298: A missing
                    299: .Em \&.n
1.5       aaron     300: indicates the last character of the
1.1       millert   301: .Em m Ns th
                    302: field;
1.5       aaron     303: .Em m
1.1       millert   304: = \&0
                    305: designates the end of a line.
                    306: Thus the option
                    307: .Fl k Ar v.x,w.y
                    308: is synonymous with the obsolescent option
                    309: .Cm \(pl Ns Ar v-\&1.x-\&1
                    310: .Fl Ns Ar w-\&1.y ;
                    311: when
                    312: .Em y
                    313: is omitted,
                    314: .Fl k Ar v.x,w
                    315: is synonymous with
1.5       aaron     316: .Cm \(pl Ns Ar v-\&1.x-\&1
1.1       millert   317: .Fl Ns Ar w+1.0 .
                    318: The obsolescent
                    319: .Cm \(pl Ns Ar pos1
                    320: .Fl Ns Ar pos2
                    321: option is still supported, except for
1.3       aaron     322: .Fl Ns Ar w\&.0b ,
1.1       millert   323: which has no
                    324: .Fl k
                    325: equivalent.
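
The m.n notation and the per-key modifier letters described above can be tried with a short shell sketch. The input lines are made up for the example, and LC_ALL=C is assumed so that text comparisons follow byte order; only the -t and -k options documented on this page are used.

```shell
# Input lines are invented; LC_ALL=C is assumed for predictable text ordering.
export LC_ALL=C

# -k1.3 starts the key at the 3rd character of field 1, so the
# leading "x-", "y-" and "z-" prefixes do not take part in the comparison.
by_offset=$(printf 'x-30\ny-10\nz-20\n' | sort -k1.3)

# -k2,2n restricts the key to field 2 and compares it by arithmetic value;
# -k1,1 then breaks ties on field 1.
by_number=$(printf 'pear,10\napple,9\nfig,10\n' | sort -t, -k2,2n -k1,1)

# Without the n modifier the same key is compared as text, so "10" precedes "9".
by_text=$(printf 'pear,10\napple,9\nfig,10\n' | sort -t, -k2,2 -k1,1)

printf '%s\n---\n%s\n---\n%s\n' "$by_offset" "$by_number" "$by_text"
```

Giving an explicit end position (the ,2 in -k2,2) keeps each key short, which matches the advice in NOTES about using the field2 argument whenever possible.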
1.8       aaron     326: .Pp
                    327: The
                    328: .Nm
                    329: utility shall exit with one of the following values:
                    330: .Pp
                    331: .Bl -tag -width flag -compact
                    332: .It 0
                    333: Normal behavior.
                    334: .It 1
                    335: On disorder (or non-uniqueness) with the
                    336: .Fl c
                    337: option.
                    338: .It 2
                    339: An error occurred.
                    340: .El
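
The exit values for -c can be observed from a script. The sketch below is only an illustration: the file names and contents are hypothetical, a scratch directory is created with mktemp, and each status is captured with "|| rc=$?" so the script keeps running after a non-zero exit.

```shell
# File names and contents are hypothetical; they exist only inside a scratch directory.
export LC_ALL=C
tmpdir=$(mktemp -d)
printf 'a\nb\nc\n' > "$tmpdir/ordered"
printf 'b\na\nc\n' > "$tmpdir/disordered"
printf 'a\na\nb\n' > "$tmpdir/duplicates"

# Sorted input: exit status 0.
rc_ordered=0;    sort -c "$tmpdir/ordered" || rc_ordered=$?

# Disordered input: exit status 1, with a diagnostic on stderr and nothing on stdout.
rc_disordered=0; sort -c "$tmpdir/disordered" 2>/dev/null || rc_disordered=$?

# Equal adjacent lines are still in order for plain -c ...
rc_dup_plain=0;  sort -c "$tmpdir/duplicates" || rc_dup_plain=$?

# ... but -c combined with -u also rejects duplicate keys.
rc_dup_unique=0; sort -c -u "$tmpdir/duplicates" 2>/dev/null || rc_dup_unique=$?

rm -rf "$tmpdir"
echo "$rc_ordered $rc_disordered $rc_dup_plain $rc_dup_unique"
```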
1.1       millert   341: .Sh ENVIRONMENT
1.8       aaron     342: The following environment variables affect the execution of
1.3       aaron     343: .Nm sort :
1.1       millert   344: .Bl -tag -width Fl
                    345: .It Ev TMPDIR
1.3       aaron     346: Path in which to store temporary files.
                    347: Note that
1.1       millert   348: .Ev TMPDIR
                    349: may be overridden by the
                    350: .Fl T
                    351: option.
1.11      aaron     352: .El
1.1       millert   353: .Sh FILES
                    354: .Bl -tag -width Pa -compact
                    355: .It Pa /var/tmp/sort.*
1.3       aaron     356: default temporary directories
1.1       millert   357: .It Pa Ar output Ns #PID
1.3       aaron     358: temporary name for
1.1       millert   359: .Ar output
                    360: if
                    361: .Ar output
1.3       aaron     362: already exists
1.1       millert   363: .El
                    364: .Sh SEE ALSO
                    365: .Xr comm 1 ,
1.3       aaron     366: .Xr join 1 ,
1.14    ! ericj     367: .Xr radixsort 3 ,
1.3       aaron     368: .Xr uniq 1
1.1       millert   369: .Sh HISTORY
                    370: A
1.8       aaron     371: .Nm
1.1       millert   372: command appeared in
1.9       aaron     373: .At v5 .
1.1       millert   374: .Sh NOTES
1.14    ! ericj     375: .Nm
        !           376: has no limits on input line length (other than those imposed by available
        !           377: memory) and no restrictions on the bytes allowed within lines.
        !           378: .Pp
        !           379: To protect data
        !           380: .Nm
        !           381: .Fl o
        !           382: calls
        !           383: .Xr link 2
        !           384: and
        !           385: .Xr unlink 2 ,
        !           386: and thus fails on protected directories.
        !           387: .Pp
1.1       millert   388: The current sort command uses lexicographic radix sorting, which requires
1.12      aaron     389: that sort keys be kept in memory (as opposed to previous versions which
                    390: used quick and merge sorts and did not).
1.1       millert   391: Thus performance depends highly on efficient choice of sort keys, and the
                    392: .Fl b
                    393: option and the
                    394: .Ar field2
                    395: argument of the
                    396: .Fl k
                    397: option should be used whenever possible.
                    398: Similarly,
1.8       aaron     399: .Nm
1.1       millert   400: .Fl k1f
                    401: is equivalent to
1.8       aaron     402: .Nm
1.1       millert   403: .Fl f
                    404: and may take twice as long.
1.12      aaron     405: .Sh BUGS
                    406: To sort files larger than 60Mb, use
                    407: .Nm
                    408: .Fl H ;
                    409: files larger than 704Mb must be sorted in smaller pieces, then merged.
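
One possible way to follow the "smaller pieces, then merged" advice is sketched below. It is an assumption-laden illustration, not a prescribed procedure: the split(1) utility, the two-line piece size, the scratch directory, and the file names are all choices made for this example, and a real input would use a much larger piece size.

```shell
# Hypothetical file names and a tiny piece size, chosen only to keep the example short.
export LC_ALL=C
workdir=$(mktemp -d)
printf '%s\n' delta alpha echo charlie bravo foxtrot > "$workdir/input"

# Cut the input into pieces of two lines each (piece.aa, piece.ab, ...).
split -l 2 "$workdir/input" "$workdir/piece."

# Sort every piece in place; -o may name one of the input files.
for piece in "$workdir"/piece.*; do
    sort -o "$piece" "$piece"
done

# Merge the pre-sorted pieces without sorting them again.
sort -m -o "$workdir/merged" "$workdir"/piece.*

merged=$(cat "$workdir/merged")
rm -rf "$workdir"
printf '%s\n' "$merged"
```

Because -m assumes its inputs are already sorted, the same ordering options (for example -n or -k) would need to be given to both the per-piece sort and the final merge.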