Annotation of src/usr.bin/sort/sort.1, Revision 1.1
1.1 ! millert 1: .\" $OpenBSD$
! 2: .\"
! 3: .\" Copyright (c) 1991, 1993
! 4: .\" The Regents of the University of California. All rights reserved.
! 5: .\"
! 6: .\" This code is derived from software contributed to Berkeley by
! 7: .\" the Institute of Electrical and Electronics Engineers, Inc.
! 8: .\"
! 9: .\" Redistribution and use in source and binary forms, with or without
! 10: .\" modification, are permitted provided that the following conditions
! 11: .\" are met:
! 12: .\" 1. Redistributions of source code must retain the above copyright
! 13: .\" notice, this list of conditions and the following disclaimer.
! 14: .\" 2. Redistributions in binary form must reproduce the above copyright
! 15: .\" notice, this list of conditions and the following disclaimer in the
! 16: .\" documentation and/or other materials provided with the distribution.
! 17: .\" 3. All advertising materials mentioning features or use of this software
! 18: .\" must display the following acknowledgement:
! 19: .\" This product includes software developed by the University of
! 20: .\" California, Berkeley and its contributors.
! 21: .\" 4. Neither the name of the University nor the names of its contributors
! 22: .\" may be used to endorse or promote products derived from this software
! 23: .\" without specific prior written permission.
! 24: .\"
! 25: .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
! 26: .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
! 27: .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
! 28: .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
! 29: .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
! 30: .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
! 31: .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
! 32: .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
! 33: .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
! 34: .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
! 35: .\" SUCH DAMAGE.
! 36: .\"
! 37: .\" @(#)sort.1 8.1 (Berkeley) 6/6/93
! 38: .\"
! 39: .Dd June 6, 1993
! 40: .Dt SORT 1
! 41: .Os
! 42: .Sh NAME
! 43: .Nm sort
! 44: .Nd sort or merge text files
! 45: .Sh SYNOPSIS
! 46: .Nm sort
! 47: .Op Fl cmubdfinrH
! 48: .Op Fl t Ar char
! 49: .Op Fl R Ar char
! 50: .Oo
! 51: .Cm Fl k Ar field1[,field2]
! 52: .Oc
! 53: .Ar ...
! 54: .Op Fl T Ar dir
! 55: .Op Fl o Ar output
! 56: .Op Ar file
! 57: .Ar ...
! 58: .Sh DESCRIPTION
! 59: The
! 60: .Nm sort
! 61: utility
! 62: sorts text files by lines.
! 63: Comparisons are based on one or more sort keys extracted
! 64: from each line of input, and are performed
! 65: lexicographically. By default, if keys are not given,
! 66: .Nm sort
! 67: regards each input line as a single field.
! 68: .Pp
! 69: The following options are available:
! 70: .Bl -tag -width indent
! 71: .It Fl c
! 72: Check that the single input file is sorted.
! 73: If the file is not sorted,
! 74: .Nm sort
! 75: produces the appropriate error messages and exits with code 1;
! 76: otherwise,
! 77: .Nm sort
! 78: returns 0.
! 79: .Nm Sort
! 80: .Fl c
! 81: produces no output.
! 82: .It Fl m
! 83: Merge only; the input files are assumed to be pre-sorted.
! 84: .It Fl o Ar output
! 85: The argument given is the name of an
! 86: .Ar output
! 87: file to
! 88: be used instead of the standard output.
! 89: This file
! 90: can be the same as one of the input files.
! 91: .It Fl T Ar dir
! 92: Use
! 93: .Ar dir
! 94: as the directory for temporary files. The default is the contents
! 95: of the environment variable
! 96: .Ev TMPDIR
! 97: or
! 98: .Pa /var/tmp
! 99: if
! 100: .Ev TMPDIR
! 101: is not set.
! 102: .It Fl u
! 103: Unique: suppress all but one in each set of lines
! 104: having equal keys.
! 105: If used with the
! 106: .Fl c
! 107: option,
! 108: check that there are no lines with duplicate keys.
! 109: .El
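.Pp
As an illustrative sketch of the options above (the file names
.Pa data.txt ,
.Pa a.sorted ,
and
.Pa b.sorted
are hypothetical), the following commands check whether a file is
already sorted, sort it in place while dropping lines with duplicate keys,
and merge two pre-sorted files:
.Bd -literal -offset indent
$ sort -c data.txt || echo "data.txt is not sorted (or an error occurred)"
$ sort -u -o data.txt data.txt
$ sort -m a.sorted b.sorted
.Ed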
! 110: .Pp
! 111: The following options override the default ordering rules.
! 112: When ordering options appear independent of key field
! 113: specifications, the requested field ordering rules are
! 114: applied globally to all sort keys.
! 115: When attached to a specific key (see
! 116: .Fl k ) ,
! 117: the ordering options override
! 118: all global ordering options for that key.
! 119: .Bl -tag -width indent
! 120: .It Fl d
! 121: Only blank space and alphanumeric characters
! 122: .\" according
! 123: .\" to the current setting of LC_CTYPE
! 124: are used
! 125: in making comparisons.
! 126: .It Fl f
! 127: Considers all lowercase characters that have uppercase
! 128: equivalents to be the same for purposes of
! 129: comparison.
! 130: .It Fl i
! 131: Ignore all non-printable characters.
! 132: .It Fl n
! 133: An initial numeric string, consisting of optional
! 134: blank space, optional minus sign, and zero or more
! 135: digits (including decimal point)
! 136: .\" with
! 137: .\" optional radix character and thousands
! 138: .\" separator
! 139: .\" (as defined in the current locale),
! 140: is sorted by arithmetic value.
! 141: (The
! 142: .Fl n
! 143: option no longer implies
! 144: the
! 145: .Fl b
! 146: option.)
! 147: .It Fl r
! 148: Reverse the sense of comparisons.
! 149: .It Fl H
! 150: Use a merge sort instead of a radix sort. This option should be
! 151: used for files larger than 60MB.
! 152: .El
! 153: .Pp
! 154: The treatment of field separators can be altered using the
! 155: options:
! 156: .Bl -tag -width indent
! 157: .It Fl b
! 158: Ignores leading blank space when determining the start
! 159: and end of a restricted sort key.
! 160: A
! 161: .Fl b
! 162: option specified before the first
! 163: .Fl k
! 164: option applies globally to all
! 165: .Fl k
! 166: options.
! 167: Otherwise, the
! 168: .Fl b
! 169: option can be
! 170: attached independently to each
! 171: .Ar field
! 172: argument of the
! 173: .Fl k
! 174: option (see below).
! 175: Note that the
! 176: .Fl b
! 177: option
! 178: has no effect unless key fields are specified.
! 179: .It Fl t Ar char
! 180: .Ar Char
! 181: is used as the field separator character. The initial
! 182: .Ar char
! 183: is not considered to be part of a field when determining
! 184: key offsets (see below).
! 185: Each occurrence of
! 186: .Ar char
! 187: is significant (for example,
! 188: .Dq Ar charchar
! 189: delimits an empty field).
! 190: If
! 191: .Fl t
! 192: is not specified,
! 193: blank space characters are used as default field
! 194: separators.
! 195: .It Fl R Ar char
! 196: .Ar Char
! 197: is used as the record separator character.
! 198: This should be used with discretion;
! 199: .Fl R Ar <alphanumeric>
! 200: usually produces undesirable results.
! 201: The default line separator is newline.
! 202: .It Fl k Ar field1[,field2]
! 203: Designates the starting position,
! 204: .Ar field1 ,
! 205: and optional ending position,
! 206: .Ar field2 ,
! 207: of a key field.
! 208: The
! 209: .Fl k
! 210: option replaces the obsolescent options
! 211: .Cm \(pl Ns Ar pos1
! 212: and
! 213: .Fl Ns Ar pos2 .
! 214: .El
! 215: .Pp
! 216: The following operands are available:
! 217: .Bl -tag -width indent
! 218: .It Ar file
! 219: The pathname of a file to be sorted, merged, or checked.
! 220: If no file
! 221: operands are specified, or if
! 222: a file operand is
! 223: .Fl ,
! 224: the standard input is used.
.El
! 225: .Pp
! 226: A field is
! 227: defined as a minimal sequence of characters followed by a
! 228: field separator or a newline character.
! 229: By default, the first
! 230: blank space of a sequence of blank spaces acts as the field separator.
! 231: All blank spaces in a sequence of blank spaces are considered
! 232: as part of the next field; for example, all blank spaces at
! 233: the beginning of a line are considered to be part of the
! 234: first field.
! 235: .Pp
! 236: Fields are specified
! 237: by the
! 238: .Fl k Ar field1[,field2]
! 239: argument. A missing
! 240: .Ar field2
! 241: argument defaults to the end of a line.
! 242: .Pp
! 243: The arguments
! 244: .Ar field1
! 245: and
! 246: .Ar field2
! 247: have the form
! 248: .Em m.n
! 249: optionally followed by one or more of the options
! 250: .Fl b , d , f , i ,
! 251: .Fl n , r .
! 252: A
! 253: .Ar field1
! 254: position specified by
! 255: .Em m.n
! 256: .Em (m,n > 0)
! 257: is interpreted as the
! 258: .Em n Ns th
! 259: character in the
! 260: .Em m Ns th
! 261: field.
! 262: A missing
! 263: .Em \&.n
! 264: in
! 265: .Ar field1
! 266: means
! 267: .Ql \&.1 ,
! 268: indicating the first character of the
! 269: .Em m Ns th
! 270: field.
! 271: If the
! 272: .Fl b
! 273: option is in effect,
! 274: .Em n
! 275: is counted from the first
! 276: non-blank character in the
! 277: .Em m Ns th
! 278: field;
! 279: .Em m Ns \&.1b
! 280: refers to the first
! 281: non-blank character in the
! 282: .Em m Ns th
! 283: field.
! 284: .Pp
! 285: A
! 286: .Ar field2
! 287: position specified by
! 288: .Em m.n
! 289: is interpreted as
! 290: the
! 291: .Em n Ns th
! 292: character (including separators) of the
! 293: .Em m Ns th
! 294: field.
! 295: A missing
! 296: .Em \&.n
! 297: indicates the last character of the
! 298: .Em m Ns th
! 299: field;
! 300: .Em m
! 301: = \&0
! 302: designates the end of a line.
! 303: Thus the option
! 304: .Fl k Ar v.x,w.y
! 305: is synonymous with the obsolescent option
! 306: .Cm \(pl Ns Ar v-\&1.x-\&1
! 307: .Fl Ns Ar w-\&1.y ;
! 308: when
! 309: .Em y
! 310: is omitted,
! 311: .Fl k Ar v.x,w
! 312: is synonymous with
! 313: .Cm \(pl Ns Ar v-\&1.x-\&1
! 314: .Fl Ns Ar w+1.0 .
! 315: The obsolescent
! 316: .Cm \(pl Ns Ar pos1
! 317: .Fl Ns Ar pos2
! 318: option is still supported, except for
! 319: .Fl Ns Ar w\&.0b ,
! 320: which has no
! 321: .Fl k
! 322: equivalent.
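.Pp
As a rough illustration of key specifications, assuming a colon-separated
input such as
.Xr passwd 5
and a hypothetical blank-separated file
.Pa data.txt ,
the first command below sorts by the third field compared numerically,
and the second sorts by the second field, breaking ties on the first
field in reverse order:
.Bd -literal -offset indent
$ sort -t : -k 3,3n /etc/passwd
$ sort -b -k 2,2 -k 1,1r data.txt
.Ed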
! 323: .Sh ENVIRONMENT
! 324: If the following environment variable exists, it is utilized by
! 325: .Nm sort .
! 326: .Bl -tag -width Fl
! 327: .It Ev TMPDIR
! 328: .Nm Sort
! 329: uses the contents of the
! 330: .Ev TMPDIR
! 331: environment variable as the path in which to store
! 332: temporary files. Note that
! 333: .Ev TMPDIR
! 334: may be overridden by the
! 335: .Fl T
! 336: option.
.El
! 337: .Sh FILES
! 338: .Bl -tag -width Pa -compact
! 339: .It Pa /var/tmp/sort.*
! 340: Default temporary files.
! 341: .It Pa Ar output Ns #PID
! 342: Temporary name for
! 343: .Ar output
! 344: if
! 345: .Ar output
! 346: already exists.
! 347: .El
! 348: .Sh SEE ALSO
! 349: .Xr comm 1 ,
! 350: .Xr uniq 1 ,
! 351: .Xr join 1
! 352: .Sh RETURN VALUES
! 353: Sort exits with one of the following values:
! 354: .Bl -tag -width flag -compact
! 355: .It Pa 0:
! 356: normal behavior.
! 357: .It Pa 1:
! 358: on disorder (or non-uniqueness) with the
! 359: .Fl c
! 360: option.
! 361: .It Pa 2:
! 362: an error occurred.
.El
! 363: .Sh BUGS
! 364: Lines longer than 65522 characters are discarded and processing continues.
! 365: To sort files larger than 60MB, use
! 366: .Nm sort
! 367: .Fl H ;
! 368: files larger than 704MB must be sorted in smaller pieces, then merged.
! 369: To protect data,
! 370: .Nm sort
! 371: .Fl o
! 372: calls link and unlink, and thus fails in protected directories.
! 373: .Sh HISTORY
! 374: A
! 375: .Nm sort
! 376: command appeared in
! 377: .At v6 .
! 378: .Sh NOTES
! 379: The current sort command uses lexicographic radix sorting, which requires
! 380: that sort keys be kept in memory (as opposed to previous versions which used quick
! 381: and merge sorts and did not).
! 382: Thus performance depends highly on efficient choice of sort keys, and the
! 383: .Fl b
! 384: option and the
! 385: .Ar field2
! 386: argument of the
! 387: .Fl k
! 388: option should be used whenever possible.
! 389: Similarly,
! 390: .Nm sort
! 391: .Fl k1f
! 392: is equivalent to
! 393: .Nm sort
! 394: .Fl f
! 395: and may take twice as long.