Diff for /src/usr.bin/sort/sort.1 between version 1.40 and 1.41

version 1.40, 2013/08/24 22:18:05 version 1.41, 2015/03/17 17:45:13
Line 32 
Line 32 
 .\"  .\"
 .\"     @(#)sort.1      8.1 (Berkeley) 6/6/93  .\"     @(#)sort.1      8.1 (Berkeley) 6/6/93
 .\"  .\"
 .Dd $Mdocdate$  .Dd July 3, 2012
 .Dt SORT 1  .Dt SORT 1
 .Os  .Os
 .Sh NAME  .Sh NAME
 .Nm sort  .Nm sort
 .Nd sort, merge, or sequence check text files  .Nd sort, merge, or sequence check text and binary files
 .Sh SYNOPSIS  .Sh SYNOPSIS
 .Nm sort  .Nm sort
 .Op Fl bCcdfHimnrsuz  .Op Fl bcCdfghiRMmnrsuVz
 .Sm off  .Sm off
 .Op Fl k\ \& Ar field1 Op , Ar field2  .Op Fl k\ \& Ar field1 Op , Ar field2
 .Sm on  .Sm on
 .Op Fl o Ar output  .Op Fl o Ar output
 .Op Fl R Ar char  .Op Fl S Ar memsize
 .Bk -words  .Bk -words
 .Op Fl T Ar dir  .Op Fl T Ar dir
 .Ek  .Ek
Line 54 
Line 54 
 .Sh DESCRIPTION  .Sh DESCRIPTION
 The  The
 .Nm  .Nm
 utility sorts text files by lines,  utility sorts text and binary files by lines.
 operating in one of three modes: sort, merge, or check.  A line is a record separated from the subsequent record by a
 In sort mode, the specified files are combined and sorted  newline (default) or NUL \'\\0\' character (-z option).
 by line.  A record can contain any printable or unprintable characters.
 Merge mode is the same as sort mode except that the input  Comparisons are based on one or more sort keys extracted from
 files are assumed to be pre-sorted.  each line of input, and are performed lexicographically,
 In check mode, a single input file is checked to ensure that  according to the current locale's collating rules and the
 it is correctly sorted.  specified command-line options that can tune the actual
 .Pp  sorting behavior.
 Comparisons are based on one or more sort keys extracted  
 from each line of input, and are performed lexicographically.  
 By default, if keys are not given,  By default, if keys are not given,
 .Nm  .Nm
 regards each input line as a single field.  uses entire lines for comparison.
 .Pp  .Pp
 The options are as follows:  The options are as follows:
 .Bl -tag -width Ds  .Bl -tag -width Ds
 .It Fl C  .It Fl C, Fl Fl check=silent|quiet
 Check that the single input file is sorted.  Check that the single input file is sorted.
 If it is, exit 0; if it's not, exit 1.  If it is, exit 0; if it's not, exit 1.
 In either case, produce no output.  In either case, produce no output.
 .It Fl c  .It Fl c, Fl Fl check
 Like  Like
 .Fl C ,  .Fl C ,
 but additionally write a message to  but additionally write a message to
 .Em stderr  .Em stderr
 if the input file is not sorted.  if the input file is not sorted.
 .It Fl m  .It Fl m , Fl Fl merge
 Merge only; the input files are assumed to be pre-sorted.  Merge only; the input files are assumed to be pre-sorted.
 This option is overridden by the  If they are not sorted, the output order is undefined.
 .Fl C  .It Fl o Ar output , Fl Fl output Ns = Ns Ar output
 or  Write the output to the
 .Fl c  
 options,  
 if they are also present.  
 .It Fl o Ar output  
 The argument given is the name of an  
 .Ar output  .Ar output
 file to be used instead of the standard output.  file instead of the standard output.
 This file can be the same as one of the input files.  This file can be the same as one of the input files.
 .It Fl T Ar dir  .It Fl S Ar size, Fl Fl buffer-size Ns = Ns Ar size
 Use  Use a memory buffer no larger than
 .Ar dir  .Ar size .
 as the directory for temporary files.  The modifiers %, b, K, M, G, T, P, E, Z, and Y can be used.
 The default is the contents of the environment variable  If no memory limit is specified,
   .Nm
   may use up to about 90% of available memory.
   If the input is too big to fit into the memory buffer,
   temporary files are used.
   .It Fl T Ar dir , Fl Fl temporary-directory Ns = Ns Ar dir
   Store temporary files in the directory
   .Ar dir .
   The default path is the value of the environment variable
 .Ev TMPDIR  .Ev TMPDIR
 or  or
 .Pa /var/tmp  .Pa /var/tmp
 if  if
 .Ev TMPDIR  .Ev TMPDIR
 does not exist.  is not defined.
 .It Fl u  .It Fl u , Fl Fl unique
 Unique: suppress all but one in each set of lines having equal keys.  Unique: suppress all but one in each set of lines having equal keys.
 If used with the  This option implies a stable sort (see below).
   If used with
 .Fl C  .Fl C
 or  or
 .Fl c  .Fl c ,
 options, also check that there are no lines with duplicate keys.  .Nm
 .El  also checks that there are no lines with duplicate keys.
 .Pp  
 The following options override the default ordering rules globally:  
 .Bl -tag -width indent  
 .It Fl H  
 Use a merge sort instead of a radix sort.  
 This option should be used for files larger than 60MB.  
 .It Fl s  .It Fl s
 Enable stable sort.  Stable sort; maintains the original record order of records that have
 Uses additional resources (see  an equal key.
 .Xr sradixsort 3 ) .  This is a non-standard feature, but it is widely accepted and used.
   .It Fl Fl version
   Print the version and exit.
   .It Fl Fl help
   Print the help text and exit.
 .El  .El
 .Pp  .Pp
 The following options override the default ordering rules.  The following options override the default ordering rules.
Line 131 
Line 131 
 option, they apply globally to all sort keys.  option, they apply globally to all sort keys.
 When attached to a specific key (see  When attached to a specific key (see
 .Fl k ) ,  .Fl k ) ,
 the ordering options override  the ordering options override all global ordering options for that key.
 all global ordering options for that key.  
 Note that the ordering options intended to apply globally should not  Note that the ordering options intended to apply globally should not
 appear after  appear after
 .Fl k  .Fl k
 or results may be unexpected.  or results may be unexpected.
 .Bl -tag -width indent  .Bl -tag -width indent
 .It Fl d  .It Fl b, Fl Fl ignore-leading-blanks
 Only blank space and alphanumeric characters  Ignore leading blank characters when comparing lines.
 .\" according  .It Fl d , Fl Fl dictionary-order
 .\" to the current setting of LC_CTYPE  Consider only blank spaces and alphanumeric characters in comparisons.
 are used in making comparisons.  .It Fl f , Fl Fl ignore-case
 .It Fl f  Consider all lowercase characters that have uppercase
 Considers all lowercase characters that have uppercase  
 equivalents to be the same for purposes of comparison.  equivalents to be the same for purposes of comparison.
 .It Fl i  .It Fl g, Fl Fl general-numeric-sort, Fl Fl sort=general-numeric
   Sort by general numerical value.
   As opposed to
   .Fl n ,
   this option handles general floating-point numbers, which have a much
   more permissive format than those allowed by
   .Fl n ,
   but it has a significant performance drawback.
   .It Fl h, Fl Fl human-numeric-sort, Fl Fl sort=human-numeric
   Sort by numerical value, but take into account the SI suffix,
   if present.
   Sorts first by numeric sign (negative, zero, or
   positive); then by SI suffix (either empty, or `k' or `K', or one
   of `MGTPEZY', in that order); and finally by numeric value.
   The SI suffix must immediately follow the number.
   For example, '12345K' sorts before '1M', because M is "larger" than K.
   This sort option is useful for sorting the output of a single invocation
   of the 'df' command with
   .Fl h
   or
   .Fl H
   options (human-readable).
   .It Fl i , Fl Fl ignore-nonprinting
 Ignore all non-printable characters.  Ignore all non-printable characters.
 .It Fl n  .It Fl M, Fl Fl month-sort, Fl Fl sort=month
   Sort by month abbreviations.
   Unknown strings are considered smaller than valid month names.
   .It Fl n , Fl Fl numeric-sort, Fl Fl sort=numeric
 An initial numeric string, consisting of optional blank space, optional  An initial numeric string, consisting of optional blank space, optional
 minus sign, and zero or more digits (including decimal point)  minus sign, and zero or more digits (including decimal point)
 .\" with  .\" with
Line 156 
Line 179 
 .\" separator  .\" separator
 .\" (as defined in the current locale),  .\" (as defined in the current locale),
 is sorted by arithmetic value.  is sorted by arithmetic value.
 (The  Leading blank characters are ignored.
 .Fl n  .It Fl R, Fl Fl random-sort, Fl Fl sort=random
 option no longer implies the  Sort lines in random order.
 .Fl b  This is a random permutation of the inputs with the exception that
 option.)  equal keys sort together.
 .It Fl r  It is implemented by hashing the input keys and sorting the hash values.
 Reverse the sense of comparisons.  The hash function is randomized with data from
   .Fn arc4random_buf ,
   or by file content if one is specified via
   .Fl Fl random-source .
   If multiple sort fields are specified,
   the same random hash function is used for all of them.
   .It Fl r , Fl Fl reverse
   Sort in reverse order.
   .It Fl V, Fl Fl version-sort
   Sort version numbers.
   The input lines are treated as file names in the form
   PREFIX VERSION SUFFIX, where SUFFIX matches the regular expression
   "(\.([A-Za-z~][A-Za-z0-9~]*)?)*".
   The files are compared by their prefixes and versions (leading
   zeros are ignored in version numbers, see example below).
   If an input string does not match the pattern, then it is compared
   using the byte compare function.
   All string comparisons are performed in the C locale.
   .Bl -tag -width indent
   .It Example:
   .It $ ls sort* | sort -V
   .It sort-1.022.tgz
   .It sort-1.23.tgz
   .It sort-1.23.1.tgz
   .It sort-1.024.tgz
   .It sort-1.024.003.
   .It sort-1.024.003.tgz
   .It sort-1.024.07.tgz
   .It sort-1.024.009.tgz
 .El  .El
   .El
 .Pp  .Pp
 The treatment of field separators can be altered using these options:  The treatment of field separators can be altered using these options:
 .Bl -tag -width indent  .Bl -tag -width indent
 .It Fl b  .It Fl b , Fl Fl ignore-leading-blanks
 Ignores leading blank space when determining the start  Ignore leading blank space when determining the start
 and end of a restricted sort key.  and end of a restricted sort key (see
 A  .Fl k ) .
   If
 .Fl b  .Fl b
 option specified before the first  is specified before the first
 .Fl k  .Fl k
 option applies globally to all  option, it applies globally to all key specifications.
 .Fl k  Otherwise,
 options.  
 Otherwise, the  
 .Fl b  .Fl b
 option can be attached independently to each  can be attached independently to each
 .Ar field  .Ar field
 argument of the  argument of the key specifications.
   .It Xo
   .Sm off
   .Fl k\ \& Ar field1 Op , Ar field2 , Fl Fl key Ns = Ns Ar field1 Op , Ar field2
   .Sm on
   .Xc
   Define a restricted sort key that has the starting position
   .Ar field1 ,
   and optional ending position
   .Ar field2
   of a key field.
   The
 .Fl k  .Fl k
 option (see below).  option may be specified multiple times,
 Note that  in which case subsequent keys are compared after earlier keys compare equal.
 .Fl b  The
 should not appear after  .Fl k
 .Fl k ,  option replaces the obsolete options
 and that it has no effect unless key fields are specified.  .Cm \(pl Ns Ar pos1
 .It Fl R Ar char  and
   .Fl Ns Ar pos2 ,
   but the old notation is also supported.
   .It Fl t Ar char , Fl Fl field-separator Ns = Ns Ar char
   Use
 .Ar char  .Ar char
 is used as the record separator character.  as the field separator character.
 This should be used with discretion;  
 .Fl R Aq Ar alphanumeric  
 usually produces undesirable results.  
 The default record separator is newline.  
 .It Fl t Ar char  
 .Ar char  
 is used as the field separator character.  
 The initial  The initial
 .Ar char  .Ar char
 is not considered to be part of a field when determining key offsets.  is not considered to be part of a field when determining key offsets.
Line 215 
Line 274 
 delimit an empty field; further, the initial blank space  delimit an empty field; further, the initial blank space
 .Em is  .Em is
 considered part of a field when determining key offsets.  considered part of a field when determining key offsets.
 .It Fl z  To use NUL as field separator, use
 Uses the nul character as the record separator.  .Fl t
   \'\\0\'.
   .It Fl z , Fl Fl zero-terminated
   Use NUL as the record separator.
   By default, records in the files are expected to be separated by
   the newline characters.
   With this option, NUL (\'\\0\') is used as the record separator character.
 .El  .El
 .Pp  .Pp
 Sort keys are specified with:  Other options:
 .Bl -tag -width indent  .Bl -tag -width indent
 .It Xo  .It Fl Fl batch-size Ns = Ns Ar num
 .Sm off  Specify maximum number of files that can be opened by
 .Fl k\ \& Ar field1 Op , Ar field2  .Nm
 .Sm on  at once.
 .Xc  This option affects behavior when having many input files or using
 Designates the starting position,  temporary files.
 .Ar field1 ,  The default value is 16.
 and optional ending position,  .It Fl Fl compress-program Ns = Ns Ar program
 .Ar field2 ,  Use
 of a key field.  .Ar program
   to compress temporary files.
   When invoked with no arguments,
   .Ar program
   must compress standard input to standard output.
   When called with the
   .Fl d
   option, it must decompress standard input to standard output.
   If
   .Ar program
   fails,
   .Nm
   will exit with an error.
 The  The
 .Fl k  .Xr compress 1
 option may be specified multiple times,  
 in which case subsequent keys are compared after earlier keys compare equal.  
 The  
 .Fl k  
 option replaces the obsolescent options  
 .Cm \(pl Ns Ar pos1  
 and  and
 .Fl Ns Ar pos2 .  .Xr gzip 1
   utilities meet these requirements.
   .It Fl Fl random-source Ns = Ns Ar filename
   For random sort, the contents of
   .Ar filename
   are used as the source of the
   .Sq seed
   data for the hash function.
   Two invocations of random sort with the same seed data will
   produce the same result if the input is also identical.
   By default, the
   .Fn arc4random_buf
   function is used instead.
   .It Fl Fl debug
   Print some extra information about the sorting process to the
   standard output.
   .It Fl Fl files0-from Ns = Ns Ar filename
   Take the input file list from the file
   .Ar filename .
   The file names must be separated by NUL
   (like the output produced by the command
   .Dq find ... -print0 ) .
   .It Fl Fl radixsort
   Try to use radix sort, if the sort specifications allow.
   The radix sort can only be used for trivial locales (C and POSIX),
   and it cannot be used for numeric or month sort.
   Radix sort is very fast and stable.
   .It Fl H, Fl Fl mergesort
   Use mergesort.
   This is a universal algorithm that can always be used,
   but it is not always the fastest.
   .It Fl Fl qsort
   Try to use quick sort, if the sort specifications allow.
   This sort algorithm cannot be used with
   .Fl u
   and
   .Fl s .
   .It Fl Fl heapsort
   Try to use heap sort, if the sort specifications allow.
   This sort algorithm cannot be used with
   .Fl u
   and
   .Fl s .
   .It Fl Fl mmap
   Try to use the file memory mapping system call.
   It may increase speed in some cases.
 .El  .El
 .Pp  .Pp
 The following operands are available:  The following operands are available:
Line 273 
Line 389 
 .Sm off  .Sm off
 .Fl k\ \& Ar field1 Op , Ar field2  .Fl k\ \& Ar field1 Op , Ar field2
 .Sm on  .Sm on
 argument.  option.
 A missing  If
 .Ar field2  .Ar field2
 argument defaults to the end of a line.  is missing, the end of the key defaults to the end of the line.
 .Pp  .Pp
 The arguments  The arguments
 .Ar field1  .Ar field1
Line 285 
Line 401 
 have the form  have the form
 .Em m.n  .Em m.n
 .Em (m,n > 0)  .Em (m,n > 0)
 and can be followed by one or more of the letters  and can be followed by one or more of the modifiers
 .Cm b , d , f , i ,  .Cm b , d , f , i ,
 .Cm n ,  .Cm n , g , M
 and  and
 .Cm r ,  .Cm r ,
 which correspond to the options discussed above.  which correspond to the options discussed above.
   When
   .Cm b
   is specified it applies only to
   .Ar field1
   or
   .Ar field2
   where it is specified while the rest of the modifiers
   apply to the whole key field regardless of whether they are
   specified only with
   .Ar field1
   or
   .Ar field2
   or both.
 A  A
 .Ar field1  .Ar field1
 position specified by  position specified by
Line 327 
Line 456 
 .Em n  .Em n
 is greater than the length of the line, the field is taken to be empty.  is greater than the length of the line, the field is taken to be empty.
 .Pp  .Pp
   .Em n Ns th
   positions are always counted from the field beginning, even if the field
   is shorter than the number of specified positions.
   Thus, the key can really start from a position in a subsequent field.
   .Pp
 A  A
 .Ar field2  .Ar field2
 position specified by  position specified by
 .Em m.n  .Em m.n
 is interpreted as the  is interpreted as the
 .Em n Ns th  .Em n Ns th
 character (including separators) of the  character (including separators) from the beginning of the
 .Em m Ns th  .Em m Ns th
 field.  field.
 A missing  A missing
Line 346 
Line 480 
 designates the end of a line.  designates the end of a line.
 Thus the option  Thus the option
 .Fl k Ar v.x,w.y  .Fl k Ar v.x,w.y
 is synonymous with the obsolescent option  is synonymous with the obsolete option
 .Cm \(pl Ns Ar v-\&1.x-\&1  .Cm \(pl Ns Ar v-\&1.x-\&1
 .Fl Ns Ar w-\&1.y ;  .Fl Ns Ar w-\&1.y ;
 when  when
Line 356 
Line 490 
 is synonymous with  is synonymous with
 .Cm \(pl Ns Ar v-\&1.x-\&1  .Cm \(pl Ns Ar v-\&1.x-\&1
 .Fl Ns Ar w\&.0 .  .Fl Ns Ar w\&.0 .
 The obsolescent  The obsolete
 .Cm \(pl Ns Ar pos1  .Cm \(pl Ns Ar pos1
 .Fl Ns Ar pos2  .Fl Ns Ar pos2
 option is still supported, except for  option is still supported, except for
Line 366 
Line 500 
 equivalent.  equivalent.
 .Sh ENVIRONMENT  .Sh ENVIRONMENT
 .Bl -tag -width Fl  .Bl -tag -width Fl
   .It Ev LC_COLLATE
   Locale settings to be used to determine the collation for
   sorting records.
   .It Ev LC_CTYPE
   Locale settings to be used for case conversion and classification
   of characters, that is, which characters are considered
   whitespace, etc.
   .It Ev LC_MESSAGES
   Locale settings that determine the language of output messages
   that
   .Nm
   prints out.
   .It Ev LC_NUMERIC
   Locale settings that determine the number format used in numeric sort.
   .It Ev LC_TIME
   Locale settings that determine the month format used in month sort.
   .It Ev LC_ALL
   Locale settings that override all of the above locale settings.
   This environment variable can be used to set all these settings
   to the same value at once.
   .It Ev LANG
   Used as a last resort to determine different kinds of locale-specific
   behavior if neither the respective environment variable, nor
   .Ev LC_ALL
   is set.
 .It Ev TMPDIR  .It Ev TMPDIR
 Path in which to store temporary files.  Path to the directory in which temporary files will be stored.
 Note that  Note that
 .Ev TMPDIR  .Ev TMPDIR
 may be overridden by the  may be overridden by the
 .Fl T  .Fl T
 option.  option.
   .It Ev GNUSORT_NUMERIC_COMPATIBILITY
   If defined
   .Fl t
   will not override the locale numeric symbols, that is, thousand
   separators and decimal separators.
   By default, if we specify
   .Fl t
   with the same symbol as the thousand separator or decimal point,
   the symbol will be treated as the field separator.
   Older behavior was less definite; the symbol was treated as both field
   separator and numeric separator, simultaneously.
   This environment variable enables the old behavior.
 .El  .El
 .Sh FILES  .Sh FILES
 .Bl -tag -width Pa -compact  .Bl -tag -width Pa -compact
 .It Pa /var/tmp/sort.*  .It Pa /var/tmp/.bsdsort.PID.*
 default temporary directories  Temporary files.
 .It Pa output Ns #PID  
 temporary name for  
 .Ar output  
 if  
 .Ar output  
 already exists  
 .El  .El
 .Sh EXIT STATUS  .Sh EXIT STATUS
 The  The
Line 392 
Line 557 
 .Pp  .Pp
 .Bl -tag -width Ds -offset indent -compact  .Bl -tag -width Ds -offset indent -compact
 .It 0  .It 0
 Normal behavior.  Successfully sorted the input files or if used with
   .Fl C
   or
   .Fl c ,
   the input file already met the sorting criteria.
 .It 1  .It 1
 The input file is not sorted and  On disorder (or non-uniqueness) with the
 .Fl C  .Fl C
 or  or
 .Fl c  .Fl c
 was given, or there are duplicate keys and  options.
 .Fl Cu  
 or  
 .Fl cu  
 was given.  
 .It 2  .It 2
 An error occurred.  An error occurred.
 .El  .El
Line 410 
Line 575 
 .Xr comm 1 ,  .Xr comm 1 ,
 .Xr join 1 ,  .Xr join 1 ,
 .Xr uniq 1 ,  .Xr uniq 1 ,
 .Xr radixsort 3  .Xr arc4random_buf 3
 .Sh STANDARDS  .Sh STANDARDS
 The  The
 .Nm  .Nm
Line 419 
Line 584 
 specification.  specification.
 .Pp  .Pp
 The flags  The flags
 .Op Fl HRsTz  .Op Fl ghRMSsTVz
 are extensions to that specification.  are extensions to that specification.
   .Pp
   All long options are extensions to the specification.
   Some are provided for compatibility with GNU
   .Nm ,
   others are specific to this implementation.
   .Pp
   The historic key notations
   .Cm \(pl Ns Ar pos1
   and
   .Fl Ns Ar pos2
   are supported for compatibility with older versions of
   .Nm
   but their use is highly discouraged.
 .Sh HISTORY  .Sh HISTORY
 A  A
 .Nm  .Nm
 command appeared in  command appeared in
 .At v3 .  .At v3 .
   .Sh AUTHORS
   Gabor Kovesdan <gabor@FreeBSD.org>
   .br
   Oleg Moskalenko <mom040267@gmail.com>
 .Sh NOTES  .Sh NOTES
   This implementation of
 .Nm  .Nm
 has no limits on input line length (other than imposed by available  has no limits on input line length (other than imposed by available
 memory) or any restrictions on bytes allowed within lines.  memory) or any restrictions on bytes allowed within lines.
 .Pp  .Pp
 To protect data  The performance depends highly on locale settings,
 .Nm  efficient choice of sort keys and key complexity.
 .Fl o  The fastest sort is with the C locale, on whole lines, with option
 calls  .Fl s .
 .Xr link 2  In general, the C locale is the fastest, followed by single-byte
 and  locales with multi-byte locales being the slowest.
 .Xr unlink 2 ,  The correct collation order is respected in all cases.
 and thus fails on protected directories.  For the key specification, the simpler the lines are to
   process, the faster the sort will be.
 .Pp  .Pp
 The current sort command uses lexicographic radix sorting, which requires  When sorting by arithmetic value, using
 that sort keys be kept in memory (as opposed to previous versions which  .Fl n
 used quick and merge sorts and did not).  results in much better performance than
 Thus performance depends highly on efficient choice of sort keys, and the  .Fl g
 .Fl b  so its use is encouraged whenever possible.
 option and the  
 .Ar field2  
 argument of the  
 .Fl k  
 option should be used whenever possible.  
 Similarly,  
 .Nm  
 .Fl k1f  
 is equivalent to  
 .Nm  
 .Fl f  
 and may take twice as long.  
 .Sh BUGS  
 To sort files larger than 60MB, use  
 .Nm  
 .Fl H ;  
 files larger than 704MB must be sorted in smaller pieces, then merged.  
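
The ordering that version 1.41 documents for -h (first by numeric sign, then by SI suffix, finally by numeric value) can be illustrated with a small Python sketch. This is not the code sort(1) uses. The names `human_key`, `_SUFFIX_RANK`, and `_NUMBER` are hypothetical, and two details are assumptions the text does not spell out: text without a leading number is treated like zero, and the suffix rank is inverted for negative numbers.

```python
import re

# Suffix ranks in the documented order: empty, then k or K, then M G T P E Z Y.
_SUFFIX_RANK = {"": 0, "k": 1, "K": 1}
_SUFFIX_RANK.update({s: i for i, s in enumerate("MGTPEZY", start=2)})

# Optional blanks, optional minus sign, digits with an optional fraction,
# and a suffix that must immediately follow the number.
_NUMBER = re.compile(r"\s*(-?\d+(?:\.\d+)?)([kKMGTPEZY]?)")


def human_key(text):
    """Return a tuple that orders text by sign, then suffix, then value."""
    m = _NUMBER.match(text)
    if m is None:
        # Assumption: text with no leading number compares like zero.
        return (0, 0, 0.0)
    value = float(m.group(1))
    sign = (value > 0) - (value < 0)
    rank = _SUFFIX_RANK[m.group(2)]
    if sign < 0:
        # Assumption: for negatives a larger suffix means a smaller quantity.
        rank = -rank
    return (sign, rank, value)


if __name__ == "__main__":
    sizes = ["1M", "12345K", "512", "2G", "0", "-3K"]
    print(sorted(sizes, key=human_key))
```

With this key, '12345K' sorts before '1M' because the suffix is compared before the numeric value, which matches the example given in the manual text.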

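
The check-mode exit codes described in the EXIT STATUS hunk (0 when the input already meets the sorting criteria, 1 on disorder or, with -u, on non-uniqueness) can be sketched in Python as follows. The function name `check_status` and its parameters are hypothetical, and plain Python string comparison stands in for the locale's collation rules, which roughly corresponds to the C locale only.

```python
def check_status(lines, unique=False, key=lambda line: line):
    """Return 0 if lines are in order, 1 on disorder or (with unique) duplicates."""
    for prev, cur in zip(lines, lines[1:]):
        a, b = key(prev), key(cur)
        # Disorder: an earlier key compares greater than a later one.
        # Non-uniqueness: adjacent equal keys, checked only when unique is set.
        if a > b or (unique and a == b):
            return 1
    return 0


if __name__ == "__main__":
    print(check_status(["a", "b", "b"]))               # like sort -C
    print(check_status(["a", "b", "b"], unique=True))  # like sort -C -u
```

Exit status 2 (an error occurred) is outside the scope of this sketch, since it concerns I/O and option handling rather than the order of the records.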