Annotation of src/usr.bin/awk/awk.1, Revision 1.65

1.65    ! millert     1: .\"    $OpenBSD: awk.1,v 1.64 2023/09/15 15:07:08 jsg Exp $
1.11      jmc         2: .\"
                      3: .\" Copyright (C) Lucent Technologies 1997
                      4: .\" All Rights Reserved
1.12      jmc         5: .\"
1.11      jmc         6: .\" Permission to use, copy, modify, and distribute this software and
                      7: .\" its documentation for any purpose and without fee is hereby
                      8: .\" granted, provided that the above copyright notice appear in all
                      9: .\" copies and that both that the copyright notice and this
                     10: .\" permission notice and warranty disclaimer appear in supporting
                     11: .\" documentation, and that the name Lucent Technologies or any of
                     12: .\" its entities not be used in advertising or publicity pertaining
                     13: .\" to distribution of the software without specific, written prior
                     14: .\" permission.
1.12      jmc        15: .\"
1.11      jmc        16: .\" LUCENT DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
                     17: .\" INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS.
                     18: .\" IN NO EVENT SHALL LUCENT OR ANY OF ITS ENTITIES BE LIABLE FOR ANY
                     19: .\" SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
                     20: .\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
                     21: .\" IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
                     22: .\" ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
                     23: .\" THIS SOFTWARE.
                     24: .\"
1.65    ! millert    25: .Dd $Mdocdate: September 15 2023 $
1.7       aaron      26: .Dt AWK 1
                     27: .Os
                     28: .Sh NAME
                     29: .Nm awk
                     30: .Nd pattern-directed scanning and processing language
                     31: .Sh SYNOPSIS
                     32: .Nm awk
1.16      jmc        33: .Op Fl safe
                     34: .Op Fl V
                     35: .Op Fl d Ns Op Ar n
1.65    ! millert    36: .Op Fl F Ar fs | Fl -csv
1.38      schwarze   37: .Op Fl v Ar var Ns = Ns Ar value
1.18      jmc        38: .Op Ar prog | Fl f Ar progfile
1.7       aaron      39: .Ar
                     40: .Sh DESCRIPTION
                     41: .Nm
1.1       tholo      42: scans each input
1.7       aaron      43: .Ar file
1.1       tholo      44: for lines that match any of a set of patterns specified literally in
1.7       aaron      45: .Ar prog
1.16      jmc        46: or in one or more files specified as
1.7       aaron      47: .Fl f Ar progfile .
1.16      jmc        48: With each pattern there can be an associated action that will be performed
1.1       tholo      49: when a line of a
1.7       aaron      50: .Ar file
1.1       tholo      51: matches the pattern.
                     52: Each line is matched against the
                     53: pattern portion of every pattern-action statement;
                     54: the associated action is performed for each matched pattern.
1.6       aaron      55: The file name
1.16      jmc        56: .Sq -
1.1       tholo      57: means the standard input.
                     58: Any
1.7       aaron      59: .Ar file
1.1       tholo      60: of the form
1.16      jmc        61: .Ar var Ns = Ns Ar value
1.1       tholo      62: is treated as an assignment, not a filename,
                     63: and is executed at the time it would have been opened if it were a filename.
1.16      jmc        64: .Pp
                     65: The options are as follows:
1.20      jmc        66: .Bl -tag -width "-safe "
1.65    ! millert    67: .It Fl -csv
        !            68: Process records using the (more or less) standard comma-separated values
        !            69: .Pq CSV
        !            70: format instead of the input field separator.
        !            71: When the
        !            72: .Fl -csv
        !            73: option is specified, attempts to change the input field separator
        !            74: or record separator are ignored.
1.16      jmc        75: .It Fl d Ns Op Ar n
                     76: Debug mode.
                     77: Set debug level to
                     78: .Ar n ,
                     79: or 1 if
                     80: .Ar n
                     81: is not specified.
                     82: A value greater than 1 causes
                     83: .Nm
                     84: to dump core on fatal errors.
                     85: .It Fl F Ar fs
                     86: Define the input field separator to be the regular expression
1.7       aaron      87: .Ar fs .
1.25      jmc        88: .It Fl f Ar progfile
1.16      jmc        89: Read program code from the specified file
1.25      jmc        90: .Ar progfile
1.16      jmc        91: instead of from the command line.
                     92: .It Fl safe
                     93: Disable file output
1.17      jmc        94: .Pf ( Ic print No > ,
                     95: .Ic print No >> ) ,
1.7       aaron      96: process creation
                     97: .Po
1.17      jmc        98: .Ar cmd | Ic getline ,
1.40      jmc        99: .Ic print | ,
1.17      jmc       100: .Ic system
1.7       aaron     101: .Pc
                    102: and access to the environment
1.17      jmc       103: .Pf ( Va ENVIRON ;
1.18      jmc       104: see the section on variables below).
1.17      jmc       105: This is a first
1.16      jmc       106: .Pq and not very reliable
                    107: approximation to a
1.7       aaron     108: .Dq safe
                    109: version of
1.16      jmc       110: .Nm .
                    111: .It Fl V
                    112: Print the version number of
                    113: .Nm
                    114: to standard output and exit.
                    115: .It Fl v Ar var Ns = Ns Ar value
                    116: Assign
                    117: .Ar value
                    118: to variable
                    119: .Ar var
                    120: before
                    121: .Ar prog
                    122: is executed;
                    123: any number of
                    124: .Fl v
                    125: options may be present.
                    126: .El
1.7       aaron     127: .Pp
1.18      jmc       128: The input is normally made up of input lines
                    129: .Pq records
                    130: separated by newlines, or by the value of
                    131: .Va RS .
                    132: If
                    133: .Va RS
                    134: is null, then any number of blank lines are used as the record separator,
                    135: and newlines are used as field separators
                    136: (in addition to the value of
                    137: .Va FS ) .
                    138: This is convenient when working with multi-line records.
                    139: .Pp
1.7       aaron     140: An input line is normally made up of fields separated by whitespace,
1.55      millert   141: or by the value of the field separator
                    142: .Va FS
                    143: at the time the line is read.
1.1       tholo     144: The fields are denoted
1.7       aaron     145: .Va $1 , $2 , ... ,
                    146: while
                    147: .Va $0
1.1       tholo     148: refers to the entire line.
1.55      millert   149: .Va FS
                    150: may be set to either a single character or a regular expression.
1.58      jmc       151: As a special case, if
1.55      millert   152: .Va FS
                    153: is a single space
                    154: .Pq the default ,
                    155: fields will be split by one or more whitespace characters.
1.1       tholo     156: If
1.7       aaron     157: .Va FS
1.1       tholo     158: is null, the input line is split into one field per character.
1.7       aaron     159: .Pp
1.18      jmc       160: Normally, any number of blanks separate fields.
                    161: In order to set the field separator to a single blank, use the
                    162: .Fl F
                    163: option with a value of
                    164: .Sq [\ \&] .
                    165: If a field separator of
                    166: .Sq t
                    167: is specified,
                    168: .Nm
                    169: treats it as if
                    170: .Sq \et
                    171: had been specified and uses
                    172: .Aq TAB
                    173: as the field separator.
                    174: In order to use a literal
                    175: .Sq t
                    176: as the field separator, use the
                    177: .Fl F
                    178: option with a value of
                    179: .Sq [t] .
1.55      millert   180: The field separator is usually set via the
                    181: .Fl F
                    182: option or from inside a
                    183: .Ic BEGIN
                    184: block so that it takes effect before the input is read.
1.18      jmc       185: .Pp
1.47      millert   186: A pattern-action statement has the form:
1.7       aaron     187: .Pp
                    188: .D1 Ar pattern Ic \&{ Ar action Ic \&}
                    189: .Pp
1.6       aaron     190: A missing
1.7       aaron     191: .Ic \&{ Ar action Ic \&}
1.1       tholo     192: means print the line;
                    193: a missing pattern always matches.
                    194: Pattern-action statements are separated by newlines or semicolons.
1.7       aaron     195: .Pp
1.18      jmc       196: Newlines are permitted after a terminating statement or following a comma
                    197: .Pq Sq ,\& ,
                    198: an open brace
                    199: .Pq Sq { ,
                    200: a logical AND
                    201: .Pq Sq && ,
                    202: a logical OR
                    203: .Pq Sq || ,
                    204: after the
                    205: .Sq do
                    206: or
                    207: .Sq else
                    208: keywords,
                    209: or after the closing parenthesis of an
                    210: .Sq if ,
                    211: .Sq for ,
                    212: or
                    213: .Sq while
                    214: statement.
                    215: Additionally, a backslash
                    216: .Pq Sq \e
                    217: can be used to escape a newline between tokens.
                    218: .Pp
1.1       tholo     219: An action is a sequence of statements.
                    220: A statement can be one of the following:
1.35      jmc       221: .Pp
                    222: .Bl -tag -width Ds -offset indent -compact
1.43      schwarze  223: .It Ic if Ar ( expression ) Ar statement Op Ic else Ar statement
                    224: .It Ic while Ar ( expression ) Ar statement
                    225: .It Ic for Ar ( expression ; expression ; expression ) statement
                    226: .It Ic for Ar ( var Ic in Ar array ) statement
                    227: .It Ic do Ar statement Ic while Ar ( expression )
1.35      jmc       228: .It Ic break
                    229: .It Ic continue
                    230: .It Xo Ic {
                    231: .Op Ar statement ...
                    232: .Ic }
                    233: .Xc
                    234: .It Xo Ar expression
                    235: .No # commonly
                    236: .Ar var No = Ar expression
1.7       aaron     237: .Xc
1.35      jmc       238: .It Xo Ic print
1.7       aaron     239: .Op Ar expression-list
1.17      jmc       240: .Op > Ns Ar expression
1.7       aaron     241: .Xc
1.35      jmc       242: .It Xo Ic printf Ar format
1.7       aaron     243: .Op Ar ... , expression-list
1.17      jmc       244: .Op > Ns Ar expression
1.7       aaron     245: .Xc
1.35      jmc       246: .It Ic return Op Ar expression
                    247: .It Xo Ic next
                    248: .No # skip remaining patterns on this input line
                    249: .Xc
                    250: .It Xo Ic nextfile
                    251: .No # skip rest of this file, open next, start at top
                    252: .Xc
                    253: .It Xo Ic delete
                    254: .Sm off
                    255: .Ar array Ic \&[ Ar expression Ic \&]
                    256: .Sm on
                    257: .No # delete an array element
1.7       aaron     258: .Xc
1.35      jmc       259: .It Xo Ic delete Ar array
                    260: .No # delete all elements of array
1.7       aaron     261: .Xc
1.35      jmc       262: .It Xo Ic exit
1.7       aaron     263: .Op Ar expression
1.46      deraadt   264: .No # exit processing, and perform
                    265: .Ic END
                    266: processing; status is
                    267: .Ar expression
1.7       aaron     268: .Xc
1.35      jmc       269: .El
1.7       aaron     270: .Pp
1.1       tholo     271: Statements are terminated by
                    272: semicolons, newlines or right braces.
                    273: An empty
1.7       aaron     274: .Ar expression-list
1.1       tholo     275: stands for
1.7       aaron     276: .Ar $0 .
                    277: String constants are quoted
                    278: .Li \&"" ,
1.20      jmc       279: with the usual C escapes recognized within
                    280: (see
                    281: .Xr printf 1
                    282: for a complete list of these).
1.1       tholo     283: Expressions take on string or numeric values as appropriate,
                    284: and are built using the operators
1.7       aaron     285: .Ic + \- * / % ^
1.20      jmc       286: .Pq exponentiation ,
                    287: and concatenation
                    288: .Pq indicated by whitespace .
1.1       tholo     289: The operators
1.16      jmc       290: .Ic \&! ++ \-\- += \-= *= /= %= ^=
1.59      millert   291: .Ic > >= < <= == != ?\&:
1.1       tholo     292: are also available in expressions.
                    293: Variables may be scalars, array elements
                    294: (denoted
1.7       aaron     295: .Li x[i] )
1.1       tholo     296: or fields.
                    297: Variables are initialized to the null string.
                    298: Array subscripts may be any string,
                    299: not necessarily numeric;
                    300: this allows for a form of associative memory.
                    301: Multiple subscripts such as
1.7       aaron     302: .Li [i,j,k]
1.1       tholo     303: are permitted; the constituents are concatenated,
                    304: separated by the value of
1.17      jmc       305: .Va SUBSEP
1.31      deraadt   306: .Pq see the section on variables below .
1.7       aaron     307: .Pp
1.1       tholo     308: The
1.7       aaron     309: .Ic print
1.1       tholo     310: statement prints its arguments on the standard output
                    311: (or on a file if
1.47      millert   312: .Pf >\ \& Ar file
1.1       tholo     313: or
1.47      millert   314: .Pf >>\ \& Ar file
1.1       tholo     315: is present or on a pipe if
1.17      jmc       316: .Pf |\ \& Ar cmd
1.1       tholo     317: is present), separated by the current output field separator,
                    318: and terminated by the output record separator.
1.7       aaron     319: .Ar file
1.1       tholo     320: and
1.7       aaron     321: .Ar cmd
1.1       tholo     322: may be literal names or parenthesized expressions;
                    323: identical string values in different statements denote
                    324: the same open file.
                    325: The
1.7       aaron     326: .Ic printf
1.47      millert   327: statement formats its expression list according to the
                    328: .Ar format
1.1       tholo     329: (see
1.28      jmc       330: .Xr printf 1 ) .
1.18      jmc       331: .Pp
                    332: Patterns are arbitrary Boolean combinations
                    333: (with
                    334: .Ic "\&! || &&" )
                    335: of regular expressions and
                    336: relational expressions.
1.22      jmc       337: .Nm
                    338: supports extended regular expressions
                    339: .Pq EREs .
                    340: See
                    341: .Xr re_format 7
                    342: for more information on regular expressions.
1.18      jmc       343: Isolated regular expressions
                    344: in a pattern apply to the entire line.
                    345: Regular expressions may also occur in
                    346: relational expressions, using the operators
                    347: .Ic ~
                    348: and
                    349: .Ic !~ .
1.44      schwarze  350: .Pf / Ar re Ns /
1.18      jmc       351: is a constant regular expression;
                    352: any string (constant or variable) may be used
                    353: as a regular expression, except in the position of an isolated regular expression
                    354: in a pattern.
                    355: .Pp
                    356: A pattern may consist of two patterns separated by a comma;
                    357: in this case, the action is performed for all lines
                    358: from an occurrence of the first pattern
                    359: through an occurrence of the second.
                    360: .Pp
                    361: A relational expression is one of the following:
1.35      jmc       362: .Pp
                    363: .Bl -tag -width Ds -offset indent -compact
                    364: .It Ar expression matchop regular-expression
                    365: .It Ar expression relop expression
                    366: .It Ar expression Ic in Ar array-name
                    367: .It Xo Ic \&( Ns
1.18      jmc       368: .Ar expr , expr , \&... Ns Ic \&) in
1.35      jmc       369: .Ar array-name
1.18      jmc       370: .Xc
1.35      jmc       371: .El
1.18      jmc       372: .Pp
                    373: where a
                    374: .Ar relop
                    375: is any of the six relational operators in C, and a
                    376: .Ar matchop
                    377: is either
                    378: .Ic ~
                    379: (matches)
                    380: or
                    381: .Ic !~
                    382: (does not match).
                    383: A conditional is an arithmetic expression,
                    384: a relational expression,
                    385: or a Boolean combination
                    386: of these.
                    387: .Pp
1.46      deraadt   388: The special pattern
1.18      jmc       389: .Ic BEGIN
1.46      deraadt   390: may be used to capture control before the first input line is read.
                    391: The special pattern
1.18      jmc       392: .Ic END
1.46      deraadt   393: may be used to capture control after processing is finished.
1.18      jmc       394: .Ic BEGIN
                    395: and
                    396: .Ic END
                    397: do not combine with other patterns.
1.47      millert   398: They may appear multiple times in a program and execute
                    399: in the order they are read by
                    400: .Nm .
1.18      jmc       401: .Pp
                    402: Variable names with special meanings:
                    403: .Pp
1.20      jmc       404: .Bl -tag -width "FILENAME " -compact
1.18      jmc       405: .It Va ARGC
                    406: Argument count, assignable.
                    407: .It Va ARGV
                    408: Argument array, assignable;
                    409: non-null members are taken as filenames.
                    410: .It Va CONVFMT
                    411: Conversion format when converting numbers
                    412: (default
                    413: .Qq Li %.6g ) .
                    414: .It Va ENVIRON
                    415: Array of environment variables; subscripts are names.
                    416: .It Va FILENAME
                    417: The name of the current input file.
                    418: .It Va FNR
                    419: Ordinal number of the current record in the current file.
                    420: .It Va FS
1.55      millert   421: Regular expression used to separate fields (default whitespace);
                    422: also settable by option
1.63      jmc       423: .Fl F Ar fs .
1.18      jmc       424: .It Va NF
                    425: Number of fields in the current record.
                    426: .Va $NF
                    427: can be used to obtain the value of the last field in the current record.
                    428: .It Va NR
                    429: Ordinal number of the current record.
                    430: .It Va OFMT
                    431: Output format for numbers (default
                    432: .Qq Li %.6g ) .
                    433: .It Va OFS
                    434: Output field separator (default blank).
                    435: .It Va ORS
                    436: Output record separator (default newline).
                    437: .It Va RLENGTH
                    438: The length of the string matched by the
                    439: .Fn match
                    440: function.
                    441: .It Va RS
                    442: Input record separator (default newline).
1.49      millert   443: If empty, blank lines separate records.
                    444: If more than one character long,
                    445: .Va RS
                    446: is treated as a regular expression, and records are
                    447: separated by text matching the expression.
1.18      jmc       448: .It Va RSTART
                    449: The starting position of the string matched by the
                    450: .Fn match
                    451: function.
                    452: .It Va SUBSEP
                    453: Separates multiple subscripts (default 034).
                    454: .El
1.17      jmc       455: .Sh FUNCTIONS
                    456: The awk language has a variety of built-in functions:
1.30      jmc       457: arithmetic, string, input/output, general, and bit-operation.
                    458: .Pp
                    459: Functions may be defined (at the position of a pattern-action statement)
                    460: thusly:
                    461: .Pp
                    462: .Dl function foo(a, b, c) { ...; return x }
                    463: .Pp
                    464: Parameters are passed by value if scalar, and by reference if array name;
                    465: functions may be called recursively.
                    466: Parameters are local to the function; all other variables are global.
                    467: Thus local variables may be created by providing excess parameters in
                    468: the function definition.
1.17      jmc       469: .Ss Arithmetic Functions
                    470: .Bl -tag -width "atan2(y, x)"
                    471: .It Fn atan2 y x
                    472: Return the arctangent of
                    473: .Fa y Ns / Ns Fa x
                    474: in radians.
                    475: .It Fn cos x
                    476: Return the cosine of
                    477: .Fa x ,
                    478: where
                    479: .Fa x
                    480: is in radians.
                    481: .It Fn exp x
                    482: Return the exponential of
                    483: .Fa x .
                    484: .It Fn int x
                    485: Return
                    486: .Fa x
                    487: truncated to an integer value.
                    488: .It Fn log x
                    489: Return the natural logarithm of
                    490: .Fa x .
1.7       aaron     491: .It Fn rand
1.17      jmc       492: Return a random number,
                    493: .Fa n ,
                    494: such that
                    495: .Sm off
                    496: .Pf 0 \*(Le Fa n No \*(Lt 1 .
                    497: .Sm on
1.53      tim       498: Random numbers are non-deterministic unless a seed is explicitly set with
                    499: .Fn srand .
1.17      jmc       500: .It Fn sin x
                    501: Return the sine of
                    502: .Fa x ,
                    503: where
                    504: .Fa x
                    505: is in radians.
                    506: .It Fn sqrt x
                    507: Return the square root of
                    508: .Fa x .
                    509: .It Fn srand expr
1.16      jmc       510: Sets seed for
1.7       aaron     511: .Fn rand
1.17      jmc       512: to
                    513: .Fa expr
1.1       tholo     514: and returns the previous seed.
1.17      jmc       515: If
                    516: .Fa expr
1.53      tim       517: is omitted,
                    518: .Fn rand
                    519: will return non-deterministic random numbers.
1.17      jmc       520: .El
                    521: .Ss String Functions
                    522: .Bl -tag -width "split(s, a, fs)"
1.52      millert   523: .It Fn gensub r s h [t]
                    524: Search the target string
                    525: .Ar t
                    526: for matches of the regular expression
                    527: .Ar r .
                    528: If
                    529: .Ar h
                    530: is a string beginning with
                    531: .Ic g
                    532: or
                    533: .Ic G ,
                    534: then replace all matches of
                    535: .Ar r
                    536: with
                    537: .Ar s .
                    538: Otherwise,
                    539: .Ar h
                    540: is a number indicating which match of
                    541: .Ar r
                    542: to replace.
                    543: If no
                    544: .Ar t
                    545: is supplied,
                    546: .Va $0
                    547: is used instead.
                    548: .\"Within the replacement text
                    549: .\".Ar s ,
                    550: .\"the sequence
                    551: .\".Ar \en ,
                    552: .\"where
                    553: .\".Ar n
                    554: .\"is a digit from 1 to 9, may be used to indicate just the text that
                    555: .\"matched the
                    556: .\".Ar n Ap th
                    557: .\"parenthesized subexpression.
                    558: .\"The sequence
                    559: .\".Ic \e0
                    560: .\"represents the entire text, as does the character
                    561: .\".Ic & .
                    562: Unlike
                    563: .Fn sub
                    564: and
                    565: .Fn gsub ,
                    566: the modified string is returned as the result of the function,
                    567: and the original target is
                    568: .Em not
                    569: changed.
                    570: Note that
                    571: .Ar \en
                    572: sequences within the replacement string
                    573: .Ar s ,
                    574: as supported by GNU
                    575: .Nm ,
                    576: are
                    577: .Em not
                    578: supported at this time.
1.17      jmc       579: .It Fn gsub r t s
                    580: The same as
                    581: .Fn sub
                    582: except that all occurrences of the regular expression are replaced.
                    583: .Fn gsub
                    584: returns the number of replacements.
1.7       aaron     585: .It Fn index s t
1.16      jmc       586: The position in
1.7       aaron     587: .Fa s
1.1       tholo     588: where the string
1.7       aaron     589: .Fa t
1.1       tholo     590: occurs, or 0 if it does not.
1.17      jmc       591: .It Fn length s
                    592: The length of
                    593: .Fa s
                    594: taken as a string,
1.47      millert   595: number of elements in an array for an array argument,
                    596: or length of
1.17      jmc       597: .Va $0
                    598: if no argument is given.
1.7       aaron     599: .It Fn match s r
1.16      jmc       600: The position in
1.7       aaron     601: .Fa s
1.1       tholo     602: where the regular expression
1.7       aaron     603: .Fa r
1.1       tholo     604: occurs, or 0 if it does not.
1.17      jmc       605: The variable
1.7       aaron     606: .Va RSTART
1.17      jmc       607: is set to the starting position of the matched string
                    608: .Pq which is the same as the returned value
                    609: or zero if no match is found.
                    610: The variable
1.7       aaron     611: .Va RLENGTH
1.17      jmc       612: is set to the length of the matched string,
                    613: or \-1 if no match is found.
1.7       aaron     614: .It Fn split s a fs
1.16      jmc       615: Splits the string
1.7       aaron     616: .Fa s
1.1       tholo     617: into array elements
1.7       aaron     618: .Va a[1] , a[2] , ... , a[n]
1.1       tholo     619: and returns
1.7       aaron     620: .Va n .
1.1       tholo     621: The separation is done with the regular expression
1.7       aaron     622: .Ar fs
1.1       tholo     623: or with the field separator
1.7       aaron     624: .Va FS
1.1       tholo     625: if
1.7       aaron     626: .Ar fs
1.1       tholo     627: is not given.
                    628: An empty string as field separator splits the string
                    629: into one array element per character.
1.17      jmc       630: .It Fn sprintf fmt expr ...
                    631: The string resulting from formatting
                    632: .Fa expr , ...
                    633: according to the
1.28      jmc       634: .Xr printf 1
1.17      jmc       635: format
                    636: .Fa fmt .
1.7       aaron     637: .It Fn sub r t s
1.16      jmc       638: Substitutes
1.7       aaron     639: .Fa t
1.1       tholo     640: for the first occurrence of the regular expression
1.7       aaron     641: .Fa r
1.1       tholo     642: in the string
1.7       aaron     643: .Fa s .
1.1       tholo     644: If
1.7       aaron     645: .Fa s
1.1       tholo     646: is not given,
1.7       aaron     647: .Va $0
1.1       tholo     648: is used.
1.17      jmc       649: An ampersand
                    650: .Pq Sq &
                    651: in
                    652: .Fa t
                    653: is replaced in string
                    654: .Fa s
                    655: with the text matched by regular expression
                    656: .Fa r .
                    657: A literal ampersand can be specified by preceding it with two backslashes
                    658: .Pq Sq \e\e .
                    659: A literal backslash can be specified by preceding it with another backslash
                    660: .Pq Sq \e\e .
1.7       aaron     661: .Fn sub
1.17      jmc       662: returns the number of replacements.
                    663: .It Fn substr s m n
                    664: Return at most the
                    665: .Fa n Ns -character
                    666: substring of
                    667: .Fa s
                    668: that begins at position
                    669: .Fa m
                    670: counted from 1.
                    671: If
                    672: .Fa n
                    673: is omitted, or if
                    674: .Fa n
                    675: specifies more characters than are left in the string,
                    676: the length of the substring is limited by the length of
                    677: .Fa s .
1.7       aaron     678: .It Fn tolower str
1.16      jmc       679: Returns a copy of
1.7       aaron     680: .Fa str
1.1       tholo     681: with all upper-case characters translated to their
                    682: corresponding lower-case equivalents.
1.7       aaron     683: .It Fn toupper str
1.16      jmc       684: Returns a copy of
1.7       aaron     685: .Fa str
1.1       tholo     686: with all lower-case characters translated to their
                    687: corresponding upper-case equivalents.
1.7       aaron     688: .El
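.Pp
A minimal sketch of
.Fn split ,
.Fn sub
and
.Fn substr
as described above, run from
.Xr sh 1 .
The sample strings and the shell variable names are illustrative assumptions,
not part of
.Nm .

```shell
# An empty separator yields one array element per character.
split_demo=$(awk 'BEGIN { n = split("abc", ch, ""); print n, ch[1], ch[3] }')
echo "$split_demo"     # 3 a c

# "&" in the replacement stands for the text matched by the regex;
# sub() returns the number of replacements made (0 or 1).
sub_demo=$(echo "order 66 shipped" | awk '{ n = sub(/[0-9]+/, "[&]"); print n, $0 }')
echo "$sub_demo"       # 1 order [66] shipped

# Two backslashes before "&" in the awk source give a literal ampersand.
amp_demo=$(echo "a+b" | awk '{ sub(/\+/, " \\& "); print }')
echo "$amp_demo"       # a & b

# substr() counts positions from 1; an oversized length is clipped.
substr_demo=$(awk 'BEGIN { print substr("hello", 2, 3), substr("hello", 4, 100) }')
echo "$substr_demo"    # ell lo
```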
1.52      millert   689: .Ss Time Functions
                    690: This version of
                    691: .Nm
                    692: provides the following functions for obtaining and formatting time
                    693: stamps.
                    694: .Bl -tag -width indent
1.57      millert   695: .It Fn mktime datespec
                    696: Converts
                    697: .Fa datespec
                    698: into a timestamp in the same form as a value returned by
                    699: .Fn systime .
                    700: The
                    701: .Fa datespec
                    702: is a string composed of six or seven numbers separated by whitespace:
                    703: .Bd -literal -offset indent
                    704: YYYY MM DD HH MM SS [DST]
                    705: .Ed
                    706: .Pp
                    707: The fields in
                    708: .Fa datespec
                    709: are as follows:
                    710: .Bl -tag -width "YYYY"
1.60      millert   711: .It YYYY
1.57      millert   712: Year: a four-digit year, including the century.
                    713: .It MM
                    714: Month: a number from 1 to 12.
                    715: .It DD
                    716: Day: a number from 1 to 31.
                    717: .It HH
                    718: Hour: a number from 0 to 23.
                    719: .It MM
                    720: Minute: a number from 0 to 59.
                    721: .It SS
                    722: Second: a number from 0 to 60 (permitting a leap second).
                    723: .It DST
                    724: Daylight Saving Time: a positive value indicates that DST is in effect;
                    725: zero indicates that it is not.
                    726: If DST is not specified, or is negative,
                    727: .Fn mktime
                    728: will attempt to determine the correct value.
                    729: .El
1.52      millert   730: .It Fn strftime "[format [, timestamp]]"
                    731: Formats
                    732: .Ar timestamp
                    733: according to the string
                    734: .Ar format .
                    735: The format string may contain any of the conversion specifications described
                    736: in the
                    737: .Xr strftime 3
                    738: manual page, as well as any arbitrary text.
                    739: The
                    740: .Ar timestamp
                    741: must be in the same form as a value returned by
1.57      millert   742: .Fn mktime
                    743: and
1.52      millert   744: .Fn systime .
                    745: If
                    746: .Ar timestamp
                    747: is not specified, the current time is used.
                    748: If
                    749: .Ar format
                    750: is not specified, a default format equivalent to the output of
                    751: .Xr date 1
                    752: is used.
                    753: .It Fn systime
                    754: Returns the value of time in seconds since 0 hours, 0 minutes,
                    755: 0 seconds, January 1, 1970, Coordinated Universal Time (UTC).
                    756: .El
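.Pp
The following sketch round-trips a date through
.Fn mktime
and
.Fn strftime .
The date, the format string and the shell variable names are arbitrary
choices for illustration, and
.Ev TZ
is set to UTC only so that the result does not depend on the local time zone.
Since these functions are extensions, other implementations of
.Nm
may not provide them.

```shell
# TZ is pinned to UTC so the round trip does not depend on the local zone.
time_demo=$(TZ=UTC awk 'BEGIN {
    t = mktime("2023 09 15 12 00 00")          # YYYY MM DD HH MM SS
    print strftime("%Y-%m-%d %H:%M:%S", t)     # format the same instant
}')
echo "$time_demo"      # 2023-09-15 12:00:00

# systime() returns the current time in seconds since the Epoch.
now=$(awk 'BEGIN { print systime() }')
echo "$now"
```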
1.17      jmc       757: .Ss Input/Output and General Functions
                    758: .Bl -tag -width "getline [var] < file"
                    759: .It Fn close expr
                    760: Closes the file or pipe
                    761: .Fa expr .
                    762: .Fa expr
                    763: should match the string that was used to open the file or pipe.
                    764: .It Ar cmd | Ic getline Op Va var
                    765: Read a record of input from a stream piped from the output of
                    766: .Ar cmd .
                    767: If
                    768: .Va var
                    769: is omitted, the variables
                    770: .Va $0
                    771: and
                    772: .Va NF
                    773: are set.
                    774: Otherwise
                    775: .Va var
                    776: is set.
                    777: If the stream is not open, it is opened.
                    778: As long as the stream remains open, subsequent calls
                    779: will read subsequent records from the stream.
                    780: The stream remains open until explicitly closed with a call to
                    781: .Fn close .
1.24      jmc       782: .Ic getline
                    783: returns 1 for a successful input, 0 for end of file, and \-1 for an error.
                    784: .It Fn fflush [expr]
1.39      jmc       785: Flushes any buffered output for the file or pipe
1.24      jmc       786: .Fa expr ,
                    787: or all open files or pipes if
                    788: .Fa expr
                    789: is omitted.
1.17      jmc       790: .Fa expr
                    791: should match the string that was used to open the file or pipe.
                    792: .It Ic getline
                    793: Sets
                    794: .Va $0
                    795: to the next input record from the current input file.
                    796: This form of
                    797: .Ic getline
                    798: sets the variables
                    799: .Va NF ,
                    800: .Va NR ,
                    801: and
                    802: .Va FNR .
1.7       aaron     803: .Ic getline
1.17      jmc       804: returns 1 for a successful input, 0 for end of file, and \-1 for an error.
                    805: .It Ic getline Va var
                    806: Sets variable
1.7       aaron     807: .Va var
1.17      jmc       808: to the next input record
                    809: from the current input file.
                    810: This form of
                    811: .Ic getline
                    812: sets the variables
                    813: .Va NR
                    814: and
                    815: .Va FNR .
                    816: .Ic getline
                    817: returns 1 for a successful input, 0 for end of file, and \-1 for an error.
                    818: .It Xo
                    819: .Ic getline Op Va var
1.47      millert   820: .Pf <\ \& Ar file
1.17      jmc       821: .Xc
                    822: Reads
1.7       aaron     823: the next record
1.1       tholo     824: of input from
1.7       aaron     825: .Ar file .
1.17      jmc       826: If
                    827: .Va var
                    828: is omitted, the variables
                    829: .Va $0
                    830: and
                    831: .Va NF
                    832: are set.
                    833: Otherwise
                    834: .Va var
                    835: is set.
                    836: If
                    837: .Ar file
                    838: is not open, it is opened.
                    839: As long as the stream remains open, subsequent calls will read subsequent
                    840: records from
                    841: .Ar file .
                    842: .Ar file
                    843: remains open until explicitly closed with a call to
                    844: .Fn close .
                    845: .It Fn system cmd
                    846: Executes
                    847: .Fa cmd
                    848: and returns its exit status.
1.47      millert   849: This will be \-1 upon error,
                    850: .Ar cmd Ns 's
                    851: exit status upon a normal exit,
                    852: 256 +
                    853: .Em sig
                    854: if
                    855: .Fa cmd
                    856: was terminated by a signal, where
                    857: .Em sig
                    858: is the number of the signal,
                    859: or 512 +
                    860: .Em sig
                    861: if there was a core dump.
1.17      jmc       862: .El
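.Pp
A short sketch of reading from a pipe with
.Ic getline ,
closing it with
.Fn close ,
and checking the result of
.Fn system .
The commands and variable names are assumptions chosen for illustration.

```shell
io_demo=$(awk 'BEGIN {
    cmd = "echo one; echo two"
    # Test for > 0 so that an error (-1) ends the loop as well as end of file.
    while ((cmd | getline line) > 0)
        out = out line ","
    close(cmd)            # pass the same string that opened the pipe
    # After close(), reading from cmd runs the command again from the start.
    cmd | getline first
    close(cmd)
    print out first
}')
echo "$io_demo"           # one,two,one

# system() returns the exit status of the command on a normal exit.
status_demo=$(awk 'BEGIN { rc = system("exit 3"); print rc }')
echo "$status_demo"       # 3
```

Comparing the result of
.Ic getline
against 0, rather than using it directly as a condition,
avoids an endless loop when \-1 is returned.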
1.30      jmc       863: .Ss Bit-Operation Functions
1.29      pyr       864: .Bl -tag -width "lshift(a, b)"
                    865: .It Fn compl x
                    866: Returns the bitwise complement of integer argument x.
                    867: .It Fn and x y
1.30      jmc       868: Performs a bitwise AND on integer arguments x and y.
1.29      pyr       869: .It Fn or x y
1.30      jmc       870: Performs a bitwise OR on integer arguments x and y.
1.29      pyr       871: .It Fn xor x y
1.30      jmc       872: Performs a bitwise Exclusive-OR on integer arguments x and y.
1.29      pyr       873: .It Fn lshift x n
1.39      jmc       874: Returns integer argument x shifted by n bits to the left.
1.29      pyr       875: .It Fn rshift x n
1.39      jmc       876: Returns integer argument x shifted by n bits to the right.
1.29      pyr       877: .El
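.Pp
A hedged sketch of the two-argument bit operations.
Because these functions are extensions, the example first checks whether the
.Nm
in use accepts them; the operand values and the
.Va bits
shell variable are illustrative assumptions.
.Fn compl
is left out because its result depends on the integer width used by the
implementation.

```shell
# Probe first: these functions are extensions, so another awk may reject them.
if bits=$(awk 'BEGIN { print and(12, 10), or(12, 10), xor(12, 10), lshift(1, 4), rshift(256, 4) }' 2>/dev/null)
then
    echo "$bits"          # 8 14 6 16 16 where the functions exist
else
    bits="unsupported"
    echo "bit-operation functions are not available in this awk" >&2
fi
```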
1.50      millert   878: .Sh ENVIRONMENT
                    879: The following environment variables affect the execution of
                    880: .Nm :
                    881: .Bl -tag -width POSIXLY_CORRECT
                    882: .It Ev POSIXLY_CORRECT
                    883: When set, behave in accordance with the standard, even when it conflicts
                    884: with historical behavior.
                    885: .El
1.37      jmc       886: .Sh EXIT STATUS
                    887: .Ex -std awk
                    888: .Pp
                    889: But note that the
                    890: .Ic exit
                    891: statement can modify the exit status.
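.Pp
As a small illustration, a program can choose its own exit status by passing
a value to
.Ic exit .
The input text, the status value 3 and the
.Va rc
shell variable are arbitrary assumptions.

```shell
# Passing a value to exit sets awk's own exit status.
rc=0
echo "bad input" | awk '/bad/ { exit 3 }' || rc=$?
echo "awk exited with status $rc"     # awk exited with status 3
```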
1.7       aaron     892: .Sh EXAMPLES
1.16      jmc       893: Print lines longer than 72 characters:
                    894: .Pp
1.7       aaron     895: .Dl length($0) > 72
1.16      jmc       896: .Pp
                    897: Print first two fields in opposite order:
1.7       aaron     898: .Pp
                    899: .Dl { print $2, $1 }
1.16      jmc       900: .Pp
1.47      millert   901: Same, with input fields separated by comma and/or spaces and tabs:
1.7       aaron     902: .Bd -literal -offset indent
1.1       tholo     903: BEGIN { FS = ",[ \et]*|[ \et]+" }
                    904:       { print $2, $1 }
1.7       aaron     905: .Ed
1.16      jmc       906: .Pp
                    907: Add up first column, print sum and average:
1.7       aaron     908: .Bd -literal -offset indent
                    909: { s += $1 }
                    910: END { print "sum is", s, " average is", s/NR }
                    911: .Ed
1.16      jmc       912: .Pp
                    913: Print all lines between start/stop pairs:
1.7       aaron     914: .Pp
                    915: .Dl /start/, /stop/
1.16      jmc       916: .Pp
1.45      naddy     917: Simulate
                    918: .Xr echo 1 :
1.7       aaron     919: .Bd -literal -offset indent
                    920: BEGIN { # Simulate echo(1)
                    921:         for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
                    922:         printf "\en"
                    923:         exit }
1.19      jmc       924: .Ed
                    925: .Pp
                    926: Print an error message to standard error:
                    927: .Bd -literal -offset indent
                    928: { print "error!" > "/dev/stderr" }
1.7       aaron     929: .Ed
1.59      millert   930: .Sh UNUSUAL FLOATING-POINT VALUES
                    931: .Nm
                    932: was designed before IEEE 754 arithmetic defined Not-A-Number (NaN)
                    933: and Infinity values, which are supported by all modern floating-point
                    934: hardware.
                    935: .Pp
                    936: Because
                    937: .Nm
                    938: uses
                    939: .Xr strtod 3
                    940: and
                    941: .Xr atof 3
                    942: to convert string values to double-precision floating-point values,
                    943: modern C libraries also convert strings starting with
                    944: .Dv inf
                    945: and
                    946: .Dv nan
                    947: into infinity and NaN values respectively.
                    948: This led to strange results,
                    949: with something like this:
                    950: .Pp
                    951: .Li echo nancy | awk '{ print $1 + 0 }'
                    952: .Pp
                    953: printing
                    954: .Dv nan
                    955: instead of zero.
                    956: .Pp
                    957: .Nm
                    958: now follows GNU
                    959: .Nm ,
                    960: and prefilters string values before attempting
                    961: to convert them to numbers, as follows:
                    962: .Bl -tag -width Ds
                    963: .It Hexadecimal values
                    964: Hexadecimal values (allowed since C99) convert to zero, as they did
                    965: prior to C99.
                    966: .It NaN values
                    967: The two strings
                    968: .Dq +NAN
                    969: and
                    970: .Dq -NAN
                    971: (case independent) convert to NaN.
                    972: No others do.
                    973: (NaNs can have signs.)
                    974: .It Infinity values
                    975: The two strings
                    976: .Dq +INF
                    977: and
                    978: .Dq -INF
                    979: (case independent) convert to positive and negative infinity, respectively.
                    980: No others do.
                    981: .El
1.7       aaron     982: .Sh SEE ALSO
1.42      tedu      983: .Xr cut 1 ,
1.52      millert   984: .Xr date 1 ,
1.47      millert   985: .Xr grep 1 ,
1.7       aaron     986: .Xr lex 1 ,
1.20      jmc       987: .Xr printf 1 ,
1.16      jmc       988: .Xr sed 1 ,
1.52      millert   989: .Xr strftime 3 ,
1.23      jmc       990: .Xr re_format 7 ,
                    991: .Xr script 7
1.61      jsg       992: .Rs
                    993: .\" 4.4BSD USD:16
1.62      jsg       994: .\".%R Computing Science Technical Report
                    995: .\".%N 68
                    996: .\".%D July 1978
1.61      jsg       997: .%A A. V. Aho
                    998: .%A P. J. Weinberger
                    999: .%A B. W. Kernighan
                   1000: .%T AWK \(em A Pattern Scanning and Processing Language
1.62      jsg      1001: .%J Software \(em Practice and Experience
                   1002: .%V 9:4
                   1003: .%P pp. 267-279
                   1004: .%D April 1979
1.61      jsg      1005: .Re
1.7       aaron    1006: .Rs
                   1007: .%A A. V. Aho
                   1008: .%A B. W. Kernighan
                   1009: .%A P. J. Weinberger
                   1010: .%T The AWK Programming Language
                   1011: .%I Addison-Wesley
1.64      jsg      1012: .%D 2024
                   1013: .%O ISBN 0-13-826972-6
1.7       aaron    1014: .Re
1.26      jmc      1015: .Sh STANDARDS
                   1016: The
                   1017: .Nm
                   1018: utility is compliant with the
1.33      jmc      1019: .St -p1003.1-2008
1.50      millert  1020: specification except that consecutive backslashes in the replacement
                   1021: string argument for
                   1022: .Fn sub
                   1023: and
                   1024: .Fn gsub
1.51      millert  1025: are not collapsed and a slash
                   1026: .Pq Ql /
                   1027: does not need to be escaped in a bracket expression.
1.53      tim      1028: Also, the behaviour of
                   1029: .Fn rand
                   1030: and
                   1031: .Fn srand
                   1032: has been changed to support non-deterministic random numbers.
1.26      jmc      1033: .Pp
                   1034: The flags
                   1035: .Op Fl \&dV
                   1036: and
                   1037: .Op Fl safe ,
1.56      millert  1038: support for regular expressions in
                   1039: .Va RS ,
1.52      millert  1040: as well as the functions
                   1041: .Fn fflush ,
                   1042: .Fn gensub ,
                   1043: .Fn compl ,
                   1044: .Fn and ,
                   1045: .Fn or ,
                   1046: .Fn xor ,
                   1047: .Fn lshift ,
                   1048: .Fn rshift ,
1.57      millert  1049: .Fn mktime ,
1.52      millert  1050: .Fn strftime
                   1051: and
                   1052: .Fn systime
1.26      jmc      1053: are extensions to that specification.
1.8       aaron    1054: .Sh HISTORY
1.13      millert  1055: An
1.8       aaron    1056: .Nm
1.13      millert  1057: utility appeared in
                   1058: .At v7 .
1.7       aaron    1059: .Sh BUGS
1.1       tholo    1060: There are no explicit conversions between numbers and strings.
                   1061: To force an expression to be treated as a number add 0 to it;
                   1062: to force it to be treated as a string concatenate
1.7       aaron    1063: .Li \&""
                   1064: to it.
                   1065: .Pp
1.1       tholo    1066: The scope rules for variables in functions are a botch;
                   1067: the syntax is worse.
1.47      millert  1068: .Pp
1.65    ! millert  1069: Input is expected to be UTF-8 encoded.
        !          1070: Other multibyte character sets are not handled.