Annotation of src/usr.bin/awk/awk.1, Revision 1.2

1.2     ! etheisen    1: .\"    $OpenBSD$
1.1       tholo       2: .de EX
                      3: .nf
                      4: .ft CW
                      5: ..
                      6: .de EE
                      7: .br
                      8: .fi
                      9: .ft 1
                     10: ..
                     11: awk
                     12: .TH AWK 1
                     13: .CT 1 files prog_other
                     14: .SH NAME
                     15: awk \- pattern-directed scanning and processing language
                     16: .SH SYNOPSIS
1.2     ! etheisen   17: .B awk|nawk
1.1       tholo      18: [
                     19: .BI \-F
                     20: .I fs
                     21: ]
                     22: [
                     23: .BI \-v
                     24: .I var=value
                     25: ]
                     26: [
                     27: .BI \-mr n
                     28: ]
                     29: [
                     30: .BI \-mf n
                     31: ]
                     32: [
                     33: .I 'prog'
                     34: |
                     35: .BI \-f
                     36: .I progfile
                     37: ]
                     38: [
                     39: .I file ...
                     40: ]
                     41: .SH DESCRIPTION
                     42: .I Awk
                     43: scans each input
                     44: .I file
                     45: for lines that match any of a set of patterns specified literally in
                     46: .IR prog
                     47: or in one or more files
                     48: specified as
                     49: .B \-f
                     50: .IR progfile .
                     51: With each pattern
                     52: there can be an associated action that will be performed
                     53: when a line of a
                     54: .I file
                     55: matches the pattern.
                     56: Each line is matched against the
                     57: pattern portion of every pattern-action statement;
                     58: the associated action is performed for each matched pattern.
                     59: The file name
                     60: .B \-
                     61: means the standard input.
                     62: Any
                     63: .IR file
                     64: of the form
                     65: .I var=value
                     66: is treated as an assignment, not a filename,
                     67: and is executed at the time it would have been opened if it were a filename.
                     68: The option
                     69: .B \-v
                     70: followed by
                     71: .I var=value
                     72: is an assignment to be done before
                     73: .I prog
                     74: is executed;
                     75: any number of
                     76: .B \-v
                     77: options may be present.
                     78: The
                     79: .B \-F
                     80: .IR fs
                     81: option defines the input field separator to be the regular expression
                     82: .IR fs .
                     83: .PP
                     84: An input line is normally made up of fields separated by white space,
                     85: or by regular expression
                     86: .BR FS .
                     87: The fields are denoted
                     88: .BR $1 ,
                     89: .BR $2 ,
                     90: \&..., while
                     91: .B $0
                     92: refers to the entire line.
                     93: If
                     94: .BR FS
                     95: is null, the input line is split into one field per character.
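The field rules above can be tried from a shell. This is only a sketch: the sample records are invented for illustration, and it assumes an awk on the PATH and a POSIX shell.

```shell
# Default splitting on white space: $2 is the second blank-separated field.
second=$(printf 'alpha beta gamma\n' | awk '{ print $2 }')

# -F sets the input field separator; here a colon, as in passwd-style records.
# NF is the number of fields, so $NF is the last field.
last=$(printf 'root:x:0:0:/root:/bin/sh\n' | awk -F: '{ print $1, $NF }')

echo "$second"
echo "$last"
```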
                     96: .PP
                     97: To compensate for inadequate implementation of storage management,
                     98: the
                     99: .B \-mr
                    100: option can be used to set the maximum size of the input record,
                    101: and the
                    102: .B \-mf
                    103: option to set the maximum number of fields.
                    104: .PP
                    105: A pattern-action statement has the form
                    106: .IP
                    107: .IB pattern " { " action " }"
                    108: .PP
                    109: A missing
                    110: .BI { " action " }
                    111: means print the line;
                    112: a missing pattern always matches.
                    113: Pattern-action statements are separated by newlines or semicolons.
                    114: .PP
                    115: An action is a sequence of statements.
                    116: A statement can be one of the following:
                    117: .PP
                    118: .EX
                    119: .ta \w'\f(CWdelete array[expression]'u
                    120: .RS
                    121: .nf
                    122: .ft CW
                    123: if(\fI expression \fP)\fI statement \fP\fR[ \fPelse\fI statement \fP\fR]\fP
                    124: while(\fI expression \fP)\fI statement\fP
                    125: for(\fI expression \fP;\fI expression \fP;\fI expression \fP)\fI statement\fP
                    126: for(\fI var \fPin\fI array \fP)\fI statement\fP
                    127: do\fI statement \fPwhile(\fI expression \fP)
                    128: break
                    129: continue
                    130: {\fR [\fP\fI statement ... \fP\fR] \fP}
                    131: \fIexpression\fP       #\fR commonly\fP\fI var = expression\fP
                    132: print\fR [ \fP\fIexpression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
                    133: printf\fI format \fP\fR[ \fP,\fI expression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
                    134: return\fR [ \fP\fIexpression \fP\fR]\fP
                    135: next   #\fR skip remaining patterns on this input line\fP
                    136: nextfile       #\fR skip rest of this file, open next, start at top\fP
                    137: delete\fI array\fP[\fI expression \fP] #\fR delete an array element\fP
                    138: delete\fI array\fP     #\fR delete all elements of array\fP
                    139: exit\fR [ \fP\fIexpression \fP\fR]\fP  #\fR exit immediately; status is \fP\fIexpression\fP
                    140: .fi
                    141: .RE
                    142: .EE
                    143: .DT
                    144: .PP
                    145: Statements are terminated by
                    146: semicolons, newlines or right braces.
                    147: An empty
                    148: .I expression-list
                    149: stands for
                    150: .BR $0 .
                    151: String constants are quoted \&\f(CW"\ "\fR,
                    152: with the usual C escapes recognized within.
                    153: Expressions take on string or numeric values as appropriate,
                    154: and are built using the operators
                    155: .B + \- * / % ^
                    156: (exponentiation), and concatenation (indicated by white space).
                    157: The operators
                    158: .B
                    159: ! ++ \-\- += \-= *= /= %= ^= > >= < <= == != ?:
                    160: are also available in expressions.
                    161: Variables may be scalars, array elements
                    162: (denoted
                    163: .IB x  [ i ] )
                    164: or fields.
                    165: Variables are initialized to the null string.
                    166: Array subscripts may be any string,
                    167: not necessarily numeric;
                    168: this allows for a form of associative memory.
                    169: Multiple subscripts such as
                    170: .B [i,j,k]
                    171: are permitted; the constituents are concatenated,
                    172: separated by the value of
                    173: .BR SUBSEP .
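A short sketch of string subscripts and multiple subscripts follows. The array names seen and m and the sample input are hypothetical, chosen only for illustration.

```shell
# Count occurrences of each word, using the word itself as the subscript.
count=$(printf 'a b a\nb a c\n' | awk '
    { for (i = 1; i <= NF; i++) seen[$i]++ }
    END { print seen["a"], seen["b"], seen["c"] }')

# A multiple subscript [1,2] is stored under the single key 1 SUBSEP 2.
joined=$(awk 'BEGIN {
    m[1,2] = "x"
    key = 1 SUBSEP 2
    if (key in m)
        print "same key"
    else
        print "different key"
}')

echo "$count"
echo "$joined"
```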
                    174: .PP
                    175: The
                    176: .B print
                    177: statement prints its arguments on the standard output
                    178: (or on a file if
                    179: .BI > file
                    180: or
                    181: .BI >> file
                    182: is present or on a pipe if
                    183: .BI | cmd
                    184: is present), separated by the current output field separator,
                    185: and terminated by the output record separator.
                    186: .I file
                    187: and
                    188: .I cmd
                    189: may be literal names or parenthesized expressions;
                    190: identical string values in different statements denote
                    191: the same open file.
                    192: The
                    193: .B printf
                    194: statement formats its expression list according to the format
                    195: (see
                    196: .IR printf (3)) .
                    197: The built-in function
                    198: .BI close( expr )
                    199: closes the file or pipe
                    200: .IR expr .
                    201: The built-in function
                    202: .BI fflush( expr )
                    203: flushes any buffered output for the file or pipe
                    204: .IR expr .
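As a sketch of printing to a pipe and then closing it, the following sends every record to one sort process. The input words are invented, and sort(1) is assumed to be available.

```shell
# Identical command strings denote the same open pipe, so all three
# records reach a single sort process; close() lets it finish and
# write its output.
sorted=$(printf 'pear\napple\nfig\n' | awk '
    { print $1 | "sort" }
    END { close("sort") }')

echo "$sorted"
```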
                    205: .PP
                    206: The mathematical functions
                    207: .BR exp ,
                    208: .BR log ,
                    209: .BR sqrt ,
                    210: .BR sin ,
                    211: .BR cos ,
                    212: and
                    213: .BR atan2
                    214: are built in.
                    215: Other built-in functions:
                    216: .TF length
                    217: .TP
                    218: .B length
                    219: the length of its argument
                    220: taken as a string,
                    221: or of
                    222: .B $0
                    223: if no argument.
                    224: .TP
                    225: .B rand
                    226: random number on [0,1)
                    227: .TP
                    228: .B srand
                    229: sets seed for
                    230: .B rand
                    231: and returns the previous seed.
                    232: .TP
                    233: .B int
                    234: truncates to an integer value
                    235: .TP
                    236: .BI substr( s , " m" , " n\fB)
                    237: the
                    238: .IR n -character
                    239: substring of
                    240: .I s
                    241: that begins at position
                    242: .IR m
                    243: counted from 1.
                    244: .TP
                    245: .BI index( s , " t" )
                    246: the position in
                    247: .I s
                    248: where the string
                    249: .I t
                    250: occurs, or 0 if it does not.
                    251: .TP
                    252: .BI match( s , " r" )
                    253: the position in
                    254: .I s
                    255: where the regular expression
                    256: .I r
                    257: occurs, or 0 if it does not.
                    258: The variables
                    259: .B RSTART
                    260: and
                    261: .B RLENGTH
                    262: are set to the position and length of the matched string.
                    263: .TP
                    264: .BI split( s , " a" , " fs\fB)
                    265: splits the string
                    266: .I s
                    267: into array elements
                    268: .IB a [1] ,
                    269: .IB a [2] ,
                    270: \&...,
                    271: .IB a [ n ] ,
                    272: and returns
                    273: .IR n .
                    274: The separation is done with the regular expression
                    275: .I fs
                    276: or with the field separator
                    277: .B FS
                    278: if
                    279: .I fs
                    280: is not given.
                    281: An empty string as field separator splits the string
                    282: into one array element per character.
                    283: .TP
                    284: .BI sub( r , " t" , " s\fB)
                    285: substitutes
                    286: .I t
                    287: for the first occurrence of the regular expression
                    288: .I r
                    289: in the string
                    290: .IR s .
                    291: If
                    292: .I s
                    293: is not given,
                    294: .B $0
                    295: is used.
                    296: .TP
                    297: .B gsub
                    298: same as
                    299: .B sub
                    300: except that all occurrences of the regular expression
                    301: are replaced;
                    302: .B sub
                    303: and
                    304: .B gsub
                    305: return the number of replacements.
                    306: .TP
                    307: .BI sprintf( fmt , " expr" , " ...\fB )
                    308: the string resulting from formatting
                    309: .I expr ...
                    310: according to the
                    311: .IR printf (3)
                    312: format
                    313: .I fmt
                    314: .TP
                    315: .BI system( cmd )
                    316: executes
                    317: .I cmd
                    318: and returns its exit status
                    319: .TP
                    320: .BI tolower( str )
                    321: returns a copy of
                    322: .I str
                    323: with all upper-case characters translated to their
                    324: corresponding lower-case equivalents.
                    325: .TP
                    326: .BI toupper( str )
                    327: returns a copy of
                    328: .I str
                    329: with all lower-case characters translated to their
                    330: corresponding upper-case equivalents.
                    331: .PD
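Several of the string functions listed above can be exercised together. This is an illustrative sketch only; the strings and the variable names s, t, n, k and parts are made up for the example.

```shell
result=$(awk 'BEGIN {
    s = "hello world"
    print substr(s, 7, 5)            # 5 characters starting at position 7
    print index(s, "wor")            # position of the substring, or 0
    if (match(s, /o+/))              # sets RSTART and RLENGTH on success
        print RSTART, RLENGTH
    n = split("a:b:c", parts, ":")   # n is the number of pieces
    print n, parts[1], parts[3]
    t = s
    k = gsub(/o/, "0", t)            # k is the number of replacements
    print k, t
    print toupper(s)
}')

echo "$result"
```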
                    332: .PP
                    333: The ``function''
                    334: .B getline
                    335: sets
                    336: .B $0
                    337: to the next input record from the current input file;
                    338: .B getline
                    339: .BI < file
                    340: sets
                    341: .B $0
                    342: to the next record from
                    343: .IR file .
                    344: .B getline
                    345: .I x
                    346: sets variable
                    347: .I x
                    348: instead.
                    349: Finally,
                    350: .IB cmd " | getline
                    351: pipes the output of
                    352: .I cmd
                    353: into
                    354: .BR getline ;
                    355: each call of
                    356: .B getline
                    357: returns the next line of output from
                    358: .IR cmd .
                    359: In all cases,
                    360: .B getline
                    361: returns 1 for a successful input,
                    362: 0 for end of file, and \-1 for an error.
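The return values of getline are usually tested in a loop. The sketch below reads from a command. The command string and the names cmd, line and n are assumptions made for the example.

```shell
# Each call of cmd | getline reads the next line of output into "line";
# the loop ends when getline returns 0 (end of output) or -1 (error).
lines=$(awk 'BEGIN {
    cmd = "echo one; echo two; echo three"
    while ((cmd | getline line) > 0)
        n++
    close(cmd)
    print n, line
}')

echo "$lines"
```

Comparing the result with `> 0` rather than testing it for truth avoids an endless loop when getline returns \-1.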
                    363: .PP
                    364: Patterns are arbitrary Boolean combinations
                    365: (with
                    366: .BR "! || &&" )
                    367: of regular expressions and
                    368: relational expressions.
                    369: Regular expressions are as in
                    370: .IR egrep ;
                    371: see
                    372: .IR grep (1).
                    373: Isolated regular expressions
                    374: in a pattern apply to the entire line.
                    375: Regular expressions may also occur in
                    376: relational expressions, using the operators
                    377: .BR ~
                    378: and
                    379: .BR !~ .
                    380: .BI / re /
                    381: is a constant regular expression;
                    382: any string (constant or variable) may be used
                    383: as a regular expression, except in the position of an isolated regular expression
                    384: in a pattern.
                    385: .PP
                    386: A pattern may consist of two patterns separated by a comma;
                    387: in this case, the action is performed for all lines
                    388: from an occurrence of the first pattern
                    389: through an occurrence of the second.
                    390: .PP
                    391: A relational expression is one of the following:
                    392: .IP
                    393: .I expression matchop regular-expression
                    394: .br
                    395: .I expression relop expression
                    396: .br
                    397: .IB expression " in " array-name
                    398: .br
                    399: .BI ( expr , expr,... ") in " array-name
                    400: .PP
                    401: where a relop is any of the six relational operators in C,
                    402: and a matchop is either
                    403: .B ~
                    404: (matches)
                    405: or
                    406: .B !~
                    407: (does not match).
                    408: A conditional is an arithmetic expression,
                    409: a relational expression,
                    410: or a Boolean combination
                    411: of these.
                    412: .PP
                    413: The special patterns
                    414: .B BEGIN
                    415: and
                    416: .B END
                    417: may be used to capture control before the first input line is read
                    418: and after the last.
                    419: .B BEGIN
                    420: and
                    421: .B END
                    422: do not combine with other patterns.
                    423: .PP
                    424: Variable names with special meanings:
                    425: .TF FILENAME
                    426: .TP
                    427: .B CONVFMT
                    428: conversion format used when converting numbers (default
                    429: .BR "%.6g" )
                    430: .TP
                    431: .B FS
                    432: regular expression used to separate fields; also settable
                    433: by option
                    434: .BI \-F fs\fR.
                    435: .TP
                    436: .BR NF
                    437: number of fields in the current record
                    438: .TP
                    439: .B NR
                    440: ordinal number of the current record
                    441: .TP
                    442: .B FNR
                    443: ordinal number of the current record in the current file
                    444: .TP
                    445: .B FILENAME
                    446: the name of the current input file
                    447: .TP
                    448: .B RS
                    449: input record separator (default newline)
                    450: .TP
                    451: .B OFS
                    452: output field separator (default blank)
                    453: .TP
                    454: .B ORS
                    455: output record separator (default newline)
                    456: .TP
                    457: .B OFMT
                    458: output format for numbers (default
                    459: .BR "%.6g" )
                    460: .TP
                    461: .B SUBSEP
                    462: separates multiple subscripts (default 034)
                    463: .TP
                    464: .B ARGC
                    465: argument count, assignable
                    466: .TP
                    467: .B ARGV
                    468: argument array, assignable;
                    469: non-null members are taken as filenames
                    470: .TP
                    471: .B ENVIRON
                    472: array of environment variables; subscripts are names.
                    473: .PD
                    474: .PP
                    475: Functions may be defined (at the position of a pattern-action statement) thus:
                    476: .IP
                    477: .B
                    478: function foo(a, b, c) { ...; return x }
                    479: .PP
                    480: Parameters are passed by value if scalar and by reference if array name;
                    481: functions may be called recursively.
                    482: Parameters are local to the function; all other variables are global.
                    483: Thus local variables may be created by providing excess parameters in
                    484: the function definition.
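The excess-parameter idiom might look like the sketch below. The function name sum and its parameters are hypothetical; the extra spaces before i only mark, by convention, where the real parameters end.

```shell
out=$(awk '
# "i" and "total" are extra parameters, so they are local to sum().
# The array is passed by reference, the count n by value.
function sum(arr, n,    i, total) {
    for (i = 1; i <= n; i++)
        total += arr[i]
    return total
}
BEGIN {
    i = 99                          # global i, untouched by the call
    n = split("1 2 3 4", v, " ")
    print sum(v, n), i
}')

echo "$out"
```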
                    485: .SH EXAMPLES
                    486: .TP
                    487: .B
                    488: length($0) > 72
                    489: Print lines longer than 72 characters.
                    490: .TP
                    491: .B
                    492: { print $2, $1 }
                    493: Print first two fields in opposite order.
                    494: .PP
                    495: .EX
                    496: BEGIN { FS = ",[ \et]*|[ \et]+" }
                    497:       { print $2, $1 }
                    498: .EE
                    499: .ns
                    500: .IP
                    501: Same, with input fields separated by comma and/or blanks and tabs.
                    502: .PP
                    503: .EX
                    504: .nf
                    505:        { s += $1 }
                    506: END    { print "sum is", s, " average is", s/NR }
                    507: .fi
                    508: .EE
                    509: .ns
                    510: .IP
                    511: Add up first column, print sum and average.
                    512: .TP
                    513: .B
                    514: /start/, /stop/
                    515: Print all lines between start/stop pairs.
                    516: .PP
                    517: .EX
                    518: .nf
                    519: BEGIN  {       # Simulate echo(1)
                    520:        for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
                    521:        printf "\en"
                    522:        exit }
                    523: .fi
                    524: .EE
                    525: .SH SEE ALSO
                    526: .IR lex (1),
                    527: .IR sed (1)
                    528: .br
                    529: A. V. Aho, B. W. Kernighan, P. J. Weinberger,
                    530: .I
                    531: The AWK Programming Language,
                    532: Addison-Wesley, 1988.  ISBN 0-201-07981-X
                    533: .SH BUGS
                    534: There are no explicit conversions between numbers and strings.
                    535: To force an expression to be treated as a number add 0 to it;
                    536: to force it to be treated as a string concatenate
                    537: \&\f(CW""\fP to it.
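A small sketch of both idioms, with invented values, shows why the forcing matters: two string constants compare as strings unless 0 is added to them.

```shell
cmp=$(awk 'BEGIN {
    a = "10"; b = "9"
    if (a < b)                      # string comparison: "1" sorts before "9"
        print "as strings, 10 < 9"
    if (a + 0 > b + 0)              # adding 0 forces a numeric comparison
        print "as numbers, 10 > 9"
    x = 3
    if ((x "") == "3")              # concatenating "" forces a string
        print "string form is 3"
}')

echo "$cmp"
```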
                    538: .br
                    539: The scope rules for variables in functions are a botch;
                    540: the syntax is worse.