src/usr.bin/awk/awk.1 - annotate

Annotation of src/usr.bin/awk/awk.1, Revision 1.4

1.4     ! millert     1: .\"    $OpenBSD: awk.1,v 1.3 1997/01/20 19:43:18 millert Exp $
1.1       tholo       2: .de EX
                      3: .nf
                      4: .ft CW
                      5: ..
                      6: .de EE
                      7: .br
                      8: .fi
                      9: .ft 1
                     10: ..
                     11: .TH AWK 1
                     12: .CT 1 files prog_other
                     13: .SH NAME
                     14: awk \- pattern-directed scanning and processing language
                     15: .SH SYNOPSIS
1.2       etheisen   16: .B awk|nawk
1.1       tholo      17: [
                     18: .BI \-F
                     19: .I fs
                     20: ]
                     21: [
                     22: .BI \-v
                     23: .I var=value
                     24: ]
                     25: [
                     26: .BI \-mr n
                     27: ]
                     28: [
                     29: .BI \-mf n
                     30: ]
                     31: [
                     32: .I 'prog'
                     33: |
                     34: .BI \-f
                     35: .I progfile
                     36: ]
                     37: [
                     38: .I file ...
                     39: ]
                     40: .SH DESCRIPTION
                     41: .I Awk
                     42: scans each input
                     43: .I file
                     44: for lines that match any of a set of patterns specified literally in
                     45: .IR prog
                     46: or in one or more files
                     47: specified as
                     48: .B \-f
                     49: .IR progfile .
                     50: With each pattern
                     51: there can be an associated action that will be performed
                     52: when a line of a
                     53: .I file
                     54: matches the pattern.
                     55: Each line is matched against the
                     56: pattern portion of every pattern-action statement;
                     57: the associated action is performed for each matched pattern.
                     58: The file name
                     59: .B \-
                     60: means the standard input.
                     61: Any
                     62: .IR file
                     63: of the form
                     64: .I var=value
                     65: is treated as an assignment, not a filename,
                     66: and is executed at the time it would have been opened if it were a filename.
                     67: The option
                     68: .B \-v
                     69: followed by
                     70: .I var=value
                     71: is an assignment to be done before
                     72: .I prog
                     73: is executed;
                     74: any number of
                     75: .B \-v
                     76: options may be present.
                     77: The
                     78: .B \-F
                     79: .IR fs
                     80: option defines the input field separator to be the regular expression
                     81: .IR fs .
                     82: .PP
                     83: An input line is normally made up of fields separated by white space,
                     84: or by regular expression
                     85: .BR FS .
                     86: The fields are denoted
                     87: .BR $1 ,
                     88: .BR $2 ,
                     89: \&..., while
                     90: .B $0
                     91: refers to the entire line.
                     92: If
                     93: .BR FS
                     94: is null, the input line is split into one field per character.
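.PP
As a rough illustration of the splitting rules above (a sketch that assumes a POSIX-style
.I awk
is on the search path; the sample records are invented):
.EX
```shell
# Default splitting: runs of blanks separate fields, leading blanks are ignored.
out1=$(echo '  alpha   beta gamma' | awk '{ print NF, $2 }')
echo "$out1"    # prints: 3 beta

# A regular expression as separator: a comma followed by optional blanks.
out2=$(echo 'alpha,  beta,gamma' | awk -F ',[ ]*' '{ print $2 "-" $3 }')
echo "$out2"    # prints: beta-gamma
```
.EE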
                     95: .PP
                     96: To compensate for inadequate implementation of storage management,
                     97: the
                     98: .B \-mr
                     99: option can be used to set the maximum size of the input record,
                    100: and the
                    101: .B \-mf
                    102: option to set the maximum number of fields.
                    103: .PP
                    104: A pattern-action statement has the form
                    105: .IP
                    106: .IB pattern " { " action " }"
                    107: .PP
                    108: A missing
                    109: .BI { " action " }
                    110: means print the line;
                    111: a missing pattern always matches.
                    112: Pattern-action statements are separated by newlines or semicolons.
                    113: .PP
                    114: An action is a sequence of statements.
                    115: A statement can be one of the following:
                    116: .PP
                    117: .EX
                    118: .ta \w'\f(CWdelete array[expression]'u
                    119: .RS
                    120: .nf
                    121: .ft CW
                    122: if(\fI expression \fP)\fI statement \fP\fR[ \fPelse\fI statement \fP\fR]\fP
                    123: while(\fI expression \fP)\fI statement\fP
                    124: for(\fI expression \fP;\fI expression \fP;\fI expression \fP)\fI statement\fP
                    125: for(\fI var \fPin\fI array \fP)\fI statement\fP
                    126: do\fI statement \fPwhile(\fI expression \fP)
                    127: break
                    128: continue
                    129: {\fR [\fP\fI statement ... \fP\fR] \fP}
                    130: \fIexpression\fP       #\fR commonly\fP\fI var = expression\fP
                    131: print\fR [ \fP\fIexpression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
                    132: printf\fI format \fP\fR[ \fP,\fI expression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
                    133: return\fR [ \fP\fIexpression \fP\fR]\fP
                    134: next   #\fR skip remaining patterns on this input line\fP
                    135: nextfile       #\fR skip rest of this file, open next, start at top\fP
                    136: delete\fI array\fP[\fI expression \fP] #\fR delete an array element\fP
                    137: delete\fI array\fP     #\fR delete all elements of array\fP
                    138: exit\fR [ \fP\fIexpression \fP\fR]\fP  #\fR exit immediately; status is \fP\fIexpression\fP
                    139: .fi
                    140: .RE
                    141: .EE
                    142: .DT
                    143: .PP
                    144: Statements are terminated by
                    145: semicolons, newlines or right braces.
                    146: An empty
                    147: .I expression-list
                    148: stands for
                    149: .BR $0 .
                    150: String constants are quoted \&\f(CW"\ "\fR,
                    151: with the usual C escapes recognized within.
                    152: Expressions take on string or numeric values as appropriate,
                    153: and are built using the operators
                    154: .B + \- * / % ^
                    155: (exponentiation), and concatenation (indicated by white space).
                    156: The operators
                    157: .B
                    158: ! ++ \-\- += \-= *= /= %= ^= > >= < <= == != ?:
                    159: are also available in expressions.
                    160: Variables may be scalars, array elements
                    161: (denoted
                    162: .IB x  [ i ] )
                    163: or fields.
                    164: Variables are initialized to the null string.
                    165: Array subscripts may be any string,
                    166: not necessarily numeric;
                    167: this allows for a form of associative memory.
                    168: Multiple subscripts such as
                    169: .B [i,j,k]
                    170: are permitted; the constituents are concatenated,
                    171: separated by the value of
                    172: .BR SUBSEP .
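.PP
The subscript rules above can be sketched as follows; the array name
.B total
and the three input records are hypothetical, chosen only for illustration:
.EX
```shell
# Sum the third field per (first, second) pair using a multiple subscript.
out=$({ echo 'x 1 5'; echo 'y 2 7'; echo 'x 1 3'; } | awk '
    { total[$1, $2] += $3 }
    END {
        if (("x", 1) in total) print "x,1 =", total["x", 1]
        print "y,2 =", total["y", 2] }')
echo "$out"
```
.EE
The two records keyed by x and 1 share one element, so their values are added together.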
                    173: .PP
                    174: The
                    175: .B print
                    176: statement prints its arguments on the standard output
                    177: (or on a file if
                    178: .BI > file
                    179: or
                    180: .BI >> file
                    181: is present or on a pipe if
                    182: .BI | cmd
                    183: is present), separated by the current output field separator,
                    184: and terminated by the output record separator.
                    185: .I file
                    186: and
                    187: .I cmd
                    188: may be literal names or parenthesized expressions;
                    189: identical string values in different statements denote
                    190: the same open file.
                    191: The
                    192: .B printf
                    193: statement formats its expression list according to the format
                    194: (see
                    195: .IR printf (3)) .
                    196: The built-in function
                    197: .BI close( expr )
                    198: closes the file or pipe
                    199: .IR expr .
                    200: The built-in function
                    201: .BI fflush( expr )
                    202: flushes any buffered output for the file or pipe
                    203: .IR expr .
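.PP
A small sketch of output to a pipe, assuming
.IR sort (1)
is available; calling
.B close
waits for the command to finish, so its output appears before anything printed afterwards:
.EX
```shell
# Send every record to sort, then close the pipe before printing a trailer.
out=$({ echo pear; echo apple; } | awk '
    { print | "sort" }
    END { close("sort"); print "done" }')
echo "$out"
```
.EE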
                    204: .PP
                    205: The mathematical functions
                    206: .BR exp ,
                    207: .BR log ,
                    208: .BR sqrt ,
                    209: .BR sin ,
                    210: .BR cos ,
                    211: and
                    212: .BR atan2
                    213: are built in.
                    214: Other built-in functions:
                    215: .TF length
                    216: .TP
                    217: .B length
                    218: the length of its argument
                    219: taken as a string,
                    220: or of
                    221: .B $0
                    222: if no argument.
                    223: .TP
                    224: .B rand
                    225: random number on [0,1)
                    226: .TP
                    227: .B srand
                    228: sets seed for
                    229: .B rand
                    230: and returns the previous seed.
                    231: .TP
                    232: .B int
                    233: truncates to an integer value
                    234: .TP
                    235: .BI substr( s , " m" , " n\fB)
                    236: the
                    237: .IR n -character
                    238: substring of
                    239: .I s
                    240: that begins at position
                    241: .IR m
                    242: counted from 1.
                    243: .TP
                    244: .BI index( s , " t" )
                    245: the position in
                    246: .I s
                    247: where the string
                    248: .I t
                    249: occurs, or 0 if it does not.
                    250: .TP
                    251: .BI match( s , " r" )
                    252: the position in
                    253: .I s
                    254: where the regular expression
                    255: .I r
                    256: occurs, or 0 if it does not.
                    257: The variables
                    258: .B RSTART
                    259: and
                    260: .B RLENGTH
                    261: are set to the position and length of the matched string.
                    262: .TP
                    263: .BI split( s , " a" , " fs\fB)
                    264: splits the string
                    265: .I s
                    266: into array elements
                    267: .IB a [1] ,
                    268: .IB a [2] ,
                    269: \&...,
                    270: .IB a [ n ] ,
                    271: and returns
                    272: .IR n .
                    273: The separation is done with the regular expression
                    274: .I fs
                    275: or with the field separator
                    276: .B FS
                    277: if
                    278: .I fs
                    279: is not given.
                    280: An empty string as field separator splits the string
                    281: into one array element per character.
                    282: .TP
                    283: .BI sub( r , " t" , " s\fB)
                    284: substitutes
                    285: .I t
                    286: for the first occurrence of the regular expression
                    287: .I r
                    288: in the string
                    289: .IR s .
                    290: If
                    291: .I s
                    292: is not given,
                    293: .B $0
                    294: is used.
                    295: .TP
                    296: .B gsub
                    297: same as
                    298: .B sub
                    299: except that all occurrences of the regular expression
                    300: are replaced;
                    301: .B sub
                    302: and
                    303: .B gsub
                    304: return the number of replacements.
                    305: .TP
                    306: .BI sprintf( fmt , " expr" , " ...\fB )
                    307: the string resulting from formatting
                    308: .I expr ...
                    309: according to the
                    310: .IR printf (3)
                    311: format
                    312: .IR fmt .
                    313: .TP
                    314: .BI system( cmd )
                    315: executes
                    316: .I cmd
                    317: and returns its exit status
                    318: .TP
                    319: .BI tolower( str )
                    320: returns a copy of
                    321: .I str
                    322: with all upper-case characters translated to their
                    323: corresponding lower-case equivalents.
                    324: .TP
                    325: .BI toupper( str )
                    326: returns a copy of
                    327: .I str
                    328: with all lower-case characters translated to their
                    329: corresponding upper-case equivalents.
                    330: .PD
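.PP
The string functions listed above can be tried on a single record; this is only a sketch, and the input line and the names
.B parts
and
.B s
are invented for the example:
.EX
```shell
out=$(echo 'hello world' | awk '{
    print substr($0, 7, 5)          # five characters starting at position 7
    print index($0, "wor")          # position of a fixed string
    match($0, /o+/)                 # sets RSTART and RLENGTH
    print RSTART, RLENGTH
    n = split("a:b:c", parts, ":")  # returns the number of elements
    print n, parts[3]
    s = $0
    n = gsub(/o/, "0", s)           # returns the number of replacements
    print n, s
    print toupper(s) }')
echo "$out"
```
.EE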
                    331: .PP
                    332: The ``function''
                    333: .B getline
                    334: sets
                    335: .B $0
                    336: to the next input record from the current input file;
                    337: .B getline
                    338: .BI < file
                    339: sets
                    340: .B $0
                    341: to the next record from
                    342: .IR file .
                    343: .B getline
                    344: .I x
                    345: sets variable
                    346: .I x
                    347: instead.
                    348: Finally,
                    349: .IB cmd " | getline"
                    350: pipes the output of
                    351: .I cmd
                    352: into
                    353: .BR getline ;
                    354: each call of
                    355: .B getline
                    356: returns the next line of output from
                    357: .IR cmd .
                    358: In all cases,
                    359: .B getline
                    360: returns 1 for a successful input,
                    361: 0 for end of file, and \-1 for an error.
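.PP
A minimal sketch of reading from a command with
.BR getline ,
testing the return value so that the loop stops at end of file or on an error; the command string and the variable names are assumptions made for the example:
.EX
```shell
# Count the lines produced by a command and remember the last one.
out=$(awk 'BEGIN {
    cmd = "echo one; echo two"
    while ((cmd | getline line) > 0) n++
    close(cmd)
    print n, line }')
echo "$out"    # prints: 2 two
```
.EE
Comparing the result with 0 matters: a bare loop condition would treat the error value \-1 as true.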
                    362: .PP
                    363: Patterns are arbitrary Boolean combinations
                    364: (with
                    365: .BR "! || &&" )
                    366: of regular expressions and
                    367: relational expressions.
                    368: Regular expressions are as in
                    369: .IR egrep ;
                    370: see
                    371: .IR grep (1).
                    372: Isolated regular expressions
                    373: in a pattern apply to the entire line.
                    374: Regular expressions may also occur in
                    375: relational expressions, using the operators
                    376: .BR ~
                    377: and
                    378: .BR !~ .
                    379: .BI / re /
                    380: is a constant regular expression;
                    381: any string (constant or variable) may be used
                    382: as a regular expression, except in the position of an isolated regular expression
                    383: in a pattern.
                    384: .PP
                    385: A pattern may consist of two patterns separated by a comma;
                    386: in this case, the action is performed for all lines
                    387: from an occurrence of the first pattern
                    388: through an occurrence of the second.
                    389: .PP
                    390: A relational expression is one of the following:
                    391: .IP
                    392: .I expression matchop regular-expression
                    393: .br
                    394: .I expression relop expression
                    395: .br
                    396: .IB expression " in " array-name
                    397: .br
                    398: .BI ( expr , expr,... ") in " array-name
                    399: .PP
                    400: where a relop is any of the six relational operators in C,
                    401: and a matchop is either
                    402: .B ~
                    403: (matches)
                    404: or
                    405: .B !~
                    406: (does not match).
                    407: A conditional is an arithmetic expression,
                    408: a relational expression,
                    409: or a Boolean combination
                    410: of these.
                    411: .PP
                    412: The special patterns
                    413: .B BEGIN
                    414: and
                    415: .B END
                    416: may be used to capture control before the first input line is read
                    417: and after the last.
                    418: .B BEGIN
                    419: and
                    420: .B END
                    421: do not combine with other patterns.
                    422: .PP
                    423: Variable names with special meanings:
                    424: .TF FILENAME
                    425: .TP
                    426: .B CONVFMT
                    427: conversion format used when converting numbers
1.3       millert   428: (default
1.1       tholo     429: .BR "%.6g" )
                    430: .TP
                    431: .B FS
                    432: regular expression used to separate fields; also settable
                    433: by option
                    434: .BI \-F fs .
                    435: .TP
                    436: .BR NF
                    437: number of fields in the current record
                    438: .TP
                    439: .B NR
                    440: ordinal number of the current record
                    441: .TP
                    442: .B FNR
                    443: ordinal number of the current record in the current file
                    444: .TP
                    445: .B FILENAME
                    446: the name of the current input file
                    447: .TP
                    448: .B RS
                    449: input record separator (default newline)
                    450: .TP
                    451: .B OFS
                    452: output field separator (default blank)
                    453: .TP
                    454: .B ORS
                    455: output record separator (default newline)
                    456: .TP
                    457: .B OFMT
                    458: output format for numbers (default
                    459: .BR "%.6g" )
                    460: .TP
                    461: .B SUBSEP
                    462: separates multiple subscripts (default 034)
                    463: .TP
                    464: .B ARGC
                    465: argument count, assignable
                    466: .TP
                    467: .B ARGV
                    468: argument array, assignable;
                    469: non-null members are taken as filenames
                    470: .TP
                    471: .B ENVIRON
                    472: array of environment variables; subscripts are names.
                    473: .PD
                    474: .PP
                    475: Functions may be defined (at the position of a pattern-action statement) thus:
                    476: .IP
                    477: .B
                    478: function foo(a, b, c) { ...; return x }
                    479: .PP
                    480: Parameters are passed by value if scalar and by reference if array name;
                    481: functions may be called recursively.
                    482: Parameters are local to the function; all other variables are global.
                    483: Thus local variables may be created by providing excess parameters in
                    484: the function definition.
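.PP
The parameter rules above might be exercised as in this sketch; the function names
.B fact
and
.B fill
are hypothetical:
.EX
```shell
out=$(awk '
function fact(n,    i, r) {   # i and r are excess parameters used as locals
    r = 1
    for (i = 2; i <= n; i++) r *= i
    return r }
function fill(arr) { arr["k"] = "set" }   # an array is passed by reference
BEGIN { i = 99; fill(a); print fact(5), i, a["k"] }')
echo "$out"    # prints: 120 99 set
```
.EE
The global
.B i
keeps its value because the loop counter inside
.B fact
is a separate local variable.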
                    485: .SH EXAMPLES
                    486: .TP
1.3       millert   487: .EX
1.1       tholo     488: length($0) > 72
1.3       millert   489: .EE
1.1       tholo     490: Print lines longer than 72 characters.
                    491: .TP
1.3       millert   492: .EX
1.1       tholo     493: { print $2, $1 }
1.3       millert   494: .EE
1.1       tholo     495: Print first two fields in opposite order.
                    496: .PP
                    497: .EX
                    498: BEGIN { FS = ",[ \et]*|[ \et]+" }
                    499:       { print $2, $1 }
                    500: .EE
                    501: .ns
                    502: .IP
                    503: Same, with input fields separated by comma and/or blanks and tabs.
                    504: .PP
                    505: .EX
                    506: .nf
                    507:        { s += $1 }
                    508: END    { print "sum is", s, " average is", s/NR }
                    509: .fi
                    510: .EE
                    511: .ns
                    512: .IP
                    513: Add up first column, print sum and average.
                    514: .TP
1.3       millert   515: .EX
1.1       tholo     516: /start/, /stop/
1.3       millert   517: .EE
1.1       tholo     518: Print all lines between start/stop pairs.
                    519: .PP
                    520: .EX
                    521: .nf
                    522: BEGIN  {       # Simulate echo(1)
                    523:        for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
                    524:        printf "\en"
                    525:        exit }
                    526: .fi
                    527: .EE
                    528: .SH SEE ALSO
                    529: .IR lex (1),
                    530: .IR sed (1)
                    531: .br
                    532: A. V. Aho, B. W. Kernighan, P. J. Weinberger,
                    533: .I
                    534: The AWK Programming Language,
                    535: Addison-Wesley, 1988.  ISBN 0-201-07981-X
                    536: .SH BUGS
                    537: There are no explicit conversions between numbers and strings.
                    538: To force an expression to be treated as a number add 0 to it;
                    539: to force it to be treated as a string concatenate
                    540: \&\f(CW""\fP to it.
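.PP
A short sketch of the two idioms, using comparisons to show which interpretation is in effect; the values 10 and 9 are arbitrary:
.EX
```shell
out=$(echo '10 9' | awk '{
    print ($1 < $2)          # numeric-looking fields compare as numbers
    print ($1 "" < $2 "")    # concatenating "" forces a string comparison
    x = "10"; y = "9"
    print (x < y)            # string constants compare as strings
    print (x + 0 < y + 0) }')
echo "$out"
```
.EE
Each line of output is 1 when the comparison holds and 0 otherwise.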
                    541: .br
                    542: The scope rules for variables in functions are a botch;
                    543: the syntax is worse.