Annotation of src/usr.bin/awk/awk.1, Revision 1.6

.\"	$OpenBSD: awk.1,v 1.5 1998/03/03 01:56:00 angelos Exp $
.de EX
.nf
.ft CW
..
.de EE
.br
.fi
.ft 1
..
.TH AWK 1
.CT 1 files prog_other
.SH NAME
awk \- pattern-directed scanning and processing language
.SH SYNOPSIS
.B awk|nawk
[
.BI \-F
.I fs
]
[
.BI \-v
.I var=value
]
[
.BI \-safe
]
[
.BI \-mr n
]
[
.BI \-mf n
]
[
.I 'prog'
|
.BI \-f
.I progfile
]
[
.I file ...
]
.SH DESCRIPTION
.I Awk
scans each input
.I file
for lines that match any of a set of patterns specified literally in
.IR prog
or in one or more files
specified as
.B \-f
.IR progfile .
With each pattern
there can be an associated action that will be performed
when a line of a
.I file
matches the pattern.
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
The file name
.B \-
means the standard input.
Any
.IR file
of the form
.I var=value
is treated as an assignment, not a filename,
and is executed at the time it would have been opened if it were a filename.
The option
.B \-v
followed by
.I var=value
is an assignment to be done before
.I prog
is executed;
any number of
.B \-v
options may be present.
The
.B \-F
.IR fs
option defines the input field separator to be the regular expression
.IR fs .
The
.B \-safe
option disables file output (print >, print >>), process creation
(cmd|getline, print |, system), and access to the environment (ENVIRON). This
is a first (and not very reliable) approximation to a "safe" version of awk.
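.PP
As an illustration, assuming a POSIX shell and an
.I awk
on the search path (the variable names are arbitrary),
the commands below use a
.B \-v
assignment, a
.B \-F
separator, and a
.I var=value
operand placed before
.B \-
(the standard input).
They should print
.BR hello ,
.BR b ,
and
.BR "7 x" .
.RS
.EX
# \-v assignment, \-F field separator, operand assignment before stdin
awk \-v greeting=hello 'BEGIN { print greeting }'
echo 'a:b:c' | awk \-F: '{ print $2 }'
echo x | awk '{ print n, $0 }' n=7 \-
.EE
.RE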
1.1       tholo      90: .PP
                     91: An input line is normally made up of fields separated by white space,
                     92: or by regular expression
                     93: .BR FS .
                     94: The fields are denoted
                     95: .BR $1 ,
                     96: .BR $2 ,
                     97: \&..., while
                     98: .B $0
                     99: refers to the entire line.
                    100: If
                    101: .BR FS
                    102: is null, the input line is split into one field per character.
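.PP
Here is a sketch of the default splitting, assuming a POSIX shell with
.I awk
on the search path.
Leading and trailing blanks are ignored and runs of blanks separate fields,
while
.B $0
keeps the line as read.
The command therefore prints
.BR alpha ,
.BR beta ,
and then the untouched line, each followed by a bar.
.RS
.EX
# fields $1 and $2, then the whole record $0
echo '  alpha   beta  ' | awk '{ print $1 "|" $2 "|" $0 "|" }'
.EE
.RE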
.PP
To compensate for inadequate implementation of storage management,
the
.B \-mr
option can be used to set the maximum size of the input record,
and the
.B \-mf
option to set the maximum number of fields.
.PP
A pattern-action statement has the form
.IP
.IB pattern " { " action " }"
.PP
A missing
.BI { " action " }
means print the line;
a missing pattern always matches.
Pattern-action statements are separated by newlines or semicolons.
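.PP
For example, assuming a POSIX shell and an
.I awk
on the search path:
the first command has no action, so it prints the matching line
.BR "two words" .
The second has no pattern, so its action runs for every line and prints
.BR seen .
The third holds two statements separated by a semicolon and prints
.B "has a"
and then
.BR "has b" .
.RS
.EX
# pattern without action; action without pattern; two statements
echo 'two words' | awk '/wo/'
echo anything | awk '{ print "seen" }'
echo ab | awk '/a/ { print "has a" }; /b/ { print "has b" }'
.EE
.RE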
.PP
An action is a sequence of statements.
A statement can be one of the following:
.PP
.EX
.ta \w'\f(CWdelete array[expression]'u
.RS
.nf
.ft CW
if(\fI expression \fP)\fI statement \fP\fR[ \fPelse\fI statement \fP\fR]\fP
while(\fI expression \fP)\fI statement\fP
for(\fI expression \fP;\fI expression \fP;\fI expression \fP)\fI statement\fP
for(\fI var \fPin\fI array \fP)\fI statement\fP
do\fI statement \fPwhile(\fI expression \fP)
break
continue
{\fR [\fP\fI statement ... \fP\fR] \fP}
\fIexpression\fP	#\fR commonly\fP\fI var = expression\fP
print\fR [ \fP\fIexpression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
printf\fI format \fP\fR[ \fP,\fI expression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
return\fR [ \fP\fIexpression \fP\fR]\fP
next	#\fR skip remaining patterns on this input line\fP
nextfile	#\fR skip rest of this file, open next, start at top\fP
delete\fI array\fP[\fI expression \fP]	#\fR delete an array element\fP
delete\fI array\fP	#\fR delete all elements of array\fP
exit\fR [ \fP\fIexpression \fP\fR]\fP	#\fR exit immediately; status is \fP\fIexpression\fP
.fi
.RE
.EE
.DT
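.PP
Here is a sketch of several of these statements, assuming a POSIX shell with
.I awk
on the search path.
The first command should print
.BR 13 ,
because
.B continue
skips even values and
.B break
leaves the loop at 4.
The second deletes one element and then lists the only remaining subscript,
.BR y .
.RS
.EX
# for, if, break and continue
awk 'BEGIN { for (i = 1; i <= 5; i++) { if (i == 4) break; if (i % 2 == 0) continue; s = s i }; print s }'
# delete and for (var in array)
awk 'BEGIN { a["x"] = 1; a["y"] = 2; delete a["x"]; for (k in a) print k }'
.EE
.RE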
.PP
Statements are terminated by
semicolons, newlines or right braces.
An empty
.I expression-list
stands for
.BR $0 .
String constants are quoted \&\f(CW"\ "\fR,
with the usual C escapes recognized within.
Expressions take on string or numeric values as appropriate,
and are built using the operators
.B + \- * / % ^
(exponentiation), and concatenation (indicated by white space).
The operators
.B
! ++ \-\- += \-= *= /= %= ^= > >= < <= == != ?:
are also available in expressions.
Variables may be scalars, array elements
(denoted
.IB x  [ i ] )
or fields.
Variables are initialized to the null string, whose numeric value is 0.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
Multiple subscripts such as
.B [i,j,k]
are permitted; the constituents are concatenated,
separated by the value of
.BR SUBSEP .
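.PP
Assuming a POSIX shell and an
.I awk
on the search path, the first command below should print
.BR "7 34 0 []" .
The string "3" is converted to a number for the addition,
juxtaposition concatenates,
and an unset variable acts as 0 or as the empty string.
The second command stores an element under the subscript pair (1, 2)
and recovers the parts by splitting the combined key on
.BR SUBSEP .
It prints
.BR "1 2 v" .
.RS
.EX
# numeric conversion, concatenation, uninitialized variable
awk 'BEGIN { x = "3" + 4; y = 3 4; print x, y, u + 0, "[" u "]" }'
# multiple subscripts joined by SUBSEP
awk 'BEGIN { a[1, 2] = "v"; for (k in a) { split(k, p, SUBSEP); print p[1], p[2], a[k] } }'
.EE
.RE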
.PP
The
.B print
statement prints its arguments on the standard output
(or on a file if
.BI > file
or
.BI >> file
is present or on a pipe if
.BI | cmd
is present), separated by the current output field separator,
and terminated by the output record separator.
.I file
and
.I cmd
may be literal names or parenthesized expressions;
identical string values in different statements denote
the same open file.
The
.B printf
statement formats its expression list according to the format
(see
.IR printf (3)) .
The built-in function
.BI close( expr )
closes the file or pipe
.IR expr .
The built-in function
.BI fflush( expr )
flushes any buffered output for the file or pipe
.IR expr .
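.PP
Here is a short illustration, assuming a POSIX shell with
.I awk
and
.I sort
on the search path.
The first command prints
.BR n=42 .
The second feeds two lines to one
.I sort
pipe (both statements name the same command string, so they share it).
Because
.B close
waits for the pipe to finish,
.B a
and
.B b
should appear before
.BR done .
.RS
.EX
# printf with a format; two prints sharing one pipe, then close
awk 'BEGIN { printf "%s=%d", "n", 42; print "" }'
awk 'BEGIN { print "b" | "sort"; print "a" | "sort"; close("sort"); print "done" }'
.EE
.RE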
.PP
The mathematical functions
.BR exp ,
.BR log ,
.BR sqrt ,
.BR sin ,
.BR cos ,
and
.BR atan2
are built in.
Other built-in functions:
.TF length
.TP
.B length
the length of its argument
taken as a string,
or of
.B $0
if no argument.
.TP
.B rand
random number on (0,1)
.TP
.B srand
sets seed for
.B rand
and returns the previous seed.
.TP
.B int
truncates to an integer value
.TP
.BI substr( s , " m" , " n\fB)
the
.IR n -character
substring of
.I s
that begins at position
.IR m
counted from 1.
.TP
.BI index( s , " t" )
the position in
.I s
where the string
.I t
occurs, or 0 if it does not.
.TP
.BI match( s , " r" )
the position in
.I s
where the regular expression
.I r
occurs, or 0 if it does not.
The variables
.B RSTART
and
.B RLENGTH
are set to the position and length of the matched string.
.TP
.BI split( s , " a" , " fs\fB)
splits the string
.I s
into array elements
.IB a [1] ,
.IB a [2] ,
\&...,
.IB a [ n ] ,
and returns
.IR n .
The separation is done with the regular expression
.I fs
or with the field separator
.B FS
if
.I fs
is not given.
An empty string as field separator splits the string
into one array element per character.
.TP
.BI sub( r , " t" , " s\fB)
substitutes
.I t
for the first occurrence of the regular expression
.I r
in the string
.IR s .
If
.I s
is not given,
.B $0
is used.
.TP
.B gsub
same as
.B sub
except that all occurrences of the regular expression
are replaced;
.B sub
and
.B gsub
return the number of replacements.
.TP
.BI sprintf( fmt , " expr" , " ...\fB )
the string resulting from formatting
.I expr ...
according to the
.IR printf (3)
format
.I fmt
.TP
.BI system( cmd )
executes
.I cmd
and returns its exit status
.TP
.BI tolower( str )
returns a copy of
.I str
with all upper-case characters translated to their
corresponding lower-case equivalents.
.TP
.BI toupper( str )
returns a copy of
.I str
with all lower-case characters translated to their
corresponding upper-case equivalents.
.PD
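.PP
Assuming a POSIX shell and an
.I awk
on the search path, the commands below exercise several of these functions.
They should print
.BR "11 hello 7" ,
.BR "3 a c" ,
.BR "3 bonono BONONO" ,
and
.BR "3 2"
respectively.
.RS
.EX
# length, substr, index
awk 'BEGIN { s = "hello world"; print length(s), substr(s, 1, 5), index(s, "world") }'
# split with an explicit separator
awk 'BEGIN { n = split("a:b:c", parts, ":"); print n, parts[1], parts[3] }'
# gsub returns the number of replacements; toupper
awk 'BEGIN { t = "banana"; c = gsub(/a/, "o", t); print c, t, toupper(t) }'
# match sets RSTART and RLENGTH
awk 'BEGIN { if (match("foobar", /ob/)) print RSTART, RLENGTH }'
.EE
.RE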
.PP
The ``function''
.B getline
sets
.B $0
to the next input record from the current input file;
.B getline
.BI < file
sets
.B $0
to the next record from
.IR file .
.B getline
.I x
sets variable
.I x
instead.
Finally,
.IB cmd " | getline
pipes the output of
.I cmd
into
.BR getline ;
each call of
.B getline
returns the next line of output from
.IR cmd .
In all cases,
.B getline
returns 1 for a successful input,
0 for end of file, and \-1 for an error.
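.PP
For example, assuming a POSIX shell and an
.I awk
on the search path (the path
.I /nonexistent/file
is a placeholder for a file that does not exist):
the first command counts the lines read from a command with
.B getline
and prints
.BR "2 two" ,
since
.I line
keeps its last value at end of file.
The second prints
.B \-1
because the file cannot be opened.
.RS
.EX
# read a command's output line by line until getline returns 0
awk 'BEGIN { while (("echo one; echo two" | getline line) > 0) n++; print n, line }'
# /nonexistent/file is a placeholder path that should not exist
awk 'BEGIN { r = (getline line < "/nonexistent/file"); print r }'
.EE
.RE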
.PP
Patterns are arbitrary Boolean combinations
(with
.BR "! || &&" )
of regular expressions and
relational expressions.
Regular expressions are as in
.IR egrep ;
see
.IR grep (1).
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions, using the operators
.BR ~
and
.BR !~ .
.BI / re /
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
in a pattern.
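.PP
Assuming a POSIX shell and an
.I awk
on the search path, the first command below uses a constant regular expression
inside a relational pattern and prints
.BR "ends in digits" .
The second matches against a regular expression held in a string variable
(here called
.IR re ,
an arbitrary name) and prints
.B "dynamic match"
followed by
.BR "no match" .
.RS
.EX
# constant regular expression with ~
echo item42 | awk '$0 ~ /[0-9]+$/ { print "ends in digits" }'
# string used as a regular expression with ~ and !~
awk 'BEGIN { re = "^ab"; if ("abc" ~ re) print "dynamic match"; if ("xyz" !~ re) print "no match" }'
.EE
.RE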
.PP
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from an occurrence of the first pattern
through an occurrence of the second.
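.PP
For instance, assuming a POSIX shell and an
.I awk
on the search path, the second
.I awk
below selects lines from the one equal to 2 through the one equal to 4.
It therefore prints 2, 3 and 4 on separate lines.
.RS
.EX
# generate 1..6, then keep the range 2 through 4
awk 'BEGIN { for (i = 1; i <= 6; i++) print i }' | awk '$1 == 2, $1 == 4'
.EE
.RE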
.PP
A relational expression is one of the following:
.IP
.I expression matchop regular-expression
.br
.I expression relop expression
.br
.IB expression " in " array-name
.br
.BI ( expr , expr,... ") in " array-name
.PP
where a relop is any of the six relational operators in C,
and a matchop is either
.B ~
(matches)
or
.B !~
(does not match).
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
of these.
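.PP
Here is a sketch of the
.B in
forms, assuming a POSIX shell and an
.I awk
on the search path.
A relational expression yields 1 when true and 0 when false,
so the command should print
.BR "1 0 1" .
.RS
.EX
# membership tests; a test with in does not create the element
awk 'BEGIN { a["x"] = 1; m[1, 2] = 1; r1 = ("x" in a); r2 = ("y" in a); r3 = ((1, 2) in m); print r1, r2, r3 }'
.EE
.RE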
.PP
The special patterns
.B BEGIN
and
.B END
may be used to capture control before the first input line is read
and after the last.
.B BEGIN
and
.B END
do not combine with other patterns.
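.PP
For example, assuming a POSIX shell and an
.I awk
on the search path, the second
.I awk
below prints
.B start
from its
.B BEGIN
action before reading any input.
It then prints
.B "lines: 3"
from its
.B END
action, where
.B NR
still holds the number of records read.
.RS
.EX
# BEGIN runs before input, END after the last record
awk 'BEGIN { for (i = 1; i <= 3; i++) print i }' | awk 'BEGIN { print "start" } END { print "lines:", NR }'
.EE
.RE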
.PP
Variable names with special meanings:
.TF FILENAME
.TP
.B CONVFMT
conversion format used when converting numbers to strings
(default
.BR "%.6g" )
.TP
.B FS
regular expression used to separate fields; also settable
by option
.B \-F
.IR fs .
.TP
.BR NF
number of fields in the current record
.TP
.B NR
ordinal number of the current record
.TP
.B FNR
ordinal number of the current record in the current file
.TP
.B FILENAME
the name of the current input file
.TP
.B RS
input record separator (default newline)
.TP
.B OFS
output field separator (default blank)
.TP
.B ORS
output record separator (default newline)
.TP
.B OFMT
output format for numbers (default
.BR "%.6g" )
.TP
.B SUBSEP
separates multiple subscripts (default \e034)
.TP
.B ARGC
argument count, assignable
.TP
.B ARGV
argument array, assignable;
non-null members are taken as filenames
.TP
.B ENVIRON
array of environment variables; subscripts are names.
.PD
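.PP
Assuming a POSIX shell and an
.I awk
on the search path, the commands below should print
.BR "1 3 c" ,
.BR x:y ,
.B 3.14159
(the number converted to a string with the default
.BR CONVFMT ),
and
.BR /tmp .
The last command sets
.B HOME
only in the environment of that one command.
.RS
.EX
# NR, NF and the last field $NF
echo 'a b c' | awk '{ print NR, NF, $NF }'
# OFS between print arguments
awk 'BEGIN { OFS = ":"; print "x", "y" }'
# number-to-string conversion uses CONVFMT
awk 'BEGIN { x = 3.14159265; s = x ""; print s }'
# ENVIRON lookup by name
HOME=/tmp awk 'BEGIN { print ENVIRON["HOME"] }'
.EE
.RE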
.PP
Functions may be defined (at the position of a pattern-action statement) thus:
.IP
.B
function foo(a, b, c) { ...; return x }
.PP
Parameters are passed by value if scalar and by reference if array name;
functions may be called recursively.
Parameters are local to the function; all other variables are global.
Thus local variables may be created by providing excess parameters in
the function definition.
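.PP
Here is a sketch, assuming a POSIX shell and an
.I awk
on the search path; the function names
.I rev
and
.I fill
are hypothetical.
.I rev
calls itself to reverse a string and prints
.BR cba .
.I fill
receives the array
.I sq
by reference, while its extra parameter
.I i
is local.
The global
.I i
is therefore left untouched, and the command prints
.BR "9 global" .
.RS
.EX
# rev and fill are hypothetical example functions
awk 'function rev(s) { return length(s) <= 1 ? s : rev(substr(s, 2)) substr(s, 1, 1) } BEGIN { print rev("abc") }'
awk 'function fill(arr, n,   i) { for (i = 1; i <= n; i++) arr[i] = i * i } BEGIN { split("", sq); i = "global"; fill(sq, 3); print sq[3], i }'
.EE
.RE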
.SH EXAMPLES
.TP
.EX
length($0) > 72
.EE
Print lines longer than 72 characters.
.TP
.EX
{ print $2, $1 }
.EE
Print first two fields in opposite order.
.PP
.EX
BEGIN { FS = ",[ \et]*|[ \et]+" }
      { print $2, $1 }
.EE
.ns
.IP
Same, with input fields separated by comma and/or blanks and tabs.
.PP
.EX
.nf
	{ s += $1 }
END	{ print "sum is", s, " average is", s/NR }
.fi
.EE
.ns
.IP
Add up first column, print sum and average.
.TP
.EX
/start/, /stop/
.EE
Print all lines between start/stop pairs.
.PP
.EX
.nf
BEGIN	{	# Simulate echo(1)
	for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
	printf "\en"
	exit }
.fi
.EE
.SH SEE ALSO
.IR lex (1),
.IR sed (1)
.br
A. V. Aho, B. W. Kernighan, P. J. Weinberger,
.I
The AWK Programming Language,
Addison-Wesley, 1988.  ISBN 0-201-07981-X
.SH BUGS
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number add 0 to it;
to force it to be treated as a string concatenate
\&\f(CW""\fP to it.
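.PP
For example, assuming a POSIX shell and an
.I awk
on the search path, the command below compares two string constants.
It compares them first as strings (so "10" sorts before "9")
and then as numbers after adding 0, printing
.BR "1 0" .
.RS
.EX
# string comparison, then numeric comparison forced by adding 0
awk 'BEGIN { a = "10"; b = "9"; r1 = (a < b); r2 = (a + 0 < b + 0); print r1, r2 }'
.EE
.RE
.PP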
.br
The scope rules for variables in functions are a botch;
the syntax is worse.