Annotation of src/usr.bin/awk/awk.1, Revision 1.8
1.8 ! aaron 1: .\" $OpenBSD: awk.1,v 1.7 2000/08/30 13:37:51 aaron Exp $
1.7 aaron 2: .\" EX/EE is a Bd
3: .Dd June 29, 1996
4: .Dt AWK 1
5: .Os
6: .Sh NAME
7: .Nm awk
8: .Nd pattern-directed scanning and processing language
9: .Sh SYNOPSIS
10: .Nm awk
11: .Op Fl F Ar fs
12: .Op Fl v Ar var=value
13: .Op Fl safe
14: .Op Fl mr Ar n
15: .Op Fl mf Ar n
16: .Op Ar prog | Fl f Ar progfile
17: .Ar
18: .Nm nawk
19: .Ar ...
20: .Sh DESCRIPTION
21: .Nm
1.1 tholo 22: scans each input
1.7 aaron 23: .Ar file
1.1 tholo 24: for lines that match any of a set of patterns specified literally in
1.7 aaron 25: .Ar prog
1.1 tholo 26: or in one or more files
27: specified as
1.7 aaron 28: .Fl f Ar progfile .
1.1 tholo 29: With each pattern
30: there can be an associated action that will be performed
31: when a line of a
1.7 aaron 32: .Ar file
1.1 tholo 33: matches the pattern.
34: Each line is matched against the
35: pattern portion of every pattern-action statement;
36: the associated action is performed for each matched pattern.
1.6 aaron 37: The file name
1.7 aaron 38: .Sq Pa \-
1.1 tholo 39: means the standard input.
40: Any
1.7 aaron 41: .Ar file
1.1 tholo 42: of the form
1.7 aaron 43: .Ar var=value
1.1 tholo 44: is treated as an assignment, not a filename,
45: and is executed at the time it would have been opened if it were a filename.
46: The option
1.7 aaron 47: .Fl v
1.1 tholo 48: followed by
1.7 aaron 49: .Ar var=value
1.1 tholo 50: is an assignment to be done before
1.7 aaron 51: .Ar prog
1.1 tholo 52: is executed;
53: any number of
1.7 aaron 54: .Fl v
1.1 tholo 55: options may be present.
56: The
1.7 aaron 57: .Fl F Ar fs
1.1 tholo 58: option defines the input field separator to be the regular expression
1.7 aaron 59: .Ar fs .
1.5 angelos 60: The
1.7 aaron 61: .Fl safe
62: option disables file output
63: .Po
64: .Ic print Ic > ,
65: .Ic print Ic >>
66: .Pc ,
67: process creation
68: .Po
69: .Ar cmd Ic \&| getline ,
70: .Ic print \&| , system
71: .Pc
72: and access to the environment
73: .Pq Va ENVIRON .
74: This
75: is a first (and not very reliable) approximation to a
76: .Dq safe
77: version of
78: .Nm awk .
79: .Pp
80: An input line is normally made up of fields separated by whitespace,
1.1 tholo 81: or by regular expression
1.7 aaron 82: .Va FS .
1.1 tholo 83: The fields are denoted
1.7 aaron 84: .Va $1 , $2 , ... ,
85: while
86: .Va $0
1.1 tholo 87: refers to the entire line.
88: If
1.7 aaron 89: .Va FS
1.1 tholo 90: is null, the input line is split into one field per character.
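.Pp
A minimal sketch of field splitting, assuming a made-up colon-separated
record (the sample data and the shell variable names are illustrative only):

```shell
# Hypothetical sample record; the colon separator is an arbitrary choice.
line='alice:1001:/home/alice'

# -F sets FS; $1 and $3 are fields, $0 is the whole record, NF the field count.
fields=$(printf '%s\n' "$line" | awk -F: '{ print $3, $1 }')
whole=$(printf '%s\n' "$line" | awk -F: '{ print NF ": " $0 }')

echo "$fields"
echo "$whole"
```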
1.7 aaron 91: .Pp
1.1 tholo 92: To compensate for inadequate implementation of storage management,
1.6 aaron 93: the
1.7 aaron 94: .Fl mr
1.1 tholo 95: option can be used to set the maximum size of the input record,
96: and the
1.7 aaron 97: .Fl mf
1.1 tholo 98: option to set the maximum number of fields.
1.7 aaron 99: .Pp
1.1 tholo 100: A pattern-action statement has the form
1.7 aaron 101: .Pp
102: .D1 Ar pattern Ic \&{ Ar action Ic \&}
103: .Pp
1.6 aaron 104: A missing
1.7 aaron 105: .Ic \&{ Ar action Ic \&}
1.1 tholo 106: means print the line;
107: a missing pattern always matches.
108: Pattern-action statements are separated by newlines or semicolons.
1.7 aaron 109: .Pp
1.1 tholo 110: An action is a sequence of statements.
111: A statement can be one of the following:
1.7 aaron 112: .Pp
113: .Bd -unfilled -offset indent
114: .Ic if ( Xo
115: .Ar expression ) statement \&
116: .Op Ic else Ar statement
117: .Xc
118: .Ic while ( Ar expression ) statement
119: .Ic for ( Xo
120: .Ar expression ; expression ; expression ) statement
121: .Xc
122: .Ic for ( Xo
123: .Ar var Ic in Ar array ) statement
124: .Xc
125: .Ic do Ar statement Ic while ( Ar expression )
126: .Ic break
127: .Ic continue
128: .Ic { Oo Ar statement ... Oc Ic \& }
129: .Ar expression Xo
130: .No "# commonly" \&
131: .Ar var Ic = Ar expression
132: .Xc
133: .Ic print Xo
134: .Op Ar expression-list
135: .Op Ic > Ns Ar expression
136: .Xc
137: .Ic printf Ar format Xo
138: .Op Ar ... , expression-list
139: .Op Ic > Ns Ar expression
140: .Xc
141: .Ic return Op Ar expression
142: .Ic next Xo
143: .No "# skip remaining patterns on this input line"
144: .Xc
145: .Ic nextfile Xo
146: .No "# skip rest of this file, open next, start at top"
147: .Xc
148: .Ic delete Ar array Ns Xo
149: .Ic \&[ Ns Ar expression Ns Ic \&]
150: .No \& "# delete an array element"
151: .Xc
152: .Ic delete Ar array Xo
153: .No "# delete all elements of array"
154: .Xc
155: .Ic exit Xo
156: .Op Ar expression
157: .No \& "# exit immediately; status is" Ar expression
158: .Xc
159: .Ed
160: .Pp
1.1 tholo 161: Statements are terminated by
162: semicolons, newlines or right braces.
163: An empty
1.7 aaron 164: .Ar expression-list
1.1 tholo 165: stands for
1.7 aaron 166: .Va $0 .
167: String constants are quoted
168: .Li \&"" ,
1.1 tholo 169: with the usual C escapes recognized within.
170: Expressions take on string or numeric values as appropriate,
171: and are built using the operators
1.7 aaron 172: .Ic + \- * / % ^
173: (exponentiation), and concatenation (indicated by whitespace).
1.1 tholo 174: The operators
1.7 aaron 175: .Ic ! ++ \-\- += \-= *= /= %= ^= > >= < <= == != ?:
1.1 tholo 176: are also available in expressions.
177: Variables may be scalars, array elements
178: (denoted
1.7 aaron 179: .Li x[i] )
1.1 tholo 180: or fields.
181: Variables are initialized to the null string.
182: Array subscripts may be any string,
183: not necessarily numeric;
184: this allows for a form of associative memory.
185: Multiple subscripts such as
1.7 aaron 186: .Li [i,j,k]
1.1 tholo 187: are permitted; the constituents are concatenated,
188: separated by the value of
1.7 aaron 189: .Va SUBSEP .
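.Pp
As an illustration of multiple subscripts, the sketch below stores one
element under a two-part subscript and then splits the combined key on
SUBSEP to recover the parts (the array and variable names are arbitrary):

```shell
subs=$(awk 'BEGIN {
    cell[1, 2] = "x"                 # stored under the key 1 SUBSEP 2
    for (k in cell) {
        n = split(k, part, SUBSEP)   # recover the constituent subscripts
        print n, part[1], part[2], cell[k]
    }
    if ((1, 2) in cell)              # membership test with a subscript list
        print "present"
}')
echo "$subs"
```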
190: .Pp
1.1 tholo 191: The
1.7 aaron 192: .Ic print
1.1 tholo 193: statement prints its arguments on the standard output
194: (or on a file if
1.7 aaron 195: .Ic > Ns Ar file
1.1 tholo 196: or
1.7 aaron 197: .Ic >> Ns Ar file
1.1 tholo 198: is present or on a pipe if
1.7 aaron 199: .Ic \&| Ar cmd
1.1 tholo 200: is present), separated by the current output field separator,
201: and terminated by the output record separator.
1.7 aaron 202: .Ar file
1.1 tholo 203: and
1.7 aaron 204: .Ar cmd
1.1 tholo 205: may be literal names or parenthesized expressions;
206: identical string values in different statements denote
207: the same open file.
208: The
1.7 aaron 209: .Ic printf
1.1 tholo 210: statement formats its expression list according to the format
211: (see
1.7 aaron 212: .Xr printf 3 ) .
1.1 tholo 213: The built-in function
1.7 aaron 214: .Fn close expr
1.1 tholo 215: closes the file or pipe
1.7 aaron 216: .Fa expr .
1.1 tholo 217: The built-in function
1.7 aaron 218: .Fn fflush expr
1.1 tholo 219: flushes any buffered output for the file or pipe
1.7 aaron 220: .Fa expr .
221: .Pp
1.1 tholo 222: The mathematical functions
1.7 aaron 223: .Fn exp ,
224: .Fn log ,
225: .Fn sqrt ,
226: .Fn sin ,
227: .Fn cos ,
1.1 tholo 228: and
1.7 aaron 229: .Fn atan2
1.1 tholo 230: are built in.
231: Other built-in functions:
1.7 aaron 232: .Pp
233: .Bl -tag -width Fn
234: .It Fn length
1.1 tholo 235: the length of its argument
236: taken as a string,
237: or of
1.7 aaron 238: .Va $0
1.1 tholo 239: if no argument.
1.7 aaron 240: .It Fn rand
1.1 tholo 241: random number n, such that 0 <= n < 1.
1.7 aaron 242: .It Fn srand
1.1 tholo 243: sets seed for
1.7 aaron 244: .Fn rand
1.1 tholo 245: and returns the previous seed.
1.7 aaron 246: .It Fn int
247: truncates to an integer value.
248: .It Fn substr s m n
1.1 tholo 249: the
1.7 aaron 250: .Fa n Ns No -character
1.1 tholo 251: substring of
1.7 aaron 252: .Fa s
1.1 tholo 253: that begins at position
1.7 aaron 254: .Fa m
1.1 tholo 255: counted from 1.
1.7 aaron 256: .It Fn index s t
1.1 tholo 257: the position in
1.7 aaron 258: .Fa s
1.1 tholo 259: where the string
1.7 aaron 260: .Fa t
1.1 tholo 261: occurs, or 0 if it does not.
1.7 aaron 262: .It Fn match s r
1.1 tholo 263: the position in
1.7 aaron 264: .Fa s
1.1 tholo 265: where the regular expression
1.7 aaron 266: .Fa r
1.1 tholo 267: occurs, or 0 if it does not.
268: The variables
1.7 aaron 269: .Va RSTART
1.1 tholo 270: and
1.7 aaron 271: .Va RLENGTH
1.1 tholo 272: are set to the position and length of the matched string.
1.7 aaron 273: .It Fn split s a fs
1.1 tholo 274: splits the string
1.7 aaron 275: .Fa s
1.1 tholo 276: into array elements
1.7 aaron 277: .Va a[1] , a[2] , ... , a[n]
1.1 tholo 278: and returns
1.7 aaron 279: .Va n .
1.1 tholo 280: The separation is done with the regular expression
1.7 aaron 281: .Ar fs
1.1 tholo 282: or with the field separator
1.7 aaron 283: .Va FS
1.1 tholo 284: if
1.7 aaron 285: .Ar fs
1.1 tholo 286: is not given.
287: An empty string as field separator splits the string
288: into one array element per character.
1.7 aaron 289: .It Fn sub r t s
1.1 tholo 290: substitutes
1.7 aaron 291: .Fa t
1.1 tholo 292: for the first occurrence of the regular expression
1.7 aaron 293: .Fa r
1.1 tholo 294: in the string
1.7 aaron 295: .Fa s .
1.1 tholo 296: If
1.7 aaron 297: .Fa s
1.1 tholo 298: is not given,
1.7 aaron 299: .Va $0
1.1 tholo 300: is used.
1.7 aaron 301: .It Fn gsub r t s
1.1 tholo 302: same as
1.7 aaron 303: .Fn sub
1.1 tholo 304: except that all occurrences of the regular expression
305: are replaced;
1.7 aaron 306: .Fn sub
1.1 tholo 307: and
1.7 aaron 308: .Fn gsub
1.1 tholo 309: return the number of replacements.
1.7 aaron 310: .It Fn sprintf fmt expr ...
1.1 tholo 311: the string resulting from formatting
1.7 aaron 312: .Fa expr , ...
1.1 tholo 313: according to the
1.7 aaron 314: .Xr printf 3
1.1 tholo 315: format
1.7 aaron 316: .Fa fmt .
317: .It Fn system cmd
1.1 tholo 318: executes
1.7 aaron 319: .Fa cmd
320: and returns its exit status.
321: .It Fn tolower str
1.1 tholo 322: returns a copy of
1.7 aaron 323: .Fa str
1.1 tholo 324: with all upper-case characters translated to their
325: corresponding lower-case equivalents.
1.7 aaron 326: .It Fn toupper str
1.1 tholo 327: returns a copy of
1.7 aaron 328: .Fa str
1.1 tholo 329: with all lower-case characters translated to their
330: corresponding upper-case equivalents.
1.7 aaron 331: .El
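.Pp
The string functions listed above can be exercised with a short sketch
such as the following; the sample strings are invented for illustration:

```shell
funcs=$(awk 'BEGIN {
    s = "foo bar foo"
    print index(s, "bar")            # position of a literal substring
    m = match(s, /o+/)               # also sets RSTART and RLENGTH
    print m, RSTART, RLENGTH
    print substr(s, 5, 3)            # 3 characters starting at position 5
    n = gsub(/foo/, "baz", s)        # modifies s, returns the count
    print n, s
    n = split("a:b:c", parts, ":")
    print n, parts[2]
    print toupper("awk")
}')
echo "$funcs"
```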
332: .Pp
333: The
334: .Sq function
335: .Ic getline
1.1 tholo 336: sets
1.7 aaron 337: .Va $0
1.1 tholo 338: to the next input record from the current input file;
1.7 aaron 339: .Ic getline < Ar file
1.1 tholo 340: sets
1.7 aaron 341: .Va $0
1.1 tholo 342: to the next record from
1.7 aaron 343: .Ar file .
344: .Ic getline Va x
1.1 tholo 345: sets variable
1.7 aaron 346: .Va x
1.1 tholo 347: instead.
348: Finally,
1.7 aaron 349: .Ar cmd Ic \&| getline
1.1 tholo 350: pipes the output of
1.7 aaron 351: .Ar cmd
1.1 tholo 352: into
1.7 aaron 353: .Ic getline ;
1.1 tholo 354: each call of
1.7 aaron 355: .Ic getline
1.1 tholo 356: returns the next line of output from
1.7 aaron 357: .Ar cmd .
1.1 tholo 358: In all cases,
1.7 aaron 359: .Ic getline
1.1 tholo 360: returns 1 for a successful input,
361: 0 for end of file, and \-1 for an error.
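.Pp
A sketch of reading from a command with getline, assuming a trivial
command built from echo.
Comparing the return value with 0 ends the loop on end of file and also
on error (\-1), which a bare truth test would not do:

```shell
lines=$(awk 'BEGIN {
    cmd = "echo one; echo two"       # hypothetical command, for illustration
    while ((cmd | getline line) > 0)
        n++
    close(cmd)
    print n, line                    # line still holds the last record read
}')
echo "$lines"
```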
1.7 aaron 362: .Pp
1.1 tholo 363: Patterns are arbitrary Boolean combinations
364: (with
1.7 aaron 365: .Ic "! || &&" )
1.1 tholo 366: of regular expressions and
367: relational expressions.
368: Regular expressions are as in
1.7 aaron 369: .Xr egrep 1 .
1.1 tholo 370: Isolated regular expressions
371: in a pattern apply to the entire line.
372: Regular expressions may also occur in
373: relational expressions, using the operators
1.7 aaron 374: .Ic ~
1.1 tholo 375: and
1.7 aaron 376: .Ic !~ .
377: .Ic / Ns Ar re Ns Ic /
1.1 tholo 378: is a constant regular expression;
379: any string (constant or variable) may be used
380: as a regular expression, except in the position of an isolated regular expression
381: in a pattern.
1.7 aaron 382: .Pp
1.1 tholo 383: A pattern may consist of two patterns separated by a comma;
384: in this case, the action is performed for all lines
385: from an occurrence of the first pattern
386: through an occurrence of the second.
1.7 aaron 387: .Pp
1.1 tholo 388: A relational expression is one of the following:
1.7 aaron 389: .Bd -unfilled -offset indent
390: .Ar expression matchop regular-expression
391: .Ar expression relop expression
392: .Ar expression Ic in Ar array-name
393: .Ic \&( Ns Xo
394: .Ar expr , expr , \&... Ns Ic \&) in
395: .Ar \& array-name
396: .Xc
397: .Ed
398: where a
399: .Ar relop
400: is any of the six relational operators in C, and a
401: .Ar matchop
402: is either
403: .Ic ~
1.1 tholo 404: (matches)
405: or
1.7 aaron 406: .Ic !~
1.1 tholo 407: (does not match).
408: A conditional is an arithmetic expression,
409: a relational expression,
410: or a Boolean combination
411: of these.
1.7 aaron 412: .Pp
1.1 tholo 413: The special patterns
1.7 aaron 414: .Ic BEGIN
1.1 tholo 415: and
1.7 aaron 416: .Ic END
1.1 tholo 417: may be used to capture control before the first input line is read
418: and after the last.
1.7 aaron 419: .Ic BEGIN
1.1 tholo 420: and
1.7 aaron 421: .Ic END
1.1 tholo 422: do not combine with other patterns.
1.7 aaron 423: .Pp
1.1 tholo 424: Variable names with special meanings:
1.7 aaron 425: .Pp
426: .Bl -tag -width Va -compact
427: .It Va CONVFMT
1.1 tholo 428: conversion format used when converting numbers
1.3 millert 429: (default
1.7 aaron 430: .Qq Li %.6g )
431: .It Va FS
1.1 tholo 432: regular expression used to separate fields; also settable
433: by option
1.7 aaron 434: .Fl F Ar fs .
435: .It Va NF
1.1 tholo 436: number of fields in the current record
1.7 aaron 437: .It Va NR
1.1 tholo 438: ordinal number of the current record
1.7 aaron 439: .It Va FNR
1.1 tholo 440: ordinal number of the current record in the current file
1.7 aaron 441: .It Va FILENAME
1.1 tholo 442: the name of the current input file
1.7 aaron 443: .It Va RS
1.1 tholo 444: input record separator (default newline)
1.7 aaron 445: .It Va OFS
1.1 tholo 446: output field separator (default blank)
1.7 aaron 447: .It Va ORS
1.1 tholo 448: output record separator (default newline)
1.7 aaron 449: .It Va OFMT
1.1 tholo 450: output format for numbers (default
1.7 aaron 451: .Qq Li %.6g )
452: .It Va SUBSEP
1.1 tholo 453: separates multiple subscripts (default 034)
1.7 aaron 454: .It Va ARGC
1.1 tholo 455: argument count, assignable
1.7 aaron 456: .It Va ARGV
1.1 tholo 457: argument array, assignable;
458: non-null members are taken as filenames
1.7 aaron 459: .It Va ENVIRON
1.1 tholo 460: array of environment variables; subscripts are names.
1.7 aaron 461: .El
462: .Pp
463: Functions may be defined (at the position of a pattern-action statement)
464: as follows:
465: .Pp
466: .Dl function foo(a, b, c) { ...; return x }
467: .Pp
1.1 tholo 468: Parameters are passed by value if scalar and by reference if array name;
469: functions may be called recursively.
470: Parameters are local to the function; all other variables are global.
471: Thus local variables may be created by providing excess parameters in
472: the function definition.
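.Pp
The following sketch illustrates both points; the function names are
hypothetical, and the extra spaces in the parameter lists are only a
common convention for marking the excess parameters used as locals:

```shell
scope=$(awk '
function sum_to(n,    i, total) {    # i and total are extra parameters, hence local
    for (i = 1; i <= n; i++)
        total += i
    return total
}
function fill(arr, n,    i) {        # arr is an array, so it is passed by reference
    for (i = 1; i <= n; i++)
        arr[i] = i * i
}
BEGIN {
    i = "untouched"                  # global i is not disturbed by the calls
    fill(sq, 3)
    total = sum_to(4)
    print total, i, sq[3]
}')
echo "$scope"
```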
1.7 aaron 473: .Sh EXAMPLES
474: .Dl length($0) > 72
1.1 tholo 475: Print lines longer than 72 characters.
1.7 aaron 476: .Pp
477: .Dl { print $2, $1 }
1.1 tholo 478: Print first two fields in opposite order.
1.7 aaron 479: .Pp
480: .Bd -literal -offset indent
1.1 tholo 481: BEGIN { FS = ",[ \et]*|[ \et]+" }
482: { print $2, $1 }
1.7 aaron 483: .Ed
1.1 tholo 484: Same, with input fields separated by comma and/or blanks and tabs.
1.7 aaron 485: .Pp
486: .Bd -literal -offset indent
487: { s += $1 }
488: END { print "sum is", s, " average is", s/NR }
489: .Ed
1.1 tholo 490: Add up first column, print sum and average.
1.7 aaron 491: .Pp
492: .Dl /start/, /stop/
1.1 tholo 493: Print all lines between start/stop pairs.
1.7 aaron 494: .Pp
495: .Bd -literal -offset indent
496: BEGIN { # Simulate echo(1)
497: for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
498: printf "\en"
499: exit }
500: .Ed
501: .Sh SEE ALSO
502: .Xr lex 1 ,
503: .Xr sed 1
504: .Rs
505: .%A A. V. Aho
506: .%A B. W. Kernighan
507: .%A P. J. Weinberger
508: .%T The AWK Programming Language
509: .%I Addison-Wesley
510: .%D 1988
511: .%O ISBN 0-201-07981-X
512: .Re
1.8 ! aaron 513: .Sh HISTORY
! 514: AT&T
! 515: .Nm
! 516: by B. W. Kernighan was updated for
! 517: .Bx 4.4
! 518: and again in 1996.
1.7 aaron 519: .Sh BUGS
1.1 tholo 520: There are no explicit conversions between numbers and strings.
521: To force an expression to be treated as a number add 0 to it;
522: to force it to be treated as a string concatenate
1.7 aaron 523: .Li \&""
524: to it.
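.Pp
A small sketch of both idioms, with made-up values; each comparison
prints 1 for true and 0 for false:

```shell
conv=$(awk 'BEGIN {
    a = "10"; b = "9"
    print (a < b)            # both are strings: "10" sorts before "9"
    print (a + 0 < b + 0)    # forced numeric: 10 is not less than 9
    n = 3.0
    print (n "" == "3")      # forced string: 3.0 converts to "3"
}')
echo "$conv"
```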
525: .Pp
1.1 tholo 526: The scope rules for variables in functions are a botch;
527: the syntax is worse.