Annotation of src/usr.bin/awk/awk.1, Revision 1.15
1.15 ! jmc 1: .\" $OpenBSD: awk.1,v 1.14 2003/09/02 18:50:06 jmc Exp $
1.7 aaron 2: .\" EX/EE is a Bd
1.11 jmc 3: .\"
4: .\" Copyright (C) Lucent Technologies 1997
5: .\" All Rights Reserved
1.12 jmc 6: .\"
1.11 jmc 7: .\" Permission to use, copy, modify, and distribute this software and
8: .\" its documentation for any purpose and without fee is hereby
9: .\" granted, provided that the above copyright notice appear in all
10: .\" copies and that both that the copyright notice and this
11: .\" permission notice and warranty disclaimer appear in supporting
12: .\" documentation, and that the name Lucent Technologies or any of
13: .\" its entities not be used in advertising or publicity pertaining
14: .\" to distribution of the software without specific, written prior
15: .\" permission.
1.12 jmc 16: .\"
1.11 jmc 17: .\" LUCENT DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
18: .\" INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS.
19: .\" IN NO EVENT SHALL LUCENT OR ANY OF ITS ENTITIES BE LIABLE FOR ANY
20: .\" SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
21: .\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
22: .\" IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
23: .\" ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
24: .\" THIS SOFTWARE.
25: .\"
1.7 aaron 26: .Dd June 29, 1996
27: .Dt AWK 1
28: .Os
29: .Sh NAME
30: .Nm awk
31: .Nd pattern-directed scanning and processing language
32: .Sh SYNOPSIS
33: .Nm awk
34: .Op Fl F Ar fs
35: .Op Fl v Ar var=value
36: .Op Fl safe
37: .Op Fl mr Ar n
38: .Op Fl mf Ar n
39: .Op Ar prog | Fl f Ar progfile
40: .Ar
41: .Nm nawk
42: .Ar ...
43: .Sh DESCRIPTION
44: .Nm
1.1 tholo 45: scans each input
1.7 aaron 46: .Ar file
1.1 tholo 47: for lines that match any of a set of patterns specified literally in
1.7 aaron 48: .Ar prog
1.1 tholo 49: or in one or more files
50: specified as
1.7 aaron 51: .Fl f Ar progfile .
1.1 tholo 52: With each pattern
53: there can be an associated action that will be performed
54: when a line of a
1.7 aaron 55: .Ar file
1.1 tholo 56: matches the pattern.
57: Each line is matched against the
58: pattern portion of every pattern-action statement;
59: the associated action is performed for each matched pattern.
1.6 aaron 60: The file name
1.7 aaron 61: .Sq Pa \-
1.1 tholo 62: means the standard input.
63: Any
1.7 aaron 64: .Ar file
1.1 tholo 65: of the form
1.7 aaron 66: .Ar var=value
1.1 tholo 67: is treated as an assignment, not a filename,
68: and is executed at the time it would have been opened if it were a filename.
69: The option
1.7 aaron 70: .Fl v
1.1 tholo 71: followed by
1.7 aaron 72: .Ar var=value
1.1 tholo 73: is an assignment to be done before
1.7 aaron 74: .Ar prog
1.1 tholo 75: is executed;
76: any number of
1.7 aaron 77: .Fl v
1.1 tholo 78: options may be present.
79: The
1.7 aaron 80: .Fl F Ar fs
1.1 tholo 81: option defines the input field separator to be the regular expression
1.7 aaron 82: .Ar fs .
1.5 angelos 83: The
1.7 aaron 84: .Fl safe
85: option disables file output
86: .Po
87: .Ic print Ic > ,
88: .Ic print Ic >>
89: .Pc ,
90: process creation
91: .Po
92: .Ar cmd Ic \&| getline ,
93: .Ic print \&| , system
94: .Pc
95: and access to the environment
96: .Pq Va ENVIRON .
97: This
98: is a first (and not very reliable) approximation to a
99: .Dq safe
100: version of
101: .Nm awk .
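.Pp
As a rough illustration of the two kinds of assignment described above
(the variable name
.Va n
and the file names
.Pa a.txt
and
.Pa b.txt
are hypothetical), a
.Fl v
assignment is in effect before the program starts, whereas an operand
assignment happens only when that operand is reached:
.Bd -literal -offset indent
awk -v n=1 'BEGIN { print n } { print n, $0 }' a.txt n=2 b.txt
.Ed
.Pp
Here the
.Ic BEGIN
action and every line of
.Pa a.txt
see
.Va n
as 1, while lines of
.Pa b.txt
see 2.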
102: .Pp
103: An input line is normally made up of fields separated by whitespace,
1.1 tholo 104: or by regular expression
1.7 aaron 105: .Va FS .
1.1 tholo 106: The fields are denoted
1.7 aaron 107: .Va $1 , $2 , ... ,
108: while
109: .Va $0
1.1 tholo 110: refers to the entire line.
111: If
1.7 aaron 112: .Va FS
1.1 tholo 113: is null, the input line is split into one field per character.
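.Pp
For instance, assuming a colon-separated input line invented for this
sketch:
.Bd -literal -offset indent
echo "root:x:0:0" | awk -F: '{ print $1, $3 }'
.Ed
.Pp
This should print
.Dq root 0 ,
since
.Va $1
and
.Va $3
are the first and third colon-separated fields.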
1.7 aaron 114: .Pp
1.1 tholo 115: To compensate for inadequate implementation of storage management,
1.6 aaron 116: the
1.7 aaron 117: .Fl mr
1.1 tholo 118: option can be used to set the maximum size of the input record,
119: and the
1.7 aaron 120: .Fl mf
1.1 tholo 121: option to set the maximum number of fields.
1.7 aaron 122: .Pp
1.1 tholo 123: A pattern-action statement has the form
1.7 aaron 124: .Pp
125: .D1 Ar pattern Ic \&{ Ar action Ic \&}
126: .Pp
1.6 aaron 127: A missing
1.7 aaron 128: .Ic \&{ Ar action Ic \&}
1.1 tholo 129: means print the line;
130: a missing pattern always matches.
131: Pattern-action statements are separated by newlines or semicolons.
1.7 aaron 132: .Pp
1.1 tholo 133: An action is a sequence of statements.
134: A statement can be one of the following:
1.7 aaron 135: .Bd -unfilled -offset indent
136: .Ic if ( Xo
137: .Ar expression ) statement \&
138: .Op Ic else Ar statement
139: .Xc
140: .Ic while ( Ar expression ) statement
141: .Ic for ( Xo
142: .Ar expression ; expression ; expression ) statement
143: .Xc
144: .Ic for ( Xo
145: .Ar var Ic in Ar array ) statement
146: .Xc
147: .Ic do Ar statement Ic while ( Ar expression )
148: .Ic break
149: .Ic continue
150: .Ic { Oo Ar statement ... Oc Ic \& }
151: .Ar expression Xo
152: .No "# commonly" \&
153: .Ar var Ic = Ar expression
154: .Xc
155: .Ic print Xo
156: .Op Ar expression-list
157: .Op Ic > Ns Ar expression
158: .Xc
159: .Ic printf Ar format Xo
160: .Op Ar ... , expression-list
161: .Op Ic > Ns Ar expression
162: .Xc
163: .Ic return Op Ar expression
164: .Ic next Xo
165: .No "# skip remaining patterns on this input line"
166: .Xc
167: .Ic nextfile Xo
168: .No "# skip rest of this file, open next, start at top"
169: .Xc
170: .Ic delete Ar array Ns Xo
171: .Ic \&[ Ns Ar expression Ns Ic \&]
172: .No \& "# delete an array element"
173: .Xc
174: .Ic delete Ar array Xo
175: .No "# delete all elements of array"
176: .Xc
177: .Ic exit Xo
178: .Op Ar expression
179: .No \& "# exit immediately; status is" Ar expression
180: .Xc
181: .Ed
182: .Pp
1.1 tholo 183: Statements are terminated by
184: semicolons, newlines or right braces.
185: An empty
1.7 aaron 186: .Ar expression-list
1.1 tholo 187: stands for
1.7 aaron 188: .Va $0 .
189: String constants are quoted
190: .Li \&"" ,
1.1 tholo 191: with the usual C escapes recognized within.
192: Expressions take on string or numeric values as appropriate,
193: and are built using the operators
1.7 aaron 194: .Ic + \- * / % ^
195: (exponentiation), and concatenation (indicated by whitespace).
1.1 tholo 196: The operators
1.14 jmc 197: .Ic \&! ++ \-\- += \-= *= /= %= ^= > >= < <= == != ?:
1.1 tholo 198: are also available in expressions.
199: Variables may be scalars, array elements
200: (denoted
1.7 aaron 201: .Li x[i] )
1.1 tholo 202: or fields.
203: Variables are initialized to the null string.
204: Array subscripts may be any string,
205: not necessarily numeric;
206: this allows for a form of associative memory.
207: Multiple subscripts such as
1.7 aaron 208: .Li [i,j,k]
1.1 tholo 209: are permitted; the constituents are concatenated,
210: separated by the value of
1.7 aaron 211: .Va SUBSEP .
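.Pp
A small sketch of associative subscripts, counting words in an
arbitrary sample line:
.Bd -literal -offset indent
echo "a b a" | awk '{ for (i = 1; i <= NF; i++) n[$i]++ }
END { print n["a"], n["b"] }'
.Ed
.Pp
This should print
.Dq 2 1 .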
212: .Pp
1.1 tholo 213: The
1.7 aaron 214: .Ic print
1.1 tholo 215: statement prints its arguments on the standard output
216: (or on a file if
1.7 aaron 217: .Ic > Ns Ar file
1.1 tholo 218: or
1.7 aaron 219: .Ic >> Ns Ar file
1.1 tholo 220: is present or on a pipe if
1.7 aaron 221: .Ic \&| Ar cmd
1.1 tholo 222: is present), separated by the current output field separator,
223: and terminated by the output record separator.
1.7 aaron 224: .Ar file
1.1 tholo 225: and
1.7 aaron 226: .Ar cmd
1.1 tholo 227: may be literal names or parenthesized expressions;
228: identical string values in different statements denote
229: the same open file.
230: The
1.7 aaron 231: .Ic printf
1.1 tholo 232: statement formats its expression list according to the format
233: (see
1.10 pvalchev 234: .Xr printf 3 ) .
1.1 tholo 235: The built-in function
1.7 aaron 236: .Fn close expr
1.1 tholo 237: closes the file or pipe
1.7 aaron 238: .Fa expr .
1.1 tholo 239: The built-in function
1.7 aaron 240: .Fn fflush expr
1.1 tholo 241: flushes any buffered output for the file or pipe
1.7 aaron 242: .Fa expr .
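.Pp
As a hedged example of output redirection to a pipe, the following
sends each input line through
.Xr sort 1
and closes the pipe at the end; the command string
.Qq sort
is chosen only for illustration:
.Bd -literal -offset indent
{ print $0 | "sort" }
END { close("sort") }
.Ed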
243: .Pp
1.1 tholo 244: The mathematical functions
1.7 aaron 245: .Fn exp ,
246: .Fn log ,
247: .Fn sqrt ,
248: .Fn sin ,
249: .Fn cos ,
1.1 tholo 250: and
1.7 aaron 251: .Fn atan2
1.1 tholo 252: are built in.
253: Other built-in functions:
1.7 aaron 254: .Bl -tag -width Fn
255: .It Fn length
1.1 tholo 256: the length of its argument
257: taken as a string,
258: or of
1.7 aaron 259: .Va $0
1.1 tholo 260: if no argument.
1.7 aaron 261: .It Fn rand
1.1 tholo 262: random number on [0,1)
1.7 aaron 263: .It Fn srand
1.1 tholo 264: sets seed for
1.7 aaron 265: .Fn rand
1.1 tholo 266: and returns the previous seed.
1.7 aaron 267: .It Fn int
268: truncates to an integer value.
269: .It Fn substr s m n
1.1 tholo 270: the
1.7 aaron 271: .Fa n Ns No -character
1.1 tholo 272: substring of
1.7 aaron 273: .Fa s
1.1 tholo 274: that begins at position
1.7 aaron 275: .Fa m
1.1 tholo 276: counted from 1.
1.7 aaron 277: .It Fn index s t
1.1 tholo 278: the position in
1.7 aaron 279: .Fa s
1.1 tholo 280: where the string
1.7 aaron 281: .Fa t
1.1 tholo 282: occurs, or 0 if it does not.
1.7 aaron 283: .It Fn match s r
1.1 tholo 284: the position in
1.7 aaron 285: .Fa s
1.1 tholo 286: where the regular expression
1.7 aaron 287: .Fa r
1.1 tholo 288: occurs, or 0 if it does not.
289: The variables
1.7 aaron 290: .Va RSTART
1.1 tholo 291: and
1.7 aaron 292: .Va RLENGTH
1.1 tholo 293: are set to the position and length of the matched string.
1.7 aaron 294: .It Fn split s a fs
1.1 tholo 295: splits the string
1.7 aaron 296: .Fa s
1.1 tholo 297: into array elements
1.7 aaron 298: .Va a[1] , a[2] , ... , a[n]
1.1 tholo 299: and returns
1.7 aaron 300: .Va n .
1.1 tholo 301: The separation is done with the regular expression
1.7 aaron 302: .Ar fs
1.1 tholo 303: or with the field separator
1.7 aaron 304: .Va FS
1.1 tholo 305: if
1.7 aaron 306: .Ar fs
1.1 tholo 307: is not given.
308: An empty string as field separator splits the string
309: into one array element per character.
1.7 aaron 310: .It Fn sub r t s
1.1 tholo 311: substitutes
1.7 aaron 312: .Fa t
1.1 tholo 313: for the first occurrence of the regular expression
1.7 aaron 314: .Fa r
1.1 tholo 315: in the string
1.7 aaron 316: .Fa s .
1.1 tholo 317: If
1.7 aaron 318: .Fa s
1.1 tholo 319: is not given,
1.7 aaron 320: .Va $0
1.1 tholo 321: is used.
1.7 aaron 322: .It Fn gsub r t s
1.1 tholo 323: same as
1.7 aaron 324: .Fn sub
1.1 tholo 325: except that all occurrences of the regular expression
326: are replaced;
1.7 aaron 327: .Fn sub
1.1 tholo 328: and
1.7 aaron 329: .Fn gsub
1.1 tholo 330: return the number of replacements.
1.7 aaron 331: .It Fn sprintf fmt expr ...
1.1 tholo 332: the string resulting from formatting
1.7 aaron 333: .Fa expr , ...
1.1 tholo 334: according to the
1.7 aaron 335: .Xr printf 3
1.1 tholo 336: format
1.7 aaron 337: .Fa fmt .
338: .It Fn system cmd
1.1 tholo 339: executes
1.7 aaron 340: .Fa cmd
341: and returns its exit status.
342: .It Fn tolower str
1.1 tholo 343: returns a copy of
1.7 aaron 344: .Fa str
1.1 tholo 345: with all upper-case characters translated to their
346: corresponding lower-case equivalents.
1.7 aaron 347: .It Fn toupper str
1.1 tholo 348: returns a copy of
1.7 aaron 349: .Fa str
1.1 tholo 350: with all lower-case characters translated to their
351: corresponding upper-case equivalents.
1.7 aaron 352: .El
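.Pp
A brief sketch of several of these functions applied to a made-up
string (the variable names are arbitrary):
.Bd -literal -offset indent
BEGIN {
    s = "hello world"
    print substr(s, 1, 5)       # hello
    print index(s, "world")     # 7
    n = split(s, w, " ")
    print n, w[2]               # 2 world
    gsub(/o/, "0", s)
    print s                     # hell0 w0rld
    print toupper(w[1])         # HELLO
}
.Ed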
353: .Pp
354: The
355: .Sq function
356: .Ic getline
1.1 tholo 357: sets
1.7 aaron 358: .Va $0
1.1 tholo 359: to the next input record from the current input file;
1.7 aaron 360: .Ic getline < Ar file
1.1 tholo 361: sets
1.7 aaron 362: .Va $0
1.1 tholo 363: to the next record from
1.7 aaron 364: .Ar file .
365: .Ic getline Va x
1.1 tholo 366: sets variable
1.7 aaron 367: .Va x
1.1 tholo 368: instead.
369: Finally,
1.7 aaron 370: .Ar cmd Ic \&| getline
1.1 tholo 371: pipes the output of
1.7 aaron 372: .Ar cmd
1.1 tholo 373: into
1.7 aaron 374: .Ic getline ;
1.1 tholo 375: each call of
1.7 aaron 376: .Ic getline
1.1 tholo 377: returns the next line of output from
1.7 aaron 378: .Ar cmd .
1.1 tholo 379: In all cases,
1.7 aaron 380: .Ic getline
1.1 tholo 381: returns 1 for a successful input,
382: 0 for end of file, and \-1 for an error.
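.Pp
For example, under the assumption that
.Xr date 1
is available, its output can be read into a variable
(the name
.Va now
is arbitrary):
.Bd -literal -offset indent
BEGIN {
    "date" | getline now
    close("date")
    print "started at", now
}
.Ed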
1.7 aaron 383: .Pp
1.1 tholo 384: Patterns are arbitrary Boolean combinations
385: (with
1.14 jmc 386: .Ic "\&! || &&" )
1.1 tholo 387: of regular expressions and
388: relational expressions.
389: Regular expressions are as in
1.12 jmc 390: .Xr egrep 1 .
1.1 tholo 391: Isolated regular expressions
392: in a pattern apply to the entire line.
393: Regular expressions may also occur in
394: relational expressions, using the operators
1.7 aaron 395: .Ic ~
1.1 tholo 396: and
1.7 aaron 397: .Ic !~ .
398: .Ic / Ns Ar re Ns Ic /
1.1 tholo 399: is a constant regular expression;
400: any string (constant or variable) may be used
401: as a regular expression, except in the position of an isolated regular expression
402: in a pattern.
1.7 aaron 403: .Pp
1.1 tholo 404: A pattern may consist of two patterns separated by a comma;
405: in this case, the action is performed for all lines
406: from an occurrence of the first pattern
1.15 ! jmc 407: through an occurrence of the second.
1.7 aaron 408: .Pp
1.1 tholo 409: A relational expression is one of the following:
1.7 aaron 410: .Bd -unfilled -offset indent
411: .Ar expression matchop regular-expression
412: .Ar expression relop expression
413: .Ar expression Ic in Ar array-name
414: .Ic \&( Ns Xo
415: .Ar expr , expr , \&... Ns Ic \&) in
416: .Ar \& array-name
417: .Xc
418: .Ed
1.15 ! jmc 419: .Pp
1.7 aaron 420: where a
421: .Ar relop
422: is any of the six relational operators in C, and a
423: .Ar matchop
424: is either
425: .Ic ~
1.1 tholo 426: (matches)
427: or
1.7 aaron 428: .Ic !~
1.1 tholo 429: (does not match).
430: A conditional is an arithmetic expression,
431: a relational expression,
432: or a Boolean combination
433: of these.
1.7 aaron 434: .Pp
1.1 tholo 435: The special patterns
1.7 aaron 436: .Ic BEGIN
1.1 tholo 437: and
1.7 aaron 438: .Ic END
1.1 tholo 439: may be used to capture control before the first input line is read
440: and after the last.
1.7 aaron 441: .Ic BEGIN
1.1 tholo 442: and
1.7 aaron 443: .Ic END
1.1 tholo 444: do not combine with other patterns.
1.7 aaron 445: .Pp
1.1 tholo 446: Variable names with special meanings:
1.7 aaron 447: .Pp
448: .Bl -tag -width Va -compact
449: .It Va CONVFMT
1.1 tholo 450: conversion format used when converting numbers
1.3 millert 451: (default
1.7 aaron 452: .Qq Li %.6g )
453: .It Va FS
1.1 tholo 454: regular expression used to separate fields; also settable
455: by option
1.9 millert 456: .Fl F Ar fs .
1.7 aaron 457: .It Va NF
1.1 tholo 458: number of fields in the current record
1.7 aaron 459: .It Va NR
1.1 tholo 460: ordinal number of the current record
1.7 aaron 461: .It Va FNR
1.1 tholo 462: ordinal number of the current record in the current file
1.7 aaron 463: .It Va FILENAME
1.1 tholo 464: the name of the current input file
1.7 aaron 465: .It Va RS
1.1 tholo 466: input record separator (default newline)
1.7 aaron 467: .It Va OFS
1.1 tholo 468: output field separator (default blank)
1.7 aaron 469: .It Va ORS
1.1 tholo 470: output record separator (default newline)
1.7 aaron 471: .It Va OFMT
1.1 tholo 472: output format for numbers (default
1.7 aaron 473: .Qq Li %.6g )
474: .It Va SUBSEP
1.1 tholo 475: separates multiple subscripts (default octal 034)
1.7 aaron 476: .It Va ARGC
1.1 tholo 477: argument count, assignable
1.7 aaron 478: .It Va ARGV
1.1 tholo 479: argument array, assignable;
480: non-null members are taken as filenames
1.7 aaron 481: .It Va ENVIRON
1.1 tholo 482: array of environment variables; subscripts are names.
1.7 aaron 483: .El
484: .Pp
485: Functions may be defined (at the position of a pattern-action statement)
486: as follows:
487: .Pp
488: .Dl function foo(a, b, c) { ...; return x }
489: .Pp
1.1 tholo 490: Parameters are passed by value if scalar and by reference if array name;
491: functions may be called recursively.
492: Parameters are local to the function; all other variables are global.
493: Thus local variables may be created by providing excess parameters in
494: the function definition.
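.Pp
To sketch this idiom, the hypothetical function
.Fn max
below declares
.Va i
and
.Va m
as extra parameters so that they are local; the caller passes only
the array and its size:
.Bd -literal -offset indent
function max(a, n,    i, m) {
    m = a[1]
    for (i = 2; i <= n; i++)
        if (a[i] > m)
            m = a[i]
    return m
}
{ n = split($0, f); print max(f, n) }
.Ed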
1.7 aaron 495: .Sh EXAMPLES
496: .Dl length($0) > 72
1.1 tholo 497: Print lines longer than 72 characters.
1.7 aaron 498: .Pp
499: .Dl { print $2, $1 }
1.1 tholo 500: Print first two fields in opposite order.
1.7 aaron 501: .Bd -literal -offset indent
1.1 tholo 502: BEGIN { FS = ",[ \et]*|[ \et]+" }
503: { print $2, $1 }
1.7 aaron 504: .Ed
1.1 tholo 505: Same, with input fields separated by comma and/or blanks and tabs.
1.7 aaron 506: .Bd -literal -offset indent
507: { s += $1 }
508: END { print "sum is", s, " average is", s/NR }
509: .Ed
1.1 tholo 510: Add up first column, print sum and average.
1.7 aaron 511: .Pp
512: .Dl /start/, /stop/
1.1 tholo 513: Print all lines between start/stop pairs.
1.7 aaron 514: .Bd -literal -offset indent
515: BEGIN { # Simulate echo(1)
516: for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
517: printf "\en"
518: exit }
519: .Ed
520: .Sh SEE ALSO
521: .Xr lex 1 ,
522: .Xr sed 1
523: .Rs
524: .%A A. V. Aho
525: .%A B. W. Kernighan
526: .%A P. J. Weinberger
527: .%T The AWK Programming Language
528: .%I Addison-Wesley
529: .%D 1988
530: .%O ISBN 0-201-07981-X
531: .Re
1.8 aaron 532: .Sh HISTORY
1.13 millert 533: An
1.8 aaron 534: .Nm
1.13 millert 535: utility appeared in
536: .At v7 .
1.7 aaron 537: .Sh BUGS
1.1 tholo 538: There are no explicit conversions between numbers and strings.
539: To force an expression to be treated as a number add 0 to it;
540: to force it to be treated as a string concatenate
1.7 aaron 541: .Li \&""
542: to it.
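.Pp
A minimal sketch of both idioms, using sample values chosen for
illustration:
.Bd -literal -offset indent
BEGIN {
    x = "10"; y = "9"
    if (x < y) print "compared as strings"
    if (x + 0 > y + 0) print "compared as numbers"
    n = 3
    print length(n "")
}
.Ed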
543: .Pp
1.1 tholo 544: The scope rules for variables in functions are a botch;
545: the syntax is worse.