Annotation of src/usr.bin/awk/awk.1, Revision 1.6
1.6 ! aaron 1: .\" $OpenBSD: awk.1,v 1.5 1998/03/03 01:56:00 angelos Exp $
1.1 tholo 2: .de EX
3: .nf
4: .ft CW
5: ..
6: .de EE
7: .br
8: .fi
9: .ft 1
10: ..
11: .TH AWK 1
12: .CT 1 files prog_other
13: .SH NAME
14: awk \- pattern-directed scanning and processing language
15: .SH SYNOPSIS
1.2 etheisen 16: .B awk|nawk
1.1 tholo 17: [
18: .BI \-F
19: .I fs
20: ]
21: [
22: .BI \-v
23: .I var=value
24: ]
25: [
1.5 angelos 26: .BI \-safe
27: ]
28: [
1.1 tholo 29: .BI \-mr n
30: ]
31: [
32: .BI \-mf n
33: ]
34: [
35: .I 'prog'
36: |
37: .BI \-f
38: .I progfile
39: ]
40: [
41: .I file ...
42: ]
43: .SH DESCRIPTION
44: .I Awk
45: scans each input
46: .I file
47: for lines that match any of a set of patterns specified literally in
48: .IR prog
49: or in one or more files
50: specified as
51: .B \-f
52: .IR progfile .
53: With each pattern
54: there can be an associated action that will be performed
55: when a line of a
56: .I file
57: matches the pattern.
58: Each line is matched against the
59: pattern portion of every pattern-action statement;
60: the associated action is performed for each matched pattern.
1.6 ! aaron 61: The file name
1.1 tholo 62: .B \-
63: means the standard input.
64: Any
65: .IR file
66: of the form
67: .I var=value
68: is treated as an assignment, not a filename,
69: and is executed at the time it would have been opened if it were a filename.
70: The option
71: .B \-v
72: followed by
73: .I var=value
74: is an assignment to be done before
75: .I prog
76: is executed;
77: any number of
78: .B \-v
79: options may be present.
80: The
81: .B \-F
82: .IR fs
83: option defines the input field separator to be the regular expression
84: .IR fs .
1.5 angelos 85: The
86: .B \-safe
87: option disables file output (print >, print >>), process creation
88: (cmd|getline, print |, system), and access to the environment (ENVIRON). This
89: is a first (and not very reliable) approximation to a "safe" version of awk.
1.1 tholo 90: .PP
91: An input line is normally made up of fields separated by white space,
92: or by regular expression
93: .BR FS .
94: The fields are denoted
95: .BR $1 ,
96: .BR $2 ,
97: \&..., while
98: .B $0
99: refers to the entire line.
100: If
101: .BR FS
102: is null, the input line is split into one field per character.
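As a minimal sketch of the field splitting described above, the following shell fragment sets the separator with
\-F
and prints the second field of each line. The function name second_field and the colon-separated sample input are assumptions made for illustration only.

```shell
# Hypothetical helper: print the second field of each input line,
# using ":" as the input field separator (set with -F).
second_field() {
    awk -F: '{ print $2 }'
}

# Usage: two colon-separated records on standard input.
printf 'a:b:c\nx:y:z\n' | second_field
```

The same separator could be set from inside the program by assigning FS in a BEGIN action.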
103: .PP
104: To compensate for inadequate implementation of storage management,
1.6 ! aaron 105: the
1.1 tholo 106: .B \-mr
107: option can be used to set the maximum size of the input record,
108: and the
109: .B \-mf
110: option to set the maximum number of fields.
111: .PP
112: A pattern-action statement has the form
113: .IP
114: .IB pattern " { " action " }
115: .PP
1.6 ! aaron 116: A missing
1.1 tholo 117: .BI { " action " }
118: means print the line;
119: a missing pattern always matches.
120: Pattern-action statements are separated by newlines or semicolons.
121: .PP
122: An action is a sequence of statements.
123: A statement can be one of the following:
124: .PP
125: .EX
126: .ta \w'\f(CWdelete array[expression]'u
127: .RS
128: .nf
129: .ft CW
130: if(\fI expression \fP)\fI statement \fP\fR[ \fPelse\fI statement \fP\fR]\fP
131: while(\fI expression \fP)\fI statement\fP
132: for(\fI expression \fP;\fI expression \fP;\fI expression \fP)\fI statement\fP
133: for(\fI var \fPin\fI array \fP)\fI statement\fP
134: do\fI statement \fPwhile(\fI expression \fP)
135: break
136: continue
137: {\fR [\fP\fI statement ... \fP\fR] \fP}
138: \fIexpression\fP #\fR commonly\fP\fI var = expression\fP
139: print\fR [ \fP\fIexpression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
140: printf\fI format \fP\fR[ \fP,\fI expression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
141: return\fR [ \fP\fIexpression \fP\fR]\fP
142: next #\fR skip remaining patterns on this input line\fP
143: nextfile #\fR skip rest of this file, open next, start at top\fP
144: delete\fI array\fP[\fI expression \fP] #\fR delete an array element\fP
145: delete\fI array\fP #\fR delete all elements of array\fP
146: exit\fR [ \fP\fIexpression \fP\fR]\fP #\fR exit immediately; status is \fP\fIexpression\fP
147: .fi
148: .RE
149: .EE
150: .DT
151: .PP
152: Statements are terminated by
153: semicolons, newlines or right braces.
154: An empty
155: .I expression-list
156: stands for
157: .BR $0 .
158: String constants are quoted \&\f(CW"\ "\fR,
159: with the usual C escapes recognized within.
160: Expressions take on string or numeric values as appropriate,
161: and are built using the operators
162: .B + \- * / % ^
163: (exponentiation), and concatenation (indicated by white space).
164: The operators
165: .B
166: ! ++ \-\- += \-= *= /= %= ^= > >= < <= == != ?:
167: are also available in expressions.
168: Variables may be scalars, array elements
169: (denoted
170: .IB x [ i ] )
171: or fields.
172: Variables are initialized to the null string.
173: Array subscripts may be any string,
174: not necessarily numeric;
175: this allows for a form of associative memory.
176: Multiple subscripts such as
177: .B [i,j,k]
178: are permitted; the constituents are concatenated,
179: separated by the value of
180: .BR SUBSEP .
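The multiple-subscript rule above can be illustrated with a short sketch that counts pairs of fields and then recovers the two parts of each key by splitting it on SUBSEP. The function name count_pairs and the sample input are hypothetical; the output is piped through sort because the order of a for-in loop is not specified.

```shell
# Hypothetical helper: count how often each (field 1, field 2) pair occurs.
count_pairs() {
    awk '{ n[$1, $2]++ }               # key is $1 SUBSEP $2
         END {
             for (k in n) {
                 split(k, part, SUBSEP) # recover the two constituents
                 print part[1], part[2], n[k]
             }
         }' | sort
}

# Usage: three records, one pair repeated.
printf 'a x\nb y\na x\n' | count_pairs
```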
181: .PP
182: The
183: .B print
184: statement prints its arguments on the standard output
185: (or on a file if
186: .BI > file
187: or
188: .BI >> file
189: is present or on a pipe if
190: .BI | cmd
191: is present), separated by the current output field separator,
192: and terminated by the output record separator.
193: .I file
194: and
195: .I cmd
196: may be literal names or parenthesized expressions;
197: identical string values in different statements denote
198: the same open file.
199: The
200: .B printf
201: statement formats its expression list according to the format
202: (see
203: .IR printf (3)) .
204: The built-in function
205: .BI close( expr )
206: closes the file or pipe
207: .IR expr .
208: The built-in function
209: .BI fflush( expr )
210: flushes any buffered output for the file or pipe
211: .IR expr .
212: .PP
213: The mathematical functions
214: .BR exp ,
215: .BR log ,
216: .BR sqrt ,
217: .BR sin ,
218: .BR cos ,
219: and
1.6 ! aaron 220: .BR atan2
1.1 tholo 221: are built in.
222: Other built-in functions:
223: .TF length
224: .TP
225: .B length
226: the length of its argument
227: taken as a string,
228: or of
229: .B $0
230: if no argument.
231: .TP
232: .B rand
233: random number on [0,1)
234: .TP
235: .B srand
236: sets seed for
237: .B rand
238: and returns the previous seed.
239: .TP
240: .B int
241: truncates to an integer value
242: .TP
243: .BI substr( s , " m" , " n\fB)
244: the
245: .IR n -character
246: substring of
247: .I s
248: that begins at position
1.6 ! aaron 249: .IR m
1.1 tholo 250: counted from 1.
251: .TP
252: .BI index( s , " t" )
253: the position in
254: .I s
255: where the string
256: .I t
257: occurs, or 0 if it does not.
258: .TP
259: .BI match( s , " r" )
260: the position in
261: .I s
262: where the regular expression
263: .I r
264: occurs, or 0 if it does not.
265: The variables
266: .B RSTART
267: and
268: .B RLENGTH
269: are set to the position and length of the matched string.
270: .TP
271: .BI split( s , " a" , " fs\fB)
272: splits the string
273: .I s
274: into array elements
275: .IB a [1] ,
276: .IB a [2] ,
277: \&...,
278: .IB a [ n ] ,
279: and returns
280: .IR n .
281: The separation is done with the regular expression
282: .I fs
283: or with the field separator
284: .B FS
285: if
286: .I fs
287: is not given.
288: An empty string as field separator splits the string
289: into one array element per character.
290: .TP
291: .BI sub( r , " t" , " s\fB)
292: substitutes
293: .I t
294: for the first occurrence of the regular expression
295: .I r
296: in the string
297: .IR s .
298: If
299: .I s
300: is not given,
301: .B $0
302: is used.
303: .TP
304: .B gsub
305: same as
306: .B sub
307: except that all occurrences of the regular expression
308: are replaced;
309: .B sub
310: and
311: .B gsub
312: return the number of replacements.
313: .TP
314: .BI sprintf( fmt , " expr" , " ...\fB )
315: the string resulting from formatting
316: .I expr ...
317: according to the
318: .IR printf (3)
319: format
320: .I fmt
321: .TP
322: .BI system( cmd )
323: executes
324: .I cmd
325: and returns its exit status
326: .TP
327: .BI tolower( str )
328: returns a copy of
329: .I str
330: with all upper-case characters translated to their
331: corresponding lower-case equivalents.
332: .TP
333: .BI toupper( str )
334: returns a copy of
335: .I str
336: with all lower-case characters translated to their
337: corresponding upper-case equivalents.
338: .PD
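A brief sketch of the string functions listed above, assuming nothing beyond what the list describes: gsub returns the number of replacements it made, and match sets RSTART and RLENGTH. The function name string_demo and the sample strings are illustrative only.

```shell
# Hypothetical helper: exercise gsub and match on fixed sample strings.
string_demo() {
    awk 'BEGIN {
        s = "foo bar foo"
        n = gsub(/foo/, "X", s)      # replace every "foo" in s, count them
        print n, s
        if (match("hello world", /wor/))
            print RSTART, RLENGTH    # position and length of the match
    }'
}

string_demo
```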
339: .PP
340: The ``function''
341: .B getline
342: sets
343: .B $0
344: to the next input record from the current input file;
345: .B getline
346: .BI < file
347: sets
348: .B $0
349: to the next record from
350: .IR file .
351: .B getline
352: .I x
353: sets variable
354: .I x
355: instead.
356: Finally,
357: .IB cmd " | getline
358: pipes the output of
359: .I cmd
360: into
361: .BR getline ;
362: each call of
363: .B getline
364: returns the next line of output from
365: .IR cmd .
366: In all cases,
367: .B getline
368: returns 1 for a successful input,
369: 0 for end of file, and \-1 for an error.
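The pipe form of getline described above might be used as in the following sketch, which reads the output of a command line by line until getline stops returning 1. The function name first_words and the echo command are placeholders chosen for this example.

```shell
# Hypothetical helper: run a command from awk and print the first word
# of each line it produces.
first_words() {
    awk 'BEGIN {
        cmd = "echo hello world"
        while ((cmd | getline line) > 0) {  # 1 on success, 0 at end of file
            split(line, w, " ")
            print w[1]
        }
        close(cmd)                          # close the pipe when done
    }'
}

first_words
```

Comparing the return value with 0 explicitly keeps the loop from spinning if getline returns \-1 on an error.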
370: .PP
371: Patterns are arbitrary Boolean combinations
372: (with
373: .BR "! || &&" )
374: of regular expressions and
375: relational expressions.
376: Regular expressions are as in
1.6 ! aaron 377: .IR egrep ;
1.1 tholo 378: see
379: .IR grep (1).
380: Isolated regular expressions
381: in a pattern apply to the entire line.
382: Regular expressions may also occur in
383: relational expressions, using the operators
384: .BR ~
385: and
386: .BR !~ .
387: .BI / re /
388: is a constant regular expression;
389: any string (constant or variable) may be used
390: as a regular expression, except in the position of an isolated regular expression
391: in a pattern.
392: .PP
393: A pattern may consist of two patterns separated by a comma;
394: in this case, the action is performed for all lines
395: from an occurrence of the first pattern
395: through an occurrence of the second.
397: .PP
398: A relational expression is one of the following:
399: .IP
400: .I expression matchop regular-expression
401: .br
402: .I expression relop expression
403: .br
404: .IB expression " in " array-name
405: .br
406: .BI ( expr , expr,... ") in " array-name
407: .PP
408: where a relop is any of the six relational operators in C,
409: and a matchop is either
410: .B ~
411: (matches)
412: or
413: .B !~
414: (does not match).
415: A conditional is an arithmetic expression,
416: a relational expression,
417: or a Boolean combination
418: of these.
419: .PP
420: The special patterns
421: .B BEGIN
422: and
423: .B END
424: may be used to capture control before the first input line is read
425: and after the last.
426: .B BEGIN
427: and
428: .B END
429: do not combine with other patterns.
430: .PP
431: Variable names with special meanings:
432: .TF FILENAME
433: .TP
434: .B CONVFMT
435: conversion format used when converting numbers
1.3 millert 436: (default
1.1 tholo 437: .BR "%.6g" )
438: .TP
439: .B FS
440: regular expression used to separate fields; also settable
441: by option
442: .BI \-F fs .
443: .TP
444: .BR NF
445: number of fields in the current record
446: .TP
447: .B NR
448: ordinal number of the current record
449: .TP
450: .B FNR
451: ordinal number of the current record in the current file
452: .TP
453: .B FILENAME
454: the name of the current input file
455: .TP
456: .B RS
457: input record separator (default newline)
458: .TP
459: .B OFS
460: output field separator (default blank)
461: .TP
462: .B ORS
463: output record separator (default newline)
464: .TP
465: .B OFMT
466: output format for numbers (default
467: .BR "%.6g" )
468: .TP
469: .B SUBSEP
470: separates multiple subscripts (default 034)
471: .TP
472: .B ARGC
473: argument count, assignable
474: .TP
475: .B ARGV
476: argument array, assignable;
477: non-null members are taken as filenames
478: .TP
479: .B ENVIRON
480: array of environment variables; subscripts are names.
481: .PD
482: .PP
483: Functions may be defined (at the position of a pattern-action statement) thus:
484: .IP
485: .B
486: function foo(a, b, c) { ...; return x }
487: .PP
488: Parameters are passed by value if scalar and by reference if array name;
489: functions may be called recursively.
490: Parameters are local to the function; all other variables are global.
491: Thus local variables may be created by providing excess parameters in
492: the function definition.
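The following sketch shows one way the excess-parameter convention could look in practice. The function max and the wrapper awk_max are hypothetical names; the array is passed by reference, while i and m act as locals because they are declared as extra parameters and never supplied by the caller.

```shell
# Hypothetical helper: print the largest number in the first column.
awk_max() {
    awk 'function max(arr, n,    i, m) {  # i and m are locals (extra parameters)
             m = arr[1]
             for (i = 2; i <= n; i++)
                 if (arr[i] > m) m = arr[i]
             return m
         }
         { v[NR] = $1 + 0 }               # store each value as a number
         END { print max(v, NR) }'
}

# Usage: three numeric records.
printf '3\n9\n4\n' | awk_max
```

Separating the real parameters from the locals with extra white space is a common convention, not a language requirement.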
493: .SH EXAMPLES
494: .TP
1.3 millert 495: .EX
1.1 tholo 496: length($0) > 72
1.3 millert 497: .EE
1.1 tholo 498: Print lines longer than 72 characters.
499: .TP
1.3 millert 500: .EX
1.1 tholo 501: { print $2, $1 }
1.3 millert 502: .EE
1.1 tholo 503: Print first two fields in opposite order.
504: .PP
505: .EX
506: BEGIN { FS = ",[ \et]*|[ \et]+" }
507: { print $2, $1 }
508: .EE
509: .ns
510: .IP
511: Same, with input fields separated by comma and/or blanks and tabs.
512: .PP
513: .EX
514: .nf
515: { s += $1 }
516: END { print "sum is", s, " average is", s/NR }
517: .fi
518: .EE
519: .ns
520: .IP
521: Add up first column, print sum and average.
522: .TP
1.3 millert 523: .EX
1.1 tholo 524: /start/, /stop/
1.3 millert 525: .EE
1.1 tholo 526: Print all lines between start/stop pairs.
527: .PP
528: .EX
529: .nf
530: BEGIN { # Simulate echo(1)
531: for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
532: printf "\en"
533: exit }
534: .fi
535: .EE
536: .SH SEE ALSO
1.6 ! aaron 537: .IR lex (1),
1.1 tholo 538: .IR sed (1)
539: .br
540: A. V. Aho, B. W. Kernighan, P. J. Weinberger,
541: .I
542: The AWK Programming Language,
543: Addison-Wesley, 1988. ISBN 0-201-07981-X
544: .SH BUGS
545: There are no explicit conversions between numbers and strings.
546: To force an expression to be treated as a number add 0 to it;
547: to force it to be treated as a string concatenate
548: \&\f(CW""\fP to it.
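A small sketch of the conversion idiom just described, comparing the same two values first as strings and then as numbers. The function name compare_both_ways and the values are assumptions chosen for illustration.

```shell
# Hypothetical helper: compare "10" and "9" as strings, then as numbers.
compare_both_ways() {
    awk 'BEGIN {
        a = "10"; b = "9"             # string constants
        as_strings = (a < b)          # text comparison: "10" sorts before "9"
        as_numbers = (a + 0 < b + 0)  # adding 0 forces numeric comparison
        print as_strings, as_numbers
    }'
}

compare_both_ways
```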
549: .br
550: The scope rules for variables in functions are a botch;
551: the syntax is worse.