Annotation of src/usr.bin/awk/awk.1, Revision 1.2
1.2 ! etheisen 1: .\" $OpenBSD$
1.1 tholo 2: .de EX
3: .nf
4: .ft CW
5: ..
6: .de EE
7: .br
8: .fi
9: .ft 1
10: ..
11: awk
12: .TH AWK 1
13: .CT 1 files prog_other
14: .SH NAME
15: awk \- pattern-directed scanning and processing language
16: .SH SYNOPSIS
1.2 ! etheisen 17: .B awk|nawk
1.1 tholo 18: [
19: .BI \-F
20: .I fs
21: ]
22: [
23: .BI \-v
24: .I var=value
25: ]
26: [
27: .BI \-mr n
28: ]
29: [
30: .BI \-mf n
31: ]
32: [
33: .I 'prog'
34: |
35: .BI \-f
36: .I progfile
37: ]
38: [
39: .I file ...
40: ]
41: .SH DESCRIPTION
42: .I Awk
43: scans each input
44: .I file
45: for lines that match any of a set of patterns specified literally in
46: .IR prog
47: or in one or more files
48: specified as
49: .B \-f
50: .IR progfile .
51: With each pattern
52: there can be an associated action that will be performed
53: when a line of a
54: .I file
55: matches the pattern.
56: Each line is matched against the
57: pattern portion of every pattern-action statement;
58: the associated action is performed for each matched pattern.
59: The file name
60: .B \-
61: means the standard input.
62: Any
63: .IR file
64: of the form
65: .I var=value
66: is treated as an assignment, not a filename,
67: and is executed at the time it would have been opened if it were a filename.
68: The option
69: .B \-v
70: followed by
71: .I var=value
72: is an assignment to be done before
73: .I prog
74: is executed;
75: any number of
76: .B \-v
77: options may be present.
78: The
79: .B \-F
80: .IR fs
81: option defines the input field separator to be the regular expression
82: .IR fs .
83: .PP
84: An input line is normally made up of fields separated by white space,
85: or by regular expression
86: .BR FS .
87: The fields are denoted
88: .BR $1 ,
89: .BR $2 ,
90: \&..., while
91: .B $0
92: refers to the entire line.
93: If
94: .BR FS
95: is null, the input line is split into one field per character.
96: .PP
97: To compensate for inadequate implementation of storage management,
98: the
99: .B \-mr
100: option can be used to set the maximum size of the input record,
101: and the
102: .B \-mf
103: option to set the maximum number of fields.
104: .PP
105: A pattern-action statement has the form
106: .IP
107: .IB pattern " { " action " }
108: .PP
109: A missing
110: .BI { " action " }
111: means print the line;
112: a missing pattern always matches.
113: Pattern-action statements are separated by newlines or semicolons.
114: .PP
115: An action is a sequence of statements.
116: A statement can be one of the following:
117: .PP
118: .EX
119: .ta \w'\f(CWdelete array[expression]'u
120: .RS
121: .nf
122: .ft CW
123: if(\fI expression \fP)\fI statement \fP\fR[ \fPelse\fI statement \fP\fR]\fP
124: while(\fI expression \fP)\fI statement\fP
125: for(\fI expression \fP;\fI expression \fP;\fI expression \fP)\fI statement\fP
126: for(\fI var \fPin\fI array \fP)\fI statement\fP
127: do\fI statement \fPwhile(\fI expression \fP)
128: break
129: continue
130: {\fR [\fP\fI statement ... \fP\fR] \fP}
131: \fIexpression\fP #\fR commonly\fP\fI var = expression\fP
132: print\fR [ \fP\fIexpression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
133: printf\fI format \fP\fR[ \fP,\fI expression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
134: return\fR [ \fP\fIexpression \fP\fR]\fP
135: next #\fR skip remaining patterns on this input line\fP
136: nextfile #\fR skip rest of this file, open next, start at top\fP
137: delete\fI array\fP[\fI expression \fP] #\fR delete an array element\fP
138: delete\fI array\fP #\fR delete all elements of array\fP
139: exit\fR [ \fP\fIexpression \fP\fR]\fP #\fR exit immediately; status is \fP\fIexpression\fP
140: .fi
141: .RE
142: .EE
143: .DT
144: .PP
145: Statements are terminated by
146: semicolons, newlines or right braces.
147: An empty
148: .I expression-list
149: stands for
150: .BR $0 .
151: String constants are quoted \&\f(CW"\ "\fR,
152: with the usual C escapes recognized within.
153: Expressions take on string or numeric values as appropriate,
154: and are built using the operators
155: .B + \- * / % ^
156: (exponentiation), and concatenation (indicated by white space).
157: The operators
158: .B
159: ! ++ \-\- += \-= *= /= %= ^= > >= < <= == != ?:
160: are also available in expressions.
161: Variables may be scalars, array elements
162: (denoted
163: .IB x [ i ] )
164: or fields.
165: Variables are initialized to the null string.
166: Array subscripts may be any string,
167: not necessarily numeric;
168: this allows for a form of associative memory.
169: Multiple subscripts such as
170: .B [i,j,k]
171: are permitted; the constituents are concatenated,
172: separated by the value of
173: .BR SUBSEP .
174: .PP
175: The
176: .B print
177: statement prints its arguments on the standard output
178: (or on a file if
179: .BI > file
180: or
181: .BI >> file
182: is present or on a pipe if
183: .BI | cmd
184: is present), separated by the current output field separator,
185: and terminated by the output record separator.
186: .I file
187: and
188: .I cmd
189: may be literal names or parenthesized expressions;
190: identical string values in different statements denote
191: the same open file.
192: The
193: .B printf
194: statement formats its expression list according to the format
195: (see
196: .IR printf (3)) .
197: The built-in function
198: .BI close( expr )
199: closes the file or pipe
200: .IR expr .
201: The built-in function
202: .BI fflush( expr )
203: flushes any buffered output for the file or pipe
204: .IR expr .
205: .PP
206: The mathematical functions
207: .BR exp ,
208: .BR log ,
209: .BR sqrt ,
210: .BR sin ,
211: .BR cos ,
212: and
213: .BR atan2
214: are built in.
215: Other built-in functions:
216: .TF length
217: .TP
218: .B length
219: the length of its argument
220: taken as a string,
221: or of
222: .B $0
223: if no argument.
224: .TP
225: .B rand
226: random number on [0,1)
227: .TP
228: .B srand
229: sets seed for
230: .B rand
231: and returns the previous seed.
232: .TP
233: .B int
234: truncates to an integer value
235: .TP
236: .BI substr( s , " m" , " n\fB)
237: the
238: .IR n -character
239: substring of
240: .I s
241: that begins at position
242: .IR m
243: counted from 1.
244: .TP
245: .BI index( s , " t" )
246: the position in
247: .I s
248: where the string
249: .I t
250: occurs, or 0 if it does not.
251: .TP
252: .BI match( s , " r" )
253: the position in
254: .I s
255: where the regular expression
256: .I r
257: occurs, or 0 if it does not.
258: The variables
259: .B RSTART
260: and
261: .B RLENGTH
262: are set to the position and length of the matched string.
263: .TP
264: .BI split( s , " a" , " fs\fB)
265: splits the string
266: .I s
267: into array elements
268: .IB a [1] ,
269: .IB a [2] ,
270: \&...,
271: .IB a [ n ] ,
272: and returns
273: .IR n .
274: The separation is done with the regular expression
275: .I fs
276: or with the field separator
277: .B FS
278: if
279: .I fs
280: is not given.
281: An empty string as field separator splits the string
282: into one array element per character.
283: .TP
284: .BI sub( r , " t" , " s\fB)
285: substitutes
286: .I t
287: for the first occurrence of the regular expression
288: .I r
289: in the string
290: .IR s .
291: If
292: .I s
293: is not given,
294: .B $0
295: is used.
296: .TP
297: .B gsub
298: same as
299: .B sub
300: except that all occurrences of the regular expression
301: are replaced;
302: .B sub
303: and
304: .B gsub
305: return the number of replacements.
306: .TP
307: .BI sprintf( fmt , " expr" , " ...\fB )
308: the string resulting from formatting
309: .I expr ...
310: according to the
311: .IR printf (3)
312: format
313: .I fmt
314: .TP
315: .BI system( cmd )
316: executes
317: .I cmd
318: and returns its exit status
319: .TP
320: .BI tolower( str )
321: returns a copy of
322: .I str
323: with all upper-case characters translated to their
324: corresponding lower-case equivalents.
325: .TP
326: .BI toupper( str )
327: returns a copy of
328: .I str
329: with all lower-case characters translated to their
330: corresponding upper-case equivalents.
331: .PD
332: .PP
333: The ``function''
334: .B getline
335: sets
336: .B $0
337: to the next input record from the current input file;
338: .B getline
339: .BI < file
340: sets
341: .B $0
342: to the next record from
343: .IR file .
344: .B getline
345: .I x
346: sets variable
347: .I x
348: instead.
349: Finally,
350: .IB cmd " | getline
351: pipes the output of
352: .I cmd
353: into
354: .BR getline ;
355: each call of
356: .B getline
357: returns the next line of output from
358: .IR cmd .
359: In all cases,
360: .B getline
361: returns 1 for a successful input,
362: 0 for end of file, and \-1 for an error.
363: .PP
364: Patterns are arbitrary Boolean combinations
365: (with
366: .BR "! || &&" )
367: of regular expressions and
368: relational expressions.
369: Regular expressions are as in
370: .IR egrep ;
371: see
372: .IR grep (1).
373: Isolated regular expressions
374: in a pattern apply to the entire line.
375: Regular expressions may also occur in
376: relational expressions, using the operators
377: .BR ~
378: and
379: .BR !~ .
380: .BI / re /
381: is a constant regular expression;
382: any string (constant or variable) may be used
383: as a regular expression, except in the position of an isolated regular expression
384: in a pattern.
385: .PP
386: A pattern may consist of two patterns separated by a comma;
387: in this case, the action is performed for all lines
388: from an occurrence of the first pattern
389: through an occurrence of the second.
390: .PP
391: A relational expression is one of the following:
392: .IP
393: .I expression matchop regular-expression
394: .br
395: .I expression relop expression
396: .br
397: .IB expression " in " array-name
398: .br
399: .BI ( expr , expr,... ") in " array-name
400: .PP
401: where a relop is any of the six relational operators in C,
402: and a matchop is either
403: .B ~
404: (matches)
405: or
406: .B !~
407: (does not match).
408: A conditional is an arithmetic expression,
409: a relational expression,
410: or a Boolean combination
411: of these.
412: .PP
413: The special patterns
414: .B BEGIN
415: and
416: .B END
417: may be used to capture control before the first input line is read
418: and after the last.
419: .B BEGIN
420: and
421: .B END
422: do not combine with other patterns.
423: .PP
424: Variable names with special meanings:
425: .TF FILENAME
426: .TP
427: .B CONVFMT
428: conversion format used when converting numbers (default
429: .BR "%.6g" )
430: .TP
431: .B FS
432: regular expression used to separate fields; also settable
433: by option
434: .BI \-F fs .
435: .TP
436: .BR NF
437: number of fields in the current record
438: .TP
439: .B NR
440: ordinal number of the current record
441: .TP
442: .B FNR
443: ordinal number of the current record in the current file
444: .TP
445: .B FILENAME
446: the name of the current input file
447: .TP
448: .B RS
449: input record separator (default newline)
450: .TP
451: .B OFS
452: output field separator (default blank)
453: .TP
454: .B ORS
455: output record separator (default newline)
456: .TP
457: .B OFMT
458: output format for numbers (default
459: .BR "%.6g" )
460: .TP
461: .B SUBSEP
462: separates multiple subscripts (default 034)
463: .TP
464: .B ARGC
465: argument count, assignable
466: .TP
467: .B ARGV
468: argument array, assignable;
469: non-null members are taken as filenames
470: .TP
471: .B ENVIRON
472: array of environment variables; subscripts are names.
473: .PD
474: .PP
475: Functions may be defined (at the position of a pattern-action statement) thus:
476: .IP
477: .B
478: function foo(a, b, c) { ...; return x }
479: .PP
480: Parameters are passed by value if scalar and by reference if array name;
481: functions may be called recursively.
482: Parameters are local to the function; all other variables are global.
483: Thus local variables may be created by providing excess parameters in
484: the function definition.
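.PP
As an illustrative sketch only (the names
.BR max ,
.BR i ,
.BR m ,
and
.B v
are hypothetical choices, not built-ins),
the following function returns the largest of the first
.I n
elements of an array, using the excess parameters
.I i
and
.I m
as local variables:
.EX
function max(a, n,   i, m) {	# i, m: excess parameters, local to max
	m = a[1]
	for (i = 2; i <= n; i++)
		if (a[i] > m) m = a[i]
	return m
}
{ n = split($0, v); if (n > 0) print max(v, n) }	# largest field on each line
.EE
.PP
The array
.I v
is passed by reference, while
.I n
is passed by value; by convention the extra parameters are set off
by additional white space in the definition.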
485: .SH EXAMPLES
486: .TP
487: .B
488: length($0) > 72
489: Print lines longer than 72 characters.
490: .TP
491: .B
492: { print $2, $1 }
493: Print first two fields in opposite order.
494: .PP
495: .EX
496: BEGIN { FS = ",[ \et]*|[ \et]+" }
497: { print $2, $1 }
498: .EE
499: .ns
500: .IP
501: Same, with input fields separated by comma and/or blanks and tabs.
502: .PP
503: .EX
504: .nf
505: { s += $1 }
506: END { print "sum is", s, " average is", s/NR }
507: .fi
508: .EE
509: .ns
510: .IP
511: Add up first column, print sum and average.
512: .TP
513: .B
514: /start/, /stop/
515: Print all lines between start/stop pairs.
516: .PP
517: .EX
518: .nf
519: BEGIN { # Simulate echo(1)
520: for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
521: printf "\en"
522: exit }
523: .fi
524: .EE
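.PP
As a further illustrative sketch (the array name
.B count
and the variable
.B w
are arbitrary choices, not built-ins),
string-valued subscripts can be used to tally words:
.EX
{ for (i = 1; i <= NF; i++) count[$i]++ }
END { for (w in count) print w, count[w] }
.EE
.ns
.IP
Count occurrences of each white-space-separated word;
the order in which the subscripts are visited by
.B for (w in count)
is unspecified.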
525: .SH SEE ALSO
526: .IR lex (1),
527: .IR sed (1)
528: .br
529: A. V. Aho, B. W. Kernighan, P. J. Weinberger,
530: .I
531: The AWK Programming Language,
532: Addison-Wesley, 1988. ISBN 0-201-07981-X
533: .SH BUGS
534: There are no explicit conversions between numbers and strings.
535: To force an expression to be treated as a number add 0 to it;
536: to force it to be treated as a string concatenate
537: \&\f(CW""\fP to it.
538: .br
539: The scope rules for variables in functions are a botch;
540: the syntax is worse.