Annotation of src/usr.bin/awk/awk.1, Revision 1.4
1.4 ! millert 1: .\" $OpenBSD: awk.1,v 1.3 1997/01/20 19:43:18 millert Exp $
1.1 tholo 2: .de EX
3: .nf
4: .ft CW
5: ..
6: .de EE
7: .br
8: .fi
9: .ft 1
10: ..
11: .TH AWK 1
12: .CT 1 files prog_other
13: .SH NAME
14: awk \- pattern-directed scanning and processing language
15: .SH SYNOPSIS
1.2 etheisen 16: .B awk|nawk
1.1 tholo 17: [
18: .BI \-F
19: .I fs
20: ]
21: [
22: .BI \-v
23: .I var=value
24: ]
25: [
26: .BI \-mr n
27: ]
28: [
29: .BI \-mf n
30: ]
31: [
32: .I 'prog'
33: |
34: .BI \-f
35: .I progfile
36: ]
37: [
38: .I file ...
39: ]
40: .SH DESCRIPTION
41: .I Awk
42: scans each input
43: .I file
44: for lines that match any of a set of patterns specified literally in
45: .IR prog
46: or in one or more files
47: specified as
48: .B \-f
49: .IR progfile .
50: With each pattern
51: there can be an associated action that will be performed
52: when a line of a
53: .I file
54: matches the pattern.
55: Each line is matched against the
56: pattern portion of every pattern-action statement;
57: the associated action is performed for each matched pattern.
58: The file name
59: .B \-
60: means the standard input.
61: Any
62: .IR file
63: of the form
64: .I var=value
65: is treated as an assignment, not a filename,
66: and is executed at the time it would have been opened if it were a filename.
67: The option
68: .B \-v
69: followed by
70: .I var=value
71: is an assignment to be done before
72: .I prog
73: is executed;
74: any number of
75: .B \-v
76: options may be present.
77: The
78: .B \-F
79: .IR fs
80: option defines the input field separator to be the regular expression
81: .IR fs .
82: .PP
83: An input line is normally made up of fields separated by white space,
84: or by regular expression
85: .BR FS .
86: The fields are denoted
87: .BR $1 ,
88: .BR $2 ,
89: \&..., while
90: .B $0
91: refers to the entire line.
92: If
93: .BR FS
94: is null, the input line is split into one field per character.
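.PP
As a minimal sketch of the options and field notation described above, the following shell function sets a colon separator with \-F and passes a variable with \-v.
The function name first_and_line and the variable name tag are hypothetical, chosen only for illustration.

```shell
# first_and_line: hypothetical helper name, not part of awk.
# Prints the value assigned with -v, the first colon-separated
# field ($1), and the whole input line ($0).
first_and_line() {
    awk -F: -v tag="user" '{ print tag, $1, $0 }'
}

printf 'root:0\nbin:1\n' | first_and_line
```

The assignment given with \-v is done before the program starts, so tag is already set when the first line is read.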
95: .PP
96: To compensate for inadequate implementation of storage management,
97: the
98: .B \-mr
99: option can be used to set the maximum size of the input record,
100: and the
101: .B \-mf
102: option to set the maximum number of fields.
103: .PP
104: A pattern-action statement has the form
105: .IP
106: .IB pattern " { " action " }
107: .PP
108: A missing
109: .BI { " action " }
110: means print the line;
111: a missing pattern always matches.
112: Pattern-action statements are separated by newlines or semicolons.
113: .PP
114: An action is a sequence of statements.
115: A statement can be one of the following:
116: .PP
117: .EX
118: .ta \w'\f(CWdelete array[expression]'u
119: .RS
120: .nf
121: .ft CW
122: if(\fI expression \fP)\fI statement \fP\fR[ \fPelse\fI statement \fP\fR]\fP
123: while(\fI expression \fP)\fI statement\fP
124: for(\fI expression \fP;\fI expression \fP;\fI expression \fP)\fI statement\fP
125: for(\fI var \fPin\fI array \fP)\fI statement\fP
126: do\fI statement \fPwhile(\fI expression \fP)
127: break
128: continue
129: {\fR [\fP\fI statement ... \fP\fR] \fP}
130: \fIexpression\fP #\fR commonly\fP\fI var = expression\fP
131: print\fR [ \fP\fIexpression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
132: printf\fI format \fP\fR[ \fP,\fI expression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
133: return\fR [ \fP\fIexpression \fP\fR]\fP
134: next #\fR skip remaining patterns on this input line\fP
135: nextfile #\fR skip rest of this file, open next, start at top\fP
136: delete\fI array\fP[\fI expression \fP] #\fR delete an array element\fP
137: delete\fI array\fP #\fR delete all elements of array\fP
138: exit\fR [ \fP\fIexpression \fP\fR]\fP #\fR exit immediately; status is \fP\fIexpression\fP
139: .fi
140: .RE
141: .EE
142: .DT
143: .PP
144: Statements are terminated by
145: semicolons, newlines or right braces.
146: An empty
147: .I expression-list
148: stands for
149: .BR $0 .
150: String constants are quoted \&\f(CW"\ "\fR,
151: with the usual C escapes recognized within.
152: Expressions take on string or numeric values as appropriate,
153: and are built using the operators
154: .B + \- * / % ^
155: (exponentiation), and concatenation (indicated by white space).
156: The operators
157: .B
158: ! ++ \-\- += \-= *= /= %= ^= > >= < <= == != ?:
159: are also available in expressions.
160: Variables may be scalars, array elements
161: (denoted
162: .IB x [ i ] )
163: or fields.
164: Variables are initialized to the null string.
165: Array subscripts may be any string,
166: not necessarily numeric;
167: this allows for a form of associative memory.
168: Multiple subscripts such as
169: .B [i,j,k]
170: are permitted; the constituents are concatenated,
171: separated by the value of
172: .BR SUBSEP .
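.PP
The subscript rules above can be sketched as follows.
The function name count_pairs and the array name seen are assumptions made for this example; the sketch counts pairs of fields and then looks one pair up through an explicitly built SUBSEP key.

```shell
# count_pairs: hypothetical helper name.
# Counts how often each ($1, $2) pair occurs, using a
# two-part subscript that awk joins with SUBSEP.
count_pairs() {
    awk '{ seen[$1, $2]++ }
         END {
             # Build the same key by hand to show the concatenation.
             key = "a" SUBSEP "x"
             found = (key in seen)
             print seen["a", "x"] + 0, seen["b", "y"] + 0, found
         }'
}

printf 'a x\na x\nb y\n' | count_pairs
```

Adding 0 forces a numeric result even for a pair that was never seen, since variables start out as the null string.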
173: .PP
174: The
175: .B print
176: statement prints its arguments on the standard output
177: (or on a file if
178: .BI > file
179: or
180: .BI >> file
181: is present or on a pipe if
182: .BI | cmd
183: is present), separated by the current output field separator,
184: and terminated by the output record separator.
185: .I file
186: and
187: .I cmd
188: may be literal names or parenthesized expressions;
189: identical string values in different statements denote
190: the same open file.
191: The
192: .B printf
193: statement formats its expression list according to the format
194: (see
195: .IR printf (3)) .
196: The built-in function
197: .BI close( expr )
198: closes the file or pipe
199: .IR expr .
200: The built-in function
201: .BI fflush( expr )
202: flushes any buffered output for the file or pipe
203: .IR expr .
204: .PP
205: The mathematical functions
206: .BR exp ,
207: .BR log ,
208: .BR sqrt ,
209: .BR sin ,
210: .BR cos ,
211: and
212: .BR atan2
213: are built in.
214: Other built-in functions:
215: .TF length
216: .TP
217: .B length
218: the length of its argument
219: taken as a string,
220: or of
221: .B $0
222: if no argument.
223: .TP
224: .B rand
225: random number on [0,1)
226: .TP
227: .B srand
228: sets seed for
229: .B rand
230: and returns the previous seed.
231: .TP
232: .B int
233: truncates to an integer value.
234: .TP
235: .BI substr( s , " m" , " n\fB)
236: the
237: .IR n -character
238: substring of
239: .I s
240: that begins at position
241: .IR m
242: counted from 1.
243: .TP
244: .BI index( s , " t" )
245: the position in
246: .I s
247: where the string
248: .I t
249: occurs, or 0 if it does not.
250: .TP
251: .BI match( s , " r" )
252: the position in
253: .I s
254: where the regular expression
255: .I r
256: occurs, or 0 if it does not.
257: The variables
258: .B RSTART
259: and
260: .B RLENGTH
261: are set to the position and length of the matched string.
262: .TP
263: .BI split( s , " a" , " fs\fB)
264: splits the string
265: .I s
266: into array elements
267: .IB a [1] ,
268: .IB a [2] ,
269: \&...,
270: .IB a [ n ] ,
271: and returns
272: .IR n .
273: The separation is done with the regular expression
274: .I fs
275: or with the field separator
276: .B FS
277: if
278: .I fs
279: is not given.
280: An empty string as field separator splits the string
281: into one array element per character.
282: .TP
283: .BI sub( r , " t" , " s\fB)
284: substitutes
285: .I t
286: for the first occurrence of the regular expression
287: .I r
288: in the string
289: .IR s .
290: If
291: .I s
292: is not given,
293: .B $0
294: is used.
295: .TP
296: .B gsub
297: same as
298: .B sub
299: except that all occurrences of the regular expression
300: are replaced;
301: .B sub
302: and
303: .B gsub
304: return the number of replacements.
305: .TP
306: .BI sprintf( fmt , " expr" , " ...\fB )
307: the string resulting from formatting
308: .I expr ...
309: according to the
310: .IR printf (3)
311: format
312: .IR fmt .
313: .TP
314: .BI system( cmd )
315: executes
316: .I cmd
317: and returns its exit status.
318: .TP
319: .BI tolower( str )
320: returns a copy of
321: .I str
322: with all upper-case characters translated to their
323: corresponding lower-case equivalents.
324: .TP
325: .BI toupper( str )
326: returns a copy of
327: .I str
328: with all lower-case characters translated to their
329: corresponding upper-case equivalents.
330: .PD
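.PP
A short sketch exercising several of the string functions listed above.
The function name string_demo and the variable names are hypothetical; the sample strings are made up for the example.

```shell
# string_demo: hypothetical helper name.
string_demo() {
    awk 'BEGIN {
        s = "hello world"
        print substr(s, 1, 5)           # 5 characters starting at position 1
        print index(s, "world")         # position of "world" in s
        n = split("a:b:c", parts, ":")  # n is the number of elements
        print n, parts[2]
        t = s
        c = gsub(/o/, "0", t)           # c is the number of replacements
        print c, t
        print toupper(substr(s, 1, 1)) substr(s, 2)
    }'
}

string_demo
```

Because gsub modifies its third argument in place, the sketch works on a copy t and leaves s untouched.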
331: .PP
332: The ``function''
333: .B getline
334: sets
335: .B $0
336: to the next input record from the current input file;
337: .B getline
338: .BI < file
339: sets
340: .B $0
341: to the next record from
342: .IR file .
343: .B getline
344: .I x
345: sets variable
346: .I x
347: instead.
348: Finally,
349: .IB cmd " | getline
350: pipes the output of
351: .I cmd
352: into
353: .BR getline ;
354: each call of
355: .B getline
356: returns the next line of output from
357: .IR cmd .
358: In all cases,
359: .B getline
360: returns 1 for a successful input,
361: 0 for end of file, and \-1 for an error.
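.PP
The pipe form of getline can be sketched like this.
The function name getline_demo and the command string are assumptions for illustration; the loop relies only on the return values described above.

```shell
# getline_demo: hypothetical helper name.
# Reads every line produced by a command and reports how many
# lines were read along with the last one.
getline_demo() {
    awk 'BEGIN {
        cmd = "echo one; echo two"
        n = 0
        # getline returns 1 per line read, 0 at end of file, -1 on error.
        while ((cmd | getline line) > 0)
            n++
        close(cmd)
        print n, line
    }'
}

getline_demo
```

Testing for a result greater than 0, rather than just non-zero, keeps the loop from spinning forever if the command cannot be run and getline returns \-1.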
362: .PP
363: Patterns are arbitrary Boolean combinations
364: (with
365: .BR "! || &&" )
366: of regular expressions and
367: relational expressions.
368: Regular expressions are as in
369: .IR egrep ;
370: see
371: .IR grep (1).
372: Isolated regular expressions
373: in a pattern apply to the entire line.
374: Regular expressions may also occur in
375: relational expressions, using the operators
376: .BR ~
377: and
378: .BR !~ .
379: .BI / re /
380: is a constant regular expression;
381: any string (constant or variable) may be used
382: as a regular expression, except in the position of an isolated regular expression
383: in a pattern.
384: .PP
385: A pattern may consist of two patterns separated by a comma;
386: in this case, the action is performed for all lines
387: from an occurrence of the first pattern
388: through an occurrence of the second.
389: .PP
390: A relational expression is one of the following:
391: .IP
392: .I expression matchop regular-expression
393: .br
394: .I expression relop expression
395: .br
396: .IB expression " in " array-name
397: .br
398: .BI ( expr , expr,... ") in " array-name
399: .PP
400: where a relop is any of the six relational operators in C,
401: and a matchop is either
402: .B ~
403: (matches)
404: or
405: .B !~
406: (does not match).
407: A conditional is an arithmetic expression,
408: a relational expression,
409: or a Boolean combination
410: of these.
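.PP
As an illustration of a Boolean combination of a match expression and a relational expression, consider the sketch below.
The function name filter_demo and the sample input are hypothetical.

```shell
# filter_demo: hypothetical helper name.
# Prints the first field of lines whose first field begins
# with "a" and whose second field is at least 10.
filter_demo() {
    awk '$1 ~ /^a/ && $2 >= 10 { print $1 }'
}

printf 'apple 12\navocado 3\nbanana 40\n' | filter_demo
```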
411: .PP
412: The special patterns
413: .B BEGIN
414: and
415: .B END
416: may be used to capture control before the first input line is read
417: and after the last.
418: .B BEGIN
419: and
420: .B END
421: do not combine with other patterns.
422: .PP
423: Variable names with special meanings:
424: .TF FILENAME
425: .TP
426: .B CONVFMT
427: conversion format used when converting numbers
1.3 millert 428: (default
1.1 tholo 429: .BR "%.6g" )
430: .TP
431: .B FS
432: regular expression used to separate fields; also settable
433: by option
434: .BI \-F fs .
435: .TP
436: .BR NF
437: number of fields in the current record
438: .TP
439: .B NR
440: ordinal number of the current record
441: .TP
442: .B FNR
443: ordinal number of the current record in the current file
444: .TP
445: .B FILENAME
446: the name of the current input file
447: .TP
448: .B RS
449: input record separator (default newline)
450: .TP
451: .B OFS
452: output field separator (default blank)
453: .TP
454: .B ORS
455: output record separator (default newline)
456: .TP
457: .B OFMT
458: output format for numbers (default
459: .BR "%.6g" )
460: .TP
461: .B SUBSEP
462: separates multiple subscripts (default 034)
463: .TP
464: .B ARGC
465: argument count, assignable
466: .TP
467: .B ARGV
468: argument array, assignable;
469: non-null members are taken as filenames
470: .TP
471: .B ENVIRON
472: array of environment variables; subscripts are names.
473: .PD
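.PP
A small sketch of a few of these variables in use.
The function name record_info is hypothetical; the sketch sets OFS in a BEGIN action and prints NR, NF and the first field of each record.

```shell
# record_info: hypothetical helper name.
# Prints record number, field count and first field,
# joined by the output field separator set in BEGIN.
record_info() {
    awk 'BEGIN { OFS = "-" }
         { print NR, NF, $1 }
         END { print "records", NR }'
}

printf 'a b c\nd e\n' | record_info
```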
474: .PP
475: Functions may be defined (at the position of a pattern-action statement) thus:
476: .IP
477: .B
478: function foo(a, b, c) { ...; return x }
479: .PP
480: Parameters are passed by value if scalar and by reference if array name;
481: functions may be called recursively.
482: Parameters are local to the function; all other variables are global.
483: Thus local variables may be created by providing excess parameters in
484: the function definition.
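.PP
The parameter rules above can be sketched as follows.
The function names fact_demo, fact and sum are hypothetical; sum declares two excess parameters, i and total, so that they act as local variables.

```shell
# fact_demo: hypothetical helper name.
fact_demo() {
    awk 'function fact(n) {
             # Recursive call; n is a scalar, passed by value.
             return (n <= 1) ? 1 : n * fact(n - 1)
         }
         function sum(arr, n,    i, total) {
             # arr is passed by reference; i and total are locals
             # created by listing them as extra parameters.
             for (i = 1; i <= n; i++)
                 total += arr[i]
             return total
         }
         BEGIN {
             i = 99
             n = split("1 2 3 4", v, " ")
             print fact(5), sum(v, n), i
         }'
}

fact_demo
```

The global i is still 99 after the call to sum, which shows that the loop counter inside the function did not leak out.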
485: .SH EXAMPLES
486: .TP
1.3 millert 487: .EX
1.1 tholo 488: length($0) > 72
1.3 millert 489: .EE
1.1 tholo 490: Print lines longer than 72 characters.
491: .TP
1.3 millert 492: .EX
1.1 tholo 493: { print $2, $1 }
1.3 millert 494: .EE
1.1 tholo 495: Print first two fields in opposite order.
496: .PP
497: .EX
498: BEGIN { FS = ",[ \et]*|[ \et]+" }
499: { print $2, $1 }
500: .EE
501: .ns
502: .IP
503: Same, with input fields separated by comma and/or blanks and tabs.
504: .PP
505: .EX
506: .nf
507: { s += $1 }
508: END { print "sum is", s, " average is", s/NR }
509: .fi
510: .EE
511: .ns
512: .IP
513: Add up first column, print sum and average.
514: .TP
1.3 millert 515: .EX
1.1 tholo 516: /start/, /stop/
1.3 millert 517: .EE
1.1 tholo 518: Print all lines between start/stop pairs.
519: .PP
520: .EX
521: .nf
522: BEGIN { # Simulate echo(1)
523: for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
524: printf "\en"
525: exit }
526: .fi
527: .EE
528: .SH SEE ALSO
529: .IR lex (1),
530: .IR sed (1)
531: .br
532: A. V. Aho, B. W. Kernighan, P. J. Weinberger,
533: .I
534: The AWK Programming Language,
535: Addison-Wesley, 1988. ISBN 0-201-07981-X
536: .SH BUGS
537: There are no explicit conversions between numbers and strings.
538: To force an expression to be treated as a number add 0 to it;
539: to force it to be treated as a string concatenate
540: \&\f(CW""\fP to it.
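.PP
A minimal sketch of the add-zero idiom, assuming two string constants that look like numbers.
The function name convert_demo and the variable names are hypothetical.

```shell
# convert_demo: hypothetical helper name.
# Compares "10" and "9" first as strings, then as numbers.
convert_demo() {
    awk 'BEGIN {
        a = "10"; b = "9"
        as_string = (a < b)          # string comparison: "10" sorts before "9"
        as_number = (a + 0 < b + 0)  # numeric comparison: 10 is not less than 9
        print as_string, as_number
    }'
}

convert_demo
```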
541: .br
542: The scope rules for variables in functions are a botch;
543: the syntax is worse.