Annotation of src/usr.bin/lex/flex.1, Revision 1.14
1.14 ! tedu 1: .\" $OpenBSD: flex.1,v 1.13 2003/06/04 17:34:44 millert Exp $
1.12 jmc 2: .\"
3: .\" Copyright (c) 1990 The Regents of the University of California.
4: .\" All rights reserved.
1.2 deraadt 5: .\"
1.12 jmc 6: .\" This code is derived from software contributed to Berkeley by
7: .\" Vern Paxson.
8: .\"
9: .\" The United States Government has rights in this work pursuant
10: .\" to contract no. DE-AC03-76SF00098 between the United States
11: .\" Department of Energy and the University of California.
12: .\"
13: .\" Redistribution and use in source and binary forms, with or without
1.13 millert 14: .\" modification, are permitted provided that the following conditions
15: .\" are met:
16: .\"
17: .\" 1. Redistributions of source code must retain the above copyright
18: .\" notice, this list of conditions and the following disclaimer.
19: .\" 2. Redistributions in binary form must reproduce the above copyright
20: .\" notice, this list of conditions and the following disclaimer in the
21: .\" documentation and/or other materials provided with the distribution.
22: .\"
23: .\" Neither the name of the University nor the names of its contributors
24: .\" may be used to endorse or promote products derived from this software
25: .\" without specific prior written permission.
26: .\"
27: .\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR
28: .\" IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
29: .\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
30: .\" PURPOSE.
1.12 jmc 31: .\"
1.1 deraadt 32: .TH FLEX 1 "April 1995" "Version 2.5"
33: .SH NAME
34: flex \- fast lexical analyzer generator
35: .SH SYNOPSIS
36: .B flex
37: .B [\-bcdfhilnpstvwBFILTV78+? \-C[aefFmr] \-ooutput \-Pprefix \-Sskeleton]
38: .B [\-\-help \-\-version]
39: .I [filename ...]
40: .SH OVERVIEW
41: This manual describes
42: .I flex,
43: a tool for generating programs that perform pattern-matching on text. The
44: manual includes both tutorial and reference sections:
45: .nf
46:
47: Description
48: a brief overview of the tool
49:
50: Some Simple Examples
51:
52: Format Of The Input File
53:
54: Patterns
55: the extended regular expressions used by flex
56:
57: How The Input Is Matched
58: the rules for determining what has been matched
59:
60: Actions
61: how to specify what to do when a pattern is matched
62:
63: The Generated Scanner
64: details regarding the scanner that flex produces;
65: how to control the input source
66:
67: Start Conditions
68: introducing context into your scanners, and
69: managing "mini-scanners"
70:
71: Multiple Input Buffers
72: how to manipulate multiple input sources; how to
73: scan from strings instead of files
74:
75: End-of-file Rules
76: special rules for matching the end of the input
77:
78: Miscellaneous Macros
79: a summary of macros available to the actions
80:
81: Values Available To The User
82: a summary of values available to the actions
83:
84: Interfacing With Yacc
85: connecting flex scanners together with yacc parsers
86:
87: Options
88: flex command-line options, and the "%option"
89: directive
90:
91: Performance Considerations
92: how to make your scanner go as fast as possible
93:
94: Generating C++ Scanners
95: the (experimental) facility for generating C++
96: scanner classes
97:
98: Incompatibilities With Lex And POSIX
99: how flex differs from AT&T lex and the POSIX lex
100: standard
101:
102: Diagnostics
103: those error messages produced by flex (or scanners
104: it generates) whose meanings might not be apparent
105:
106: Files
107: files used by flex
108:
109: Deficiencies / Bugs
110: known problems with flex
111:
112: See Also
113: other documentation, related tools
114:
115: Author
116: includes contact information
117:
118: .fi
119: .SH DESCRIPTION
120: .I flex
121: is a tool for generating
122: .I scanners:
1.9 millert 123: programs which recognize lexical patterns in text.
1.1 deraadt 124: .I flex
125: reads
126: the given input files, or its standard input if no file names are given,
127: for a description of a scanner to generate. The description is in
128: the form of pairs
129: of regular expressions and C code, called
130: .I rules. flex
131: generates as output a C source file,
132: .B lex.yy.c,
133: which defines a routine
134: .B yylex().
135: This file is compiled and linked with the
136: .B \-lfl
137: library to produce an executable. When the executable is run,
138: it analyzes its input for occurrences
139: of the regular expressions. Whenever it finds one, it executes
140: the corresponding C code.
141: .SH SOME SIMPLE EXAMPLES
142: .PP
143: Here are some simple examples to give the flavor of how one uses
144: .I flex.
145: The following
146: .I flex
147: input specifies a scanner which, whenever it encounters the string
148: "username", replaces it with the user's login name:
149: .nf
150:
151: %%
152: username printf( "%s", getlogin() );
153:
154: .fi
155: By default, any text not matched by a
156: .I flex
157: scanner
158: is copied to the output, so the net effect of this scanner is
159: to copy its input file to its output with each occurrence
160: of "username" expanded.
161: In this input, there is just one rule. "username" is the
162: .I pattern
163: and the "printf" is the
164: .I action.
165: The "%%" marks the beginning of the rules.
166: .PP
167: Here's another simple example:
168: .nf
169:
170: int num_lines = 0, num_chars = 0;
171:
172: %%
173: \\n ++num_lines; ++num_chars;
174: . ++num_chars;
175:
176: %%
177: main()
178: {
179: yylex();
180: printf( "# of lines = %d, # of chars = %d\\n",
181: num_lines, num_chars );
182: }
183:
184: .fi
185: This scanner counts the number of characters and the number
186: of lines in its input (it produces no output other than the
187: final report on the counts). The first line
188: declares two globals, "num_lines" and "num_chars", which are accessible
189: both inside
190: .B yylex()
191: and in the
192: .B main()
193: routine declared after the second "%%". There are two rules, one
194: which matches a newline ("\\n") and increments both the line count and
195: the character count, and one which matches any character other than
196: a newline (indicated by the "." regular expression).
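The counting done by the two rules can be sketched in plain C, without
flex. This is only an illustration: the names "struct counts" and
"count_text" are hypothetical and do not appear in flex or in the
generated scanner.

```c
#include <stddef.h>

/* Hypothetical helper, not part of flex: tallies what the two rules count. */
struct counts {
    int num_lines;
    int num_chars;
};

struct counts count_text(const char *text)
{
    struct counts c = { 0, 0 };
    size_t i;

    for (i = 0; text[i] != '\0'; ++i) {
        if (text[i] == '\n')
            ++c.num_lines;   /* the newline rule also counts a line */
        ++c.num_chars;       /* both rules count one character */
    }
    return c;
}
```

Calling count_text() on a two-line string yields a line count of 2 and a
character count that includes both newlines.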
197: .PP
198: A somewhat more complicated example:
199: .nf
200:
201: /* scanner for a toy Pascal-like language */
202:
203: %{
204: /* need this for the call to atof() below */
205: #include <stdlib.h>
206: %}
207:
208: DIGIT [0-9]
209: ID [a-z][a-z0-9]*
210:
211: %%
212:
213: {DIGIT}+ {
214: printf( "An integer: %s (%d)\\n", yytext,
215: atoi( yytext ) );
216: }
217:
218: {DIGIT}+"."{DIGIT}* {
219: printf( "A float: %s (%g)\\n", yytext,
220: atof( yytext ) );
221: }
222:
223: if|then|begin|end|procedure|function {
224: printf( "A keyword: %s\\n", yytext );
225: }
226:
227: {ID} printf( "An identifier: %s\\n", yytext );
228:
229: "+"|"-"|"*"|"/" printf( "An operator: %s\\n", yytext );
230:
231: "{"[^}\\n]*"}" /* eat up one-line comments */
232:
233: [ \\t\\n]+ /* eat up whitespace */
234:
235: . printf( "Unrecognized character: %s\\n", yytext );
236:
237: %%
238:
239: main( argc, argv )
240: int argc;
241: char **argv;
242: {
243: ++argv, --argc; /* skip over program name */
244: if ( argc > 0 )
245: yyin = fopen( argv[0], "r" );
246: else
247: yyin = stdin;
1.7 aaron 248:
1.1 deraadt 249: yylex();
250: }
251:
252: .fi
253: This is the beginning of a simple scanner for a language like
254: Pascal. It identifies different types of
255: .I tokens
256: and reports on what it has seen.
257: .PP
258: The details of this example will be explained in the following
259: sections.
260: .SH FORMAT OF THE INPUT FILE
261: The
262: .I flex
263: input file consists of three sections, separated by a line with just
264: .B %%
265: in it:
266: .nf
267:
268: definitions
269: %%
270: rules
271: %%
272: user code
273:
274: .fi
275: The
276: .I definitions
277: section contains declarations of simple
278: .I name
279: definitions to simplify the scanner specification, and declarations of
280: .I start conditions,
281: which are explained in a later section.
282: .PP
283: Name definitions have the form:
284: .nf
285:
286: name definition
287:
288: .fi
289: The "name" is a word beginning with a letter or an underscore ('_')
290: followed by zero or more letters, digits, '_', or '-' (dash).
1.8 aaron 291: The definition is taken to begin at the first non-whitespace character
1.1 deraadt 292: following the name and continuing to the end of the line.
293: The definition can subsequently be referred to using "{name}", which
294: will expand to "(definition)". For example,
295: .nf
296:
297: DIGIT [0-9]
298: ID [a-z][a-z0-9]*
299:
300: .fi
301: defines "DIGIT" to be a regular expression which matches a
302: single digit, and
303: "ID" to be a regular expression which matches a letter
304: followed by zero-or-more letters-or-digits.
305: A subsequent reference to
306: .nf
307:
308: {DIGIT}+"."{DIGIT}*
309:
310: .fi
311: is identical to
312: .nf
313:
314: ([0-9])+"."([0-9])*
315:
316: .fi
317: and matches one-or-more digits followed by a '.' followed
318: by zero-or-more digits.
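The textual substitution of "{name}" by "(definition)" can be sketched as
follows. This is a simplified illustration, not flex's own code: the
names "struct name_def" and "expand_names" are hypothetical, and the
sketch ignores quoting and escapes, which a real implementation must
honor.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

struct name_def {
    const char *name;
    const char *definition;
};

/* Append n bytes of src to out at *pos; return -1 if out would overflow. */
static int append(char *out, size_t outsz, size_t *pos,
                  const char *src, size_t n)
{
    if (*pos + n + 1 > outsz)
        return -1;
    memcpy(out + *pos, src, n);
    *pos += n;
    out[*pos] = '\0';
    return 0;
}

/* Expand every {name} in pattern to (definition).
   Returns 0 on success, -1 on overflow or an undefined name. */
int expand_names(const char *pattern, const struct name_def *defs,
                 size_t ndefs, char *out, size_t outsz)
{
    size_t pos = 0, i = 0, d;

    if (outsz == 0)
        return -1;
    out[0] = '\0';
    while (pattern[i] != '\0') {
        const char *end;
        size_t len;

        /* '{' followed by a letter or '_' starts a name reference;
           repeat counts such as {2,5} start with a digit and are copied. */
        if (pattern[i] != '{' ||
            !(isalpha((unsigned char)pattern[i + 1]) ||
              pattern[i + 1] == '_') ||
            (end = strchr(pattern + i, '}')) == NULL) {
            if (append(out, outsz, &pos, pattern + i, 1) != 0)
                return -1;
            ++i;
            continue;
        }
        len = (size_t)(end - (pattern + i + 1));
        for (d = 0; d < ndefs; ++d)
            if (strlen(defs[d].name) == len &&
                strncmp(defs[d].name, pattern + i + 1, len) == 0)
                break;
        if (d == ndefs)
            return -1;                      /* undefined name */
        if (append(out, outsz, &pos, "(", 1) != 0 ||
            append(out, outsz, &pos, defs[d].definition,
                   strlen(defs[d].definition)) != 0 ||
            append(out, outsz, &pos, ")", 1) != 0)
            return -1;
        i = (size_t)(end - pattern) + 1;
    }
    return 0;
}
```

The parentheses added around each definition are what keep a following
operator such as '+' applying to the whole definition rather than to its
last element.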
319: .PP
320: The
321: .I rules
322: section of the
323: .I flex
324: input contains a series of rules of the form:
325: .nf
326:
327: pattern action
328:
329: .fi
330: where the pattern must be unindented and the action must begin
331: on the same line.
332: .PP
333: See below for a further description of patterns and actions.
334: .PP
335: Finally, the user code section is simply copied to
336: .B lex.yy.c
337: verbatim.
338: It is used for companion routines which call or are called
339: by the scanner. The presence of this section is optional;
340: if it is missing, the second
341: .B %%
342: in the input file may be skipped, too.
343: .PP
344: In the definitions and rules sections, any
345: .I indented
346: text or text enclosed in
347: .B %{
348: and
349: .B %}
350: is copied verbatim to the output (with the %{}'s removed).
351: The %{}'s must appear unindented on lines by themselves.
352: .PP
353: In the rules section,
354: any indented or %{} text appearing before the
355: first rule may be used to declare variables
356: which are local to the scanning routine and (after the declarations)
357: code which is to be executed whenever the scanning routine is entered.
358: Other indented or %{} text in the rule section is still copied to the output,
359: but its meaning is not well-defined and it may well cause compile-time
360: errors (this feature is present for
361: .I POSIX
362: compliance; see below for other such features).
363: .PP
364: In the definitions section (but not in the rules section),
365: an unindented comment (i.e., a line
366: beginning with "/*") is also copied verbatim to the output up
367: to the next "*/".
368: .SH PATTERNS
369: The patterns in the input are written using an extended set of regular
370: expressions. These are:
371: .nf
372:
373: x match the character 'x'
374: . any character (byte) except newline
375: [xyz] a "character class"; in this case, the pattern
376: matches either an 'x', a 'y', or a 'z'
377: [abj-oZ] a "character class" with a range in it; matches
378: an 'a', a 'b', any letter from 'j' through 'o',
379: or a 'Z'
380: [^A-Z] a "negated character class", i.e., any character
381: but those in the class. In this case, any
382: character EXCEPT an uppercase letter.
383: [^A-Z\\n] any character EXCEPT an uppercase letter or
384: a newline
385: r* zero or more r's, where r is any regular expression
386: r+ one or more r's
387: r? zero or one r's (that is, "an optional r")
388: r{2,5} anywhere from two to five r's
389: r{2,} two or more r's
390: r{4} exactly 4 r's
391: {name} the expansion of the "name" definition
392: (see above)
393: "[xyz]\\"foo"
394: the literal string: [xyz]"foo
395: \\X if X is an 'a', 'b', 'f', 'n', 'r', 't', or 'v',
396: then the ANSI-C interpretation of \\x.
397: Otherwise, a literal 'X' (used to escape
398: operators such as '*')
399: \\0 a NUL character (ASCII code 0)
400: \\123 the character with octal value 123
401: \\x2a the character with hexadecimal value 2a
402: (r) match an r; parentheses are used to override
403: precedence (see below)
404:
405:
406: rs the regular expression r followed by the
407: regular expression s; called "concatenation"
408:
409:
410: r|s either an r or an s
411:
412:
413: r/s an r but only if it is followed by an s. The
414: text matched by s is included when determining
415: whether this rule is the "longest match",
416: but is then returned to the input before
417: the action is executed. So the action only
418: sees the text matched by r. This type
419: of pattern is called "trailing context".
420: (There are some combinations of r/s that flex
421: cannot match correctly; see notes in the
422: Deficiencies / Bugs section below regarding
423: "dangerous trailing context".)
424: ^r an r, but only at the beginning of a line (i.e.,
1.10 deraadt 425: just starting to scan, or right after a
1.1 deraadt 426: newline has been scanned).
427: r$ an r, but only at the end of a line (i.e., just
428: before a newline). Equivalent to "r/\\n".
429:
430: Note that flex's notion of "newline" is exactly
431: whatever the C compiler used to compile flex
432: interprets '\\n' as; in particular, on some DOS
433: systems you must either filter out \\r's in the
434: input yourself, or explicitly use r/\\r\\n for "r$".
435:
436:
437: <s>r an r, but only in start condition s (see
438: below for discussion of start conditions)
439: <s1,s2,s3>r
440: same, but in any of start conditions s1,
441: s2, or s3
442: <*>r an r in any start condition, even an exclusive one.
443:
444:
445: <<EOF>> an end-of-file
446: <s1,s2><<EOF>>
447: an end-of-file when in start condition s1 or s2
448:
449: .fi
450: Note that inside of a character class, all regular expression operators
451: lose their special meaning except escape ('\\') and the character class
452: operators, '-', ']', and, at the beginning of the class, '^'.
453: .PP
454: The regular expressions listed above are grouped according to
455: precedence, from highest precedence at the top to lowest at the bottom.
456: Those grouped together have equal precedence. For example,
457: .nf
458:
459: foo|bar*
460:
461: .fi
462: is the same as
463: .nf
464:
465: (foo)|(ba(r*))
466:
467: .fi
468: since the '*' operator has higher precedence than concatenation,
469: and concatenation higher than alternation ('|'). This pattern
470: therefore matches
471: .I either
472: the string "foo"
473: .I or
474: the string "ba" followed by zero-or-more r's.
475: To match "foo" or zero-or-more "bar"'s, use:
476: .nf
477:
478: foo|(bar)*
479:
480: .fi
481: and to match zero-or-more "foo"'s-or-"bar"'s:
482: .nf
483:
484: (foo|bar)*
485:
486: .fi
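The three groupings can be checked outside of flex. The sketch below
assumes a POSIX system and uses regcomp() and regexec() with extended
regular expressions as a stand-in for flex's matcher; for the '*', '|'
and concatenation operators the precedence is the same. The function
name "matches_whole" is hypothetical.

```c
#include <regex.h>
#include <stdio.h>

/* Return 1 if the whole of text matches pattern, 0 if not, -1 on error. */
int matches_whole(const char *pattern, const char *text)
{
    char anchored[256];
    regex_t re;
    int rc;

    /* Wrap the pattern so it must match the entire string. */
    if (snprintf(anchored, sizeof anchored, "^(%s)$", pattern) >=
        (int)sizeof anchored)
        return -1;
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

With this helper, "foo|bar*" accepts "barrr" but not "barbar", while
"foo|(bar)*" accepts "barbar" but not "foobar", and "(foo|bar)*" accepts
"foobar".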
487: .PP
488: In addition to characters and ranges of characters, character classes
489: can also contain character class
490: .I expressions.
491: These are expressions enclosed inside
492: .B [:
493: and
494: .B :]
495: delimiters (which themselves must appear between the '[' and ']' of the
496: character class; other elements may occur inside the character class, too).
497: The valid expressions are:
498: .nf
499:
500: [:alnum:] [:alpha:] [:blank:]
501: [:cntrl:] [:digit:] [:graph:]
502: [:lower:] [:print:] [:punct:]
503: [:space:] [:upper:] [:xdigit:]
504:
505: .fi
506: These expressions all designate a set of characters equivalent to
507: the corresponding standard C
508: .B isXXX
509: function. For example,
510: .B [:alnum:]
511: designates those characters for which
512: .B isalnum()
513: returns true, i.e., any alphabetic or numeric character.
514: Some systems don't provide
515: .B isblank(),
516: so flex defines
517: .B [:blank:]
518: as a blank or a tab.
519: .PP
520: For example, the following character classes are all equivalent:
521: .nf
522:
523: [[:alnum:]]
1.4 deraadt 524: [[:alpha:][:digit:]]
1.1 deraadt 525: [[:alpha:]0-9]
526: [a-zA-Z0-9]
527:
528: .fi
529: If your scanner is case-insensitive (the
530: .B \-i
531: flag), then
532: .B [:upper:]
533: and
534: .B [:lower:]
535: are equivalent to
536: .B [:alpha:].
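The correspondence with the C classification functions can be sketched
directly. The helper names below are hypothetical; the first mirrors the
definition of [:blank:] given above, and the second checks over all byte
values that [[:alnum:]] and [[:alpha:][:digit:]] select the same
characters.

```c
#include <ctype.h>

/* [:blank:] as described above: a blank or a tab. */
int in_blank_class(int c)
{
    return c == ' ' || c == '\t';
}

/* Count byte values where isalnum() disagrees with isalpha() || isdigit(). */
int count_alnum_mismatches(void)
{
    int c, mismatches = 0;

    for (c = 0; c <= 255; ++c)
        if ((isalnum(c) != 0) != (isalpha(c) != 0 || isdigit(c) != 0))
            ++mismatches;
    return mismatches;
}
```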
537: .PP
538: Some notes on patterns:
539: .IP -
540: A negated character class such as the example "[^A-Z]"
541: above
542: .I will match a newline
543: unless "\\n" (or an equivalent escape sequence) is one of the
544: characters explicitly present in the negated character class
545: (e.g., "[^A-Z\\n]"). This is unlike how many other regular
546: expression tools treat negated character classes, but unfortunately
547: the inconsistency is historically entrenched.
548: Matching newlines means that a pattern like [^"]* can match the entire
549: input unless there's another quote in the input.
550: .IP -
551: A rule can have at most one instance of trailing context (the '/' operator
552: or the '$' operator). The start condition, '^', and "<<EOF>>" patterns
553: can only occur at the beginning of a pattern and, as with '/' and '$',
554: cannot be grouped inside parentheses. A '^' which does not occur at
555: the beginning of a rule or a '$' which does not occur at the end of
556: a rule loses its special properties and is treated as a normal character.
557: .IP
558: The following are illegal:
559: .nf
560:
561: foo/bar$
562: <sc1>foo<sc2>bar
563:
564: .fi
565: Note that the first of these can be written "foo/bar\\n".
566: .IP
567: The following will result in '$' or '^' being treated as a normal character:
568: .nf
569:
570: foo|(bar$)
571: foo|^bar
572:
573: .fi
574: If what's wanted is a "foo" or a bar-followed-by-a-newline, the following
575: could be used (the special '|' action is explained below):
576: .nf
577:
578: foo |
579: bar$ /* action goes here */
580:
581: .fi
582: A similar trick will work for matching a foo or a
583: bar-at-the-beginning-of-a-line.
584: .SH HOW THE INPUT IS MATCHED
585: When the generated scanner is run, it analyzes its input looking
586: for strings which match any of its patterns. If it finds more than
587: one match, it takes the one matching the most text (for trailing
588: context rules, this includes the length of the trailing part, even
589: though it will then be returned to the input). If it finds two
590: or more matches of the same length, the
591: rule listed first in the
592: .I flex
593: input file is chosen.
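The selection rule (longest match wins, ties go to the rule listed
first) can be sketched as below. To keep the example short it assumes
every pattern is a literal string, which is far simpler than what flex
actually does with its tables; the name "pick_rule" is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Returns the index of the winning rule, or -1 if no rule matches at the
   start of input; *match_len receives the length of the match. */
int pick_rule(const char *const *patterns, size_t npatterns,
              const char *input, size_t *match_len)
{
    int best = -1;
    size_t best_len = 0, i;

    for (i = 0; i < npatterns; ++i) {
        size_t len = strlen(patterns[i]);

        if (len == 0 || strncmp(input, patterns[i], len) != 0)
            continue;               /* rule i does not match here */
        /* Only a strictly longer match replaces the current choice,
           so on a tie the earlier rule is kept. */
        if (len > best_len) {
            best = (int)i;
            best_len = len;
        }
    }
    *match_len = best_len;
    return best;
}
```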
594: .PP
595: Once the match is determined, the text corresponding to the match
596: (called the
597: .I token)
598: is made available in the global character pointer
599: .B yytext,
600: and its length in the global integer
601: .B yyleng.
602: The
603: .I action
604: corresponding to the matched pattern is then executed (a more
605: detailed description of actions follows), and then the remaining
606: input is scanned for another match.
607: .PP
608: If no match is found, then the
609: .I default rule
610: is executed: the next character in the input is considered matched and
611: copied to the standard output. Thus, the simplest legal
612: .I flex
613: input is:
614: .nf
615:
616: %%
617:
618: .fi
619: which generates a scanner that simply copies its input (one character
620: at a time) to its output.
621: .PP
622: Note that
623: .B yytext
624: can be defined in two different ways: either as a character
625: .I pointer
626: or as a character
627: .I array.
628: You can control which definition
629: .I flex
630: uses by including one of the special directives
631: .B %pointer
632: or
633: .B %array
634: in the first (definitions) section of your flex input. The default is
635: .B %pointer,
636: unless you use the
637: .B -l
638: lex compatibility option, in which case
639: .B yytext
640: will be an array.
641: The advantage of using
642: .B %pointer
643: is substantially faster scanning and no buffer overflow when matching
644: very large tokens (unless you run out of dynamic memory). The disadvantage
645: is that you are restricted in how your actions can modify
646: .B yytext
647: (see the next section), and calls to the
648: .B unput()
1.10 deraadt 649: function destroy the present contents of
1.1 deraadt 650: .B yytext,
651: which can be a considerable porting headache when moving between different
652: .I lex
653: versions.
654: .PP
655: The advantage of
656: .B %array
657: is that you can then modify
658: .B yytext
659: to your heart's content, and calls to
660: .B unput()
661: do not destroy
662: .B yytext
663: (see below). Furthermore, existing
664: .I lex
665: programs sometimes access
666: .B yytext
667: externally using declarations of the form:
668: .nf
669: extern char yytext[];
670: .fi
671: This definition is erroneous when used with
672: .B %pointer,
673: but correct for
674: .B %array.
675: .PP
676: .B %array
677: defines
678: .B yytext
679: to be an array of
680: .B YYLMAX
681: characters, which defaults to a fairly large value. You can change
682: the size by simply #define'ing
683: .B YYLMAX
684: to a different value in the first section of your
685: .I flex
686: input. As mentioned above, with
687: .B %pointer
688: yytext grows dynamically to accommodate large tokens. While this means your
689: .B %pointer
690: scanner can accommodate very large tokens (such as matching entire blocks
691: of comments), bear in mind that each time the scanner must resize
692: .B yytext
693: it also must rescan the entire token from the beginning, so matching such
694: tokens can prove slow.
695: .B yytext
696: presently does
697: .I not
698: dynamically grow if a call to
699: .B unput()
700: results in too much text being pushed back; instead, a run-time error results.
701: .PP
702: Also note that you cannot use
703: .B %array
704: with C++ scanner classes
705: (the
706: .B c++
707: option; see below).
708: .SH ACTIONS
709: Each pattern in a rule has a corresponding action, which can be any
710: arbitrary C statement. The pattern ends at the first non-escaped
711: whitespace character; the remainder of the line is its action. If the
712: action is empty, then when the pattern is matched the input token
713: is simply discarded. For example, here is the specification for a program
714: which deletes all occurrences of "zap me" from its input:
715: .nf
716:
717: %%
718: "zap me"
719:
720: .fi
721: (It will copy all other characters in the input to the output since
722: they will be matched by the default rule.)
723: .PP
724: Here is a program which compresses multiple blanks and tabs down to
725: a single blank, and throws away whitespace found at the end of a line:
726: .nf
727:
728: %%
729: [ \\t]+ putchar( ' ' );
730: [ \\t]+$ /* ignore this token */
731:
732: .fi
733: .PP
734: If the action contains a '{', then the action spans till the balancing '}'
735: is found, and the action may cross multiple lines.
1.7 aaron 736: .I flex
1.1 deraadt 737: knows about C strings and comments and won't be fooled by braces found
738: within them, but also allows actions to begin with
739: .B %{
740: and will consider the action to be all the text up to the next
741: .B %}
742: (regardless of ordinary braces inside the action).
743: .PP
744: An action consisting solely of a vertical bar ('|') means "same as
745: the action for the next rule." See below for an illustration.
746: .PP
747: Actions can include arbitrary C code, including
748: .B return
749: statements to return a value to whatever routine called
750: .B yylex().
751: Each time
752: .B yylex()
753: is called it continues processing tokens from where it last left
754: off until it either reaches
755: the end of the file or executes a return.
756: .PP
757: Actions are free to modify
758: .B yytext
759: except for lengthening it (adding
760: characters to its end--these will overwrite later characters in the
761: input stream). This however does not apply when using
762: .B %array
763: (see above); in that case,
764: .B yytext
765: may be freely modified in any way.
766: .PP
767: Actions are free to modify
768: .B yyleng
769: except they should not do so if the action also includes use of
770: .B yymore()
771: (see below).
772: .PP
773: There are a number of special directives which can be included within
774: an action:
775: .IP -
776: .B ECHO
777: copies yytext to the scanner's output.
778: .IP -
779: .B BEGIN
780: followed by the name of a start condition places the scanner in the
781: corresponding start condition (see below).
782: .IP -
783: .B REJECT
784: directs the scanner to proceed on to the "second best" rule which matched the
785: input (or a prefix of the input). The rule is chosen as described
786: above in "How the Input is Matched", and
787: .B yytext
788: and
789: .B yyleng
790: set up appropriately.
791: It may either be one which matched as much text
792: as the originally chosen rule but came later in the
793: .I flex
794: input file, or one which matched less text.
795: For example, the following will both count the
796: words in the input and call the routine special() whenever "frob" is seen:
797: .nf
798:
799: int word_count = 0;
800: %%
801:
802: frob special(); REJECT;
803: [^ \\t\\n]+ ++word_count;
804:
805: .fi
806: Without the
807: .B REJECT,
808: any "frob"'s in the input would not be counted as words, since the
809: scanner normally executes only one action per token.
810: Multiple
811: .B REJECT's
812: are allowed, each one finding the next best choice to the currently
813: active rule. For example, when the following scanner scans the token
814: "abcd", it will write "abcdabcaba" to the output:
815: .nf
816:
817: %%
818: a |
819: ab |
820: abc |
821: abcd ECHO; REJECT;
822: .|\\n /* eat up any unmatched character */
823:
824: .fi
825: (The first three rules share the fourth's action since they use
826: the special '|' action.)
827: .B REJECT
828: is a particularly expensive feature in terms of scanner performance;
829: if it is used in
830: .I any
831: of the scanner's actions it will slow down
832: .I all
833: of the scanner's matching. Furthermore,
834: .B REJECT
835: cannot be used with the
836: .I -Cf
837: or
838: .I -CF
839: options (see below).
840: .IP
841: Note also that unlike the other special actions,
842: .B REJECT
843: is a
844: .I branch;
845: code immediately following it in the action will
846: .I not
847: be executed.
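The "abcd" example can be traced with a small simulation. This is a
sketch of the behavior described above, not of flex's implementation:
the function "simulate_reject" is hypothetical, and it hard-codes the
four literal rules and the catch-all rule from the example.

```c
#include <stddef.h>
#include <string.h>

/* Simulates the four literal rules plus the catch-all rule on input,
   collecting in out everything ECHO would have written. */
void simulate_reject(const char *input, char *out, size_t outsz)
{
    static const char *const literals[] = { "a", "ab", "abc", "abcd" };
    const size_t nlit = sizeof literals / sizeof literals[0];
    size_t pos = 0, used = 0;

    if (outsz == 0)
        return;
    out[0] = '\0';
    while (input[pos] != '\0') {
        size_t len;

        /* Try candidates from longest to shortest; literal number len
           has length len.  Each one echoes its match and then REJECTs,
           handing over to the next best candidate. */
        for (len = nlit; len >= 1; --len) {
            const char *lit = literals[len - 1];

            if (strncmp(input + pos, lit, len) != 0)
                continue;
            if (used + len + 1 > outsz)
                return;
            memcpy(out + used, lit, len);   /* ECHO */
            used += len;
            out[used] = '\0';
        }
        /* The last candidate is the catch-all rule: it matches one
           character, has an empty action, and consumes it silently. */
        ++pos;
    }
}
```

Running it on "abcd" collects "abcd", "abc", "ab" and "a" at the first
position and nothing for the remaining characters, which gives
"abcdabcaba".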
848: .IP -
849: .B yymore()
850: tells the scanner that the next time it matches a rule, the corresponding
851: token should be
852: .I appended
853: onto the current value of
854: .B yytext
855: rather than replacing it. For example, given the input "mega-kludge"
856: the following will write "mega-mega-kludge" to the output:
857: .nf
858:
859: %%
860: mega- ECHO; yymore();
861: kludge ECHO;
862:
863: .fi
864: First "mega-" is matched and echoed to the output. Then "kludge"
865: is matched, but the previous "mega-" is still hanging around at the
866: beginning of
867: .B yytext
868: so the
869: .B ECHO
870: for the "kludge" rule will actually write "mega-kludge".
871: .PP
872: Two notes regarding use of
873: .B yymore().
874: First,
875: .B yymore()
876: depends on the value of
877: .I yyleng
878: correctly reflecting the size of the current token, so you must not
879: modify
880: .I yyleng
881: if you are using
882: .B yymore().
883: Second, the presence of
884: .B yymore()
885: in the scanner's action entails a minor performance penalty in the
886: scanner's matching speed.
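The append-instead-of-replace behavior can be modeled in a few lines.
The structure and function below are hypothetical stand-ins for the
scanner's internal state, with a fixed-size buffer chosen only to keep
the sketch short.

```c
#include <stddef.h>
#include <string.h>

struct scanner {
    char yytext[64];
    size_t yyleng;
    int more;           /* set to 1 by an action that "calls yymore()" */
};

/* Install the next token: append it if more was requested, else replace.
   Returns 0 on success, -1 if the token does not fit. */
int set_token(struct scanner *s, const char *tok)
{
    size_t keep = s->more ? s->yyleng : 0;
    size_t len = strlen(tok);

    if (keep + len + 1 > sizeof s->yytext)
        return -1;
    memcpy(s->yytext + keep, tok, len + 1);  /* copies the trailing NUL */
    s->yyleng = keep + len;
    s->more = 0;        /* the request applies to one token only */
    return 0;
}
```

Installing "mega-", setting the more flag, and then installing "kludge"
leaves "mega-kludge" in yytext, which is why the second ECHO writes the
prefix again.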
887: .IP -
888: .B yyless(n)
889: returns all but the first
890: .I n
891: characters of the current token back to the input stream, where they
892: will be rescanned when the scanner looks for the next match.
893: .B yytext
894: and
895: .B yyleng
896: are adjusted appropriately (e.g.,
897: .B yyleng
898: will now be equal to
899: .I n
900: ). For example, on the input "foobar" the following will write out
901: "foobarbar":
902: .nf
903:
904: %%
905: foobar ECHO; yyless(3);
906: [a-z]+ ECHO;
907:
908: .fi
909: An argument of 0 to
910: .B yyless
911: will cause the entire current input string to be scanned again. Unless you've
912: changed how the scanner will subsequently process its input (using
913: .B BEGIN,
914: for example), this will result in an endless loop.
915: .PP
916: Note that
917: .B yyless
918: is a macro and can only be used in the flex input file, not from
919: other source files.
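The effect of yyless() on the scan position can be modeled as follows.
The names "struct cursor", "token_less" and "next_scan_pos" are
hypothetical; the sketch only tracks offsets and does not touch any real
scanner buffer.

```c
#include <stddef.h>

struct cursor {
    const char *input;  /* the whole input */
    size_t start;       /* offset at which the current token begins */
    size_t yyleng;      /* length of the current token */
};

/* Keep the first n characters of the token; the rest is scanned again. */
void token_less(struct cursor *c, size_t n)
{
    if (n < c->yyleng)
        c->yyleng = n;
}

/* Offset at which the next match will be attempted. */
size_t next_scan_pos(const struct cursor *c)
{
    return c->start + c->yyleng;
}
```

With the token "foobar" at offset 0, token_less(&c, 3) moves the next
scan position to offset 3, so "bar" is scanned again. An argument of 0
moves it back to the start of the token, which is the endless-loop case
described above.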
920: .IP -
921: .B unput(c)
922: puts the character
923: .I c
924: back onto the input stream. It will be the next character scanned.
925: The following action will take the current token and cause it
926: to be rescanned enclosed in parentheses.
927: .nf
928:
929: {
930: int i;
1.14 ! tedu 931: char *yycopy;
! 932:
1.1 deraadt 933: /* Copy yytext because unput() trashes yytext */
1.14 ! tedu 934: if ((yycopy = strdup( yytext )) == NULL)
! 935: err(1, NULL);
1.1 deraadt 936: unput( ')' );
937: for ( i = yyleng - 1; i >= 0; --i )
938: unput( yycopy[i] );
939: unput( '(' );
940: free( yycopy );
941: }
942:
943: .fi
944: Note that since each
945: .B unput()
946: puts the given character back at the
947: .I beginning
948: of the input stream, pushing back strings must be done back-to-front.
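Why strings must be pushed back-to-front can be seen with a small
pushback stack. This is a stand-alone sketch with hypothetical names
("struct pushback", "push_char", "push_string", "pop_char"); it does not
use the scanner's real buffer.

```c
#include <stddef.h>
#include <string.h>

struct pushback {
    char buf[64];
    size_t n;           /* number of pushed-back characters */
};

/* Like unput(): c becomes the next character to be read. */
int push_char(struct pushback *p, char c)
{
    if (p->n == sizeof p->buf)
        return -1;
    p->buf[p->n++] = c;
    return 0;
}

/* Push a whole string so that it is later read in its original order:
   the last character goes in first. */
int push_string(struct pushback *p, const char *s)
{
    size_t i = strlen(s);

    while (i > 0)
        if (push_char(p, s[--i]) != 0)
            return -1;
    return 0;
}

/* Read the next character, or -1 if nothing is pushed back. */
int pop_char(struct pushback *p)
{
    return p->n > 0 ? (unsigned char)p->buf[--p->n] : -1;
}
```

After push_string(&p, "abc"), successive calls to pop_char() return 'a',
'b' and 'c' in that order.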
949: .PP
950: An important potential problem when using
951: .B unput()
952: is that if you are using
953: .B %pointer
954: (the default), a call to
955: .B unput()
956: .I destroys
957: the contents of
958: .I yytext,
959: starting with its rightmost character and devouring one character to
960: the left with each call. If you need the value of yytext preserved
961: after a call to
962: .B unput()
963: (as in the above example),
964: you must either first copy it elsewhere, or build your scanner using
965: .B %array
966: instead (see How The Input Is Matched).
967: .PP
968: Finally, note that you cannot put back
969: .B EOF
970: to attempt to mark the input stream with an end-of-file.
971: .IP -
972: .B input()
973: reads the next character from the input stream. For example,
974: the following is one way to eat up C comments:
975: .nf
976:
977: %%
978: "/*" {
979: register int c;
980:
981: for ( ; ; )
982: {
983: while ( (c = input()) != '*' &&
984: c != EOF )
985: ; /* eat up text of comment */
986:
987: if ( c == '*' )
988: {
989: while ( (c = input()) == '*' )
990: ;
991: if ( c == '/' )
992: break; /* found the end */
993: }
994:
995: if ( c == EOF )
996: {
997: error( "EOF in comment" );
998: break;
999: }
1000: }
1001: }
1002:
1003: .fi
1004: (Note that if the scanner is compiled using
1005: .B C++,
1006: then
1007: .B input()
1008: is instead referred to as
1009: .B yyinput(),
1010: in order to avoid a name clash with the
1011: .B C++
1012: stream by the name of
1013: .I input.)
1014: .IP -
1015: .B YY_FLUSH_BUFFER
1016: flushes the scanner's internal buffer
1017: so that the next time the scanner attempts to match a token, it will
1018: first refill the buffer using
1019: .B YY_INPUT
1020: (see The Generated Scanner, below). This action is a special case
1021: of the more general
1022: .B yy_flush_buffer()
1023: function, described below in the section Multiple Input Buffers.
1024: .IP -
1025: .B yyterminate()
1026: can be used in lieu of a return statement in an action. It terminates
1027: the scanner and returns a 0 to the scanner's caller, indicating "all done".
1028: By default,
1029: .B yyterminate()
1030: is also called when an end-of-file is encountered. It is a macro and
1031: may be redefined.
1032: .SH THE GENERATED SCANNER
1033: The output of
1034: .I flex
1035: is the file
1036: .B lex.yy.c,
1037: which contains the scanning routine
1038: .B yylex(),
1039: a number of tables used by it for matching tokens, and a number
1040: of auxiliary routines and macros. By default,
1041: .B yylex()
1042: is declared as follows:
1043: .nf
1044:
1045: int yylex()
1046: {
1047: ... various definitions and the actions in here ...
1048: }
1049:
1050: .fi
1051: (If your environment supports function prototypes, then it will
1052: be "int yylex( void )".) This definition may be changed by defining
1053: the "YY_DECL" macro. For example, you could use:
1054: .nf
1055:
1056: #define YY_DECL float lexscan( a, b ) float a, b;
1057:
1058: .fi
1059: to give the scanning routine the name
1060: .I lexscan,
1061: returning a float, and taking two floats as arguments. Note that
1062: if you give arguments to the scanning routine using a
1063: K&R-style/non-prototyped function declaration, you must terminate
1064: the definition with a semi-colon (;).
1065: .PP
1066: Whenever
1067: .B yylex()
1068: is called, it scans tokens from the global input file
1069: .I yyin
1070: (which defaults to stdin). It continues until it either reaches
1071: an end-of-file (at which point it returns the value 0) or
1072: one of its actions executes a
1073: .I return
1074: statement.
1075: .PP
1076: If the scanner reaches an end-of-file, subsequent calls are undefined
1077: unless either
1078: .I yyin
1079: is pointed at a new input file (in which case scanning continues from
1080: that file), or
1081: .B yyrestart()
1082: is called.
1083: .B yyrestart()
1084: takes one argument, a
1085: .B FILE *
1086: pointer (which can be nil, if you've set up
1087: .B YY_INPUT
1088: to scan from a source other than
1089: .I yyin),
1090: and initializes
1091: .I yyin
1092: for scanning from that file. Essentially there is no difference between
1093: just assigning
1094: .I yyin
1095: to a new input file and using
1096: .B yyrestart()
1097: to do so; the latter is available for compatibility with previous versions
1098: of
1099: .I flex,
1100: and because it can be used to switch input files in the middle of scanning.
1101: It can also be used to throw away the current input buffer, by calling
1102: it with an argument of
1103: .I yyin;
1104: but it is better to use
1105: .B YY_FLUSH_BUFFER
1106: (see above).
1107: Note that
1108: .B yyrestart()
1109: does
1110: .I not
1111: reset the start condition to
1112: .B INITIAL
1113: (see Start Conditions, below).
1114: .PP
1115: If
1116: .B yylex()
1117: stops scanning due to executing a
1118: .I return
1119: statement in one of the actions, the scanner may then be called again and it
1120: will resume scanning where it left off.
1121: .PP
1122: By default (and for purposes of efficiency), the scanner uses
1123: block-reads rather than simple
1124: .I getc()
1125: calls to read characters from
1126: .I yyin.
1127: The nature of how it gets its input can be controlled by defining the
1128: .B YY_INPUT
1129: macro.
1130: YY_INPUT's calling sequence is "YY_INPUT(buf,result,max_size)". Its
1131: action is to place up to
1132: .I max_size
1133: characters in the character array
1134: .I buf
1135: and return in the integer variable
1136: .I result
1137: either the
1138: number of characters read or the constant YY_NULL (0 on Unix systems)
1139: to indicate EOF. The default YY_INPUT reads from the
1140: global file-pointer "yyin".
1141: .PP
1142: A sample definition of YY_INPUT (in the definitions
1143: section of the input file):
1144: .nf
1145:
1146: %{
1147: #define YY_INPUT(buf,result,max_size) \\
1148: { \\
1149: int c = getchar(); \\
1150: result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \\
1151: }
1152: %}
1153:
1154: .fi
1155: This definition will change the input processing to occur
1156: one character at a time.
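.PP
As a rough sketch of the same contract applied to a different source,
the plain C helper below hands out characters from an in-memory string
in chunks of at most
.I max_size.
The names
.B set_source()
and
.B read_from_memory()
are hypothetical and not part of
.I flex;
returning 0 at end of input relies on YY_NULL being 0, as noted above
for Unix systems.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory input source; none of these names come from flex. */
static const char *src_text = "";
static size_t src_pos = 0;

/* Point the reader at a new NUL-terminated string. */
static void set_source(const char *text)
{
    assert(text != NULL);
    src_text = text;
    src_pos = 0;
}

/*
 * Copy up to max_size characters into buf and return how many were
 * copied, or 0 once the string is exhausted.
 */
static int read_from_memory(char *buf, int max_size)
{
    size_t left, n;

    assert(buf != NULL && max_size > 0);
    left = strlen(src_text) - src_pos;
    n = left < (size_t)max_size ? left : (size_t)max_size;
    memcpy(buf, src_text + src_pos, n);
    src_pos += n;
    return (int)n;
}
```

A YY_INPUT definition built on this would only need to assign the
function's return value to
.I result.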
1157: .PP
1158: When the scanner receives an end-of-file indication from YY_INPUT,
1159: it then checks the
1160: .B yywrap()
1161: function. If
1162: .B yywrap()
1163: returns false (zero), then it is assumed that the
1164: function has gone ahead and set up
1165: .I yyin
1166: to point to another input file, and scanning continues. If it returns
1167: true (non-zero), then the scanner terminates, returning 0 to its
1168: caller. Note that in either case, the start condition remains unchanged;
1169: it does
1170: .I not
1171: revert to
1172: .B INITIAL.
1173: .PP
1174: If you do not supply your own version of
1175: .B yywrap(),
1176: then you must either use
1177: .B %option noyywrap
1178: (in which case the scanner behaves as though
1179: .B yywrap()
1180: returned 1), or you must link with
1181: .B \-lfl
1182: to obtain the default version of the routine, which always returns 1.
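.PP
For scanners that read several files in turn, a user-supplied
.B yywrap()
might look roughly like the sketch below. It is only an illustration:
the
.B remaining_files
list and
.B set_remaining_files()
are hypothetical, the
.I yyin
definition is a stand-in so that the fragment compiles by itself, and
silently skipping files that cannot be opened is a choice made for
brevity.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Stand-in so the sketch compiles on its own; in a real scanner yyin is
 * defined by the generated lex.yy.c and this line should be dropped.
 */
FILE *yyin;

/* Hypothetical NULL-terminated list of input file names still to be read. */
static const char **remaining_files;

static void set_remaining_files(const char **names)
{
    assert(names != NULL);
    remaining_files = names;
}

/*
 * Return 0 after pointing yyin at the next file that can be opened,
 * or 1 when the list is exhausted.
 */
int yywrap(void)
{
    assert(remaining_files != NULL);
    while (*remaining_files != NULL) {
        FILE *next = fopen(*remaining_files++, "r");

        if (next != NULL) {
            if (yyin != NULL && yyin != stdin)
                fclose(yyin);   /* done with the previous file */
            yyin = next;
            return 0;           /* more input: scanning continues */
        }
    }
    return 1;                   /* no more input: the scanner terminates */
}
```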
1183: .PP
1184: Three routines are available for scanning from in-memory buffers rather
1185: than files:
1186: .B yy_scan_string(), yy_scan_bytes(),
1187: and
1188: .B yy_scan_buffer().
1189: See the discussion of them below in the section Multiple Input Buffers.
1190: .PP
1191: The scanner writes its
1192: .B ECHO
1193: output to the
1194: .I yyout
1195: global (default, stdout), which may be redefined by the user simply
1196: by assigning it to some other
1197: .B FILE
1198: pointer.
1199: .SH START CONDITIONS
1200: .I flex
1201: provides a mechanism for conditionally activating rules. Any rule
1202: whose pattern is prefixed with "<sc>" will only be active when
1203: the scanner is in the start condition named "sc". For example,
1204: .nf
1205:
1206: <STRING>[^"]* { /* eat up the string body ... */
1207: ...
1208: }
1209:
1210: .fi
1211: will be active only when the scanner is in the "STRING" start
1212: condition, and
1213: .nf
1214:
1215: <INITIAL,STRING,QUOTE>\\. { /* handle an escape ... */
1216: ...
1217: }
1218:
1219: .fi
1220: will be active only when the current start condition is
1221: either "INITIAL", "STRING", or "QUOTE".
1222: .PP
1223: Start conditions
1224: are declared in the definitions (first) section of the input
1225: using unindented lines beginning with either
1226: .B %s
1227: or
1228: .B %x
1229: followed by a list of names.
1230: The former declares
1231: .I inclusive
1232: start conditions, the latter
1233: .I exclusive
1234: start conditions. A start condition is activated using the
1235: .B BEGIN
1236: action. Until the next
1237: .B BEGIN
1238: action is executed, rules with the given start
1239: condition will be active and
1240: rules with other start conditions will be inactive.
1241: If the start condition is
1242: .I inclusive,
1243: then rules with no start conditions at all will also be active.
1244: If it is
1245: .I exclusive,
1246: then
1247: .I only
1248: rules qualified with the start condition will be active.
1249: A set of rules contingent on the same exclusive start condition
1250: describes a scanner which is independent of any of the other rules in the
1251: .I flex
1252: input. Because of this,
1253: exclusive start conditions make it easy to specify "mini-scanners"
1254: which scan portions of the input that are syntactically different
1255: from the rest (e.g., comments).
1256: .PP
1257: If the distinction between inclusive and exclusive start conditions
1258: is still a little vague, here's a simple example illustrating the
1259: connection between the two. The set of rules:
1260: .nf
1261:
1262: %s example
1263: %%
1264:
1265: <example>foo do_something();
1266:
1267: bar something_else();
1268:
1269: .fi
1270: is equivalent to
1271: .nf
1272:
1273: %x example
1274: %%
1275:
1276: <example>foo do_something();
1277:
1278: <INITIAL,example>bar something_else();
1279:
1280: .fi
1281: Without the
1282: .B <INITIAL,example>
1283: qualifier, the
1284: .I bar
1285: pattern in the second example wouldn't be active (i.e., couldn't match)
1286: when in start condition
1287: .B example.
1288: If we just used
1289: .B <example>
1290: to qualify
1291: .I bar,
1292: though, then it would only be active in
1293: .B example
1294: and not in
1295: .B INITIAL,
1296: while in the first example it's active in both, because in the first
1297: example the
1298: .B example
1.10 deraadt 1299: start condition is an
1.1 deraadt 1300: .I inclusive
1301: .B (%s)
1302: start condition.
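.PP
The activation rule described above can be modelled in a few lines of
plain C. This is only a sketch of the documented behaviour, not of how
.I flex
implements it; the function name and its parameters are hypothetical,
and start conditions are represented by small integers.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return 1 if a rule can match in the current start condition.
 *
 * listed, count        conditions named in the rule's <...> prefix
 *                      (count == 0 means the rule has no prefix)
 * current              the active start condition
 * current_is_exclusive 1 if it was declared with %x, 0 if declared
 *                      with %s or if it is INITIAL
 */
static int rule_is_active(const int *listed, size_t count,
                          int current, int current_is_exclusive)
{
    size_t i;

    assert(count == 0 || listed != NULL);
    if (count == 0)
        return !current_is_exclusive;   /* unqualified rule */
    for (i = 0; i < count; i++)
        if (listed[i] == current)
            return 1;                   /* explicitly listed */
    return 0;
}
```

With 0 standing for INITIAL and 1 for
.B example,
the unqualified
.I bar
rule is active in condition 1 only when that condition is inclusive,
while a rule listing both 0 and 1 is active in either case.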
1303: .PP
1304: Also note that the special start-condition specifier
1305: .B <*>
1306: matches every start condition. Thus, the above example could also
1307: have been written:
1308: .nf
1309:
1310: %x example
1311: %%
1312:
1313: <example>foo do_something();
1314:
1315: <*>bar something_else();
1316:
1317: .fi
1318: .PP
1319: The default rule (to
1320: .B ECHO
1321: any unmatched character) remains active in start conditions. It
1322: is equivalent to:
1323: .nf
1324:
1325: <*>.|\\n ECHO;
1326:
1327: .fi
1328: .PP
1329: .B BEGIN(0)
1330: returns to the original state where only the rules with
1331: no start conditions are active. This state can also be
1332: referred to as the start-condition "INITIAL", so
1333: .B BEGIN(INITIAL)
1334: is equivalent to
1335: .B BEGIN(0).
1336: (The parentheses around the start condition name are not required but
1337: are considered good style.)
1338: .PP
1339: .B BEGIN
1340: actions can also be given as indented code at the beginning
1341: of the rules section. For example, the following will cause
1342: the scanner to enter the "SPECIAL" start condition whenever
1343: .B yylex()
1344: is called and the global variable
1345: .I enter_special
1346: is true:
1347: .nf
1348:
1349: int enter_special;
1350:
1351: %x SPECIAL
1352: %%
1353: if ( enter_special )
1354: BEGIN(SPECIAL);
1355:
1356: <SPECIAL>blahblahblah
1357: ...more rules follow...
1358:
1359: .fi
1360: .PP
1361: To illustrate the uses of start conditions,
1362: here is a scanner which provides two different interpretations
1363: of a string like "123.456". By default it will treat it as
1364: three tokens, the integer "123", a dot ('.'), and the integer "456".
1365: But if the string is preceded earlier in the line by the string
1366: "expect-floats"
1367: it will treat it as a single token, the floating-point number
1368: 123.456:
1369: .nf
1370:
1371: %{
1372: #include <stdlib.h>
1373: %}
1374: %s expect
1375:
1376: %%
1377: expect-floats BEGIN(expect);
1378:
1379: <expect>[0-9]+"."[0-9]+ {
1380: printf( "found a float, = %f\\n",
1381: atof( yytext ) );
1382: }
1383: <expect>\\n {
1384: /* that's the end of the line, so
1385: * we need another "expect-floats"
1386: * before we'll recognize any more
1387: * numbers
1388: */
1389: BEGIN(INITIAL);
1390: }
1391:
1392: [0-9]+ {
1393: printf( "found an integer, = %d\\n",
1394: atoi( yytext ) );
1395: }
1396:
1397: "." printf( "found a dot\\n" );
1398:
1399: .fi
1400: Here is a scanner which recognizes (and discards) C comments while
1401: maintaining a count of the current input line.
1402: .nf
1403:
1404: %x comment
1405: %%
1406: int line_num = 1;
1407:
1408: "/*" BEGIN(comment);
1409:
1410: <comment>[^*\\n]* /* eat anything that's not a '*' */
1411: <comment>"*"+[^*/\\n]* /* eat up '*'s not followed by '/'s */
1412: <comment>\\n ++line_num;
1413: <comment>"*"+"/" BEGIN(INITIAL);
1414:
1415: .fi
1416: This scanner goes to a bit of trouble to match as much
1417: text as possible with each rule. In general, when attempting to write
1.10 deraadt 1418: a high-speed scanner try to match as much as possible in each rule, as
1.1 deraadt 1419: it's a big win.
1420: .PP
1.10 deraadt 1421: Note that start-condition names are really integer values and
1.1 deraadt 1422: can be stored as such. Thus, the above could be extended in the
1423: following fashion:
1424: .nf
1425:
1426: %x comment foo
1427: %%
1428: int line_num = 1;
1429: int comment_caller;
1430:
1431: "/*" {
1432: comment_caller = INITIAL;
1433: BEGIN(comment);
1434: }
1435:
1436: ...
1437:
1438: <foo>"/*" {
1439: comment_caller = foo;
1440: BEGIN(comment);
1441: }
1442:
1443: <comment>[^*\\n]* /* eat anything that's not a '*' */
1444: <comment>"*"+[^*/\\n]* /* eat up '*'s not followed by '/'s */
1445: <comment>\\n ++line_num;
1446: <comment>"*"+"/" BEGIN(comment_caller);
1447:
1448: .fi
1449: Furthermore, you can access the current start condition using
1450: the integer-valued
1451: .B YY_START
1452: macro. For example, the above assignments to
1453: .I comment_caller
1454: could instead be written
1455: .nf
1456:
1457: comment_caller = YY_START;
1458:
1459: .fi
1460: Flex provides
1461: .B YYSTATE
1462: as an alias for
1463: .B YY_START
1464: (since that is what's used by AT&T
1465: .I lex).
1466: .PP
1467: Note that start conditions do not have their own name-space; %s's and %x's
1468: declare names in the same fashion as #define's.
1469: .PP
1470: Finally, here's an example of how to match C-style quoted strings using
1471: exclusive start conditions, including expanded escape sequences (but
1472: not including checking for a string that's too long):
1473: .nf
1474:
1475: %x str
1476:
1477: %%
1478: char string_buf[MAX_STR_CONST];
1479: char *string_buf_ptr;
1480:
1481:
1482: \\" string_buf_ptr = string_buf; BEGIN(str);
1483:
1484: <str>\\" { /* saw closing quote - all done */
1485: BEGIN(INITIAL);
1486: *string_buf_ptr = '\\0';
1487: /* return string constant token type and
1488: * value to parser
1489: */
1490: }
1491:
1492: <str>\\n {
1493: /* error - unterminated string constant */
1494: /* generate error message */
1495: }
1496:
1497: <str>\\\\[0-7]{1,3} {
1498: /* octal escape sequence */
1499: unsigned int result;
1500:
1501: (void) sscanf( yytext + 1, "%o", &result );
1502:
1503: if ( result > 0xff )
1504: { /* error, constant is out-of-bounds */ }
1505: else
1506: *string_buf_ptr++ = result;
1507: }
1508:
1509: <str>\\\\[0-9]+ {
1510: /* generate error - bad escape sequence; something
1511: * like '\\48' or '\\0777777'
1512: */
1513: }
1514:
1515: <str>\\\\n *string_buf_ptr++ = '\\n';
1516: <str>\\\\t *string_buf_ptr++ = '\\t';
1517: <str>\\\\r *string_buf_ptr++ = '\\r';
1518: <str>\\\\b *string_buf_ptr++ = '\\b';
1519: <str>\\\\f *string_buf_ptr++ = '\\f';
1520:
1521: <str>\\\\(.|\\n) *string_buf_ptr++ = yytext[1];
1522:
1523: <str>[^\\\\\\n\\"]+ {
1524: char *yptr = yytext;
1525:
1526: while ( *yptr )
1527: *string_buf_ptr++ = *yptr++;
1528: }
1529:
1530: .fi
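.PP
The range check in the octal-escape rule can also be kept in a small
helper, which makes the out-of-bounds case harder to get wrong. The
following is a sketch only;
.B decode_octal_escape()
is a hypothetical name, and its argument is assumed to be the matched
text starting at the backslash, as
.B yytext
would be in that rule.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Decode the digits of an octal escape. Returns 1 and stores the
 * value in *out on success; returns 0 and leaves *out alone if there
 * are no octal digits or the value does not fit in one byte.
 */
static int decode_octal_escape(const char *text, unsigned char *out)
{
    unsigned int value = 0;

    assert(text != NULL && out != NULL);
    if (sscanf(text + 1, "%o", &value) != 1)
        return 0;               /* nothing to convert */
    if (value > 0xff)
        return 0;               /* constant is out-of-bounds */
    *out = (unsigned char)value;
    return 1;
}
```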
1531: .PP
1532: Often, such as in some of the examples above, you wind up writing a
1533: whole bunch of rules all preceded by the same start condition(s). Flex
1534: makes this a little easier and cleaner by introducing a notion of
1535: start condition
1536: .I scope.
1537: A start condition scope is begun with:
1538: .nf
1539:
1540: <SCs>{
1541:
1542: .fi
1543: where
1544: .I SCs
1545: is a list of one or more start conditions. Inside the start condition
1546: scope, every rule automatically has the prefix
1547: .I <SCs>
1548: applied to it, until a
1549: .I '}'
1550: which matches the initial
1551: .I '{'.
1552: So, for example,
1553: .nf
1554:
1555: <ESC>{
1556: "\\\\n" return '\\n';
1557: "\\\\r" return '\\r';
1558: "\\\\f" return '\\f';
1559: "\\\\0" return '\\0';
1560: }
1561:
1562: .fi
1563: is equivalent to:
1564: .nf
1565:
1566: <ESC>"\\\\n" return '\\n';
1567: <ESC>"\\\\r" return '\\r';
1568: <ESC>"\\\\f" return '\\f';
1569: <ESC>"\\\\0" return '\\0';
1570:
1571: .fi
1572: Start condition scopes may be nested.
1573: .PP
1574: Three routines are available for manipulating stacks of start conditions:
1575: .TP
1576: .B void yy_push_state(int new_state)
1577: pushes the current start condition onto the top of the start condition
1578: stack and switches to
1579: .I new_state
1580: as though you had used
1581: .B BEGIN new_state
1582: (recall that start condition names are also integers).
1583: .TP
1584: .B void yy_pop_state()
1585: pops the top of the stack and switches to it via
1586: .B BEGIN.
1587: .TP
1588: .B int yy_top_state()
1589: returns the top of the stack without altering the stack's contents.
1590: .PP
1591: The start condition stack grows dynamically and so has no built-in
1592: size limitation. If memory is exhausted, program execution aborts.
1593: .PP
1594: To use start condition stacks, your scanner must include a
1595: .B %option stack
1596: directive (see Options below).
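.PP
The behaviour of the three stack routines can be pictured with the
following stand-alone C sketch. It is not the code that
.I flex
generates: the names
.B sc_push(),
.B sc_pop(),
.B sc_top()
and
.B current_sc
are hypothetical, and the doubling growth policy is an assumption made
for the illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the scanner's state. */
static int current_sc = 0;      /* 0 plays the role of INITIAL */
static int *sc_stack = NULL;
static size_t sc_depth = 0, sc_capacity = 0;

/* Save the current condition, then switch to new_state. */
static void sc_push(int new_state)
{
    if (sc_depth == sc_capacity) {
        size_t grown = sc_capacity ? sc_capacity * 2 : 8;
        int *p = realloc(sc_stack, grown * sizeof *p);

        if (p == NULL)
            abort();            /* memory exhausted: execution aborts */
        sc_stack = p;
        sc_capacity = grown;
    }
    sc_stack[sc_depth++] = current_sc;
    current_sc = new_state;
}

/* Switch back to the most recently saved condition. */
static void sc_pop(void)
{
    assert(sc_depth > 0);
    current_sc = sc_stack[--sc_depth];
}

/* Return the most recently saved condition without removing it. */
static int sc_top(void)
{
    assert(sc_depth > 0);
    return sc_stack[sc_depth - 1];
}
```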
1597: .SH MULTIPLE INPUT BUFFERS
1598: Some scanners (such as those which support "include" files)
1599: require reading from several input streams. As
1600: .I flex
1601: scanners do a large amount of buffering, one cannot control
1602: where the next input will be read from by simply writing a
1603: .B YY_INPUT
1604: which is sensitive to the scanning context.
1605: .B YY_INPUT
1606: is only called when the scanner reaches the end of its buffer, which
1607: may be a long time after scanning a statement such as an "include"
1608: which requires switching the input source.
1609: .PP
1610: To negotiate these sorts of problems,
1611: .I flex
1612: provides a mechanism for creating and switching between multiple
1613: input buffers. An input buffer is created by using:
1614: .nf
1615:
1616: YY_BUFFER_STATE yy_create_buffer( FILE *file, int size )
1617:
1618: .fi
1619: which takes a
1620: .I FILE
1621: pointer and a size and creates a buffer associated with the given
1622: file and large enough to hold
1623: .I size
1624: characters (when in doubt, use
1625: .B YY_BUF_SIZE
1626: for the size). It returns a
1627: .B YY_BUFFER_STATE
1628: handle, which may then be passed to other routines (see below). The
1629: .B YY_BUFFER_STATE
1630: type is a pointer to an opaque
1631: .B struct yy_buffer_state
1632: structure, so you may safely initialize YY_BUFFER_STATE variables to
1633: .B ((YY_BUFFER_STATE) 0)
1634: if you wish, and also refer to the opaque structure in order to
1635: correctly declare input buffers in source files other than that
1636: of your scanner. Note that the
1637: .I FILE
1638: pointer in the call to
1639: .B yy_create_buffer
1640: is only used as the value of
1641: .I yyin
1642: seen by
1643: .B YY_INPUT;
1644: if you redefine
1645: .B YY_INPUT
1646: so it no longer uses
1647: .I yyin,
1648: then you can safely pass a nil
1649: .I FILE
1650: pointer to
1651: .B yy_create_buffer.
1652: You select a particular buffer to scan from using:
1653: .nf
1654:
1655: void yy_switch_to_buffer( YY_BUFFER_STATE new_buffer )
1656:
1657: .fi
1658: This switches the scanner's input buffer so that subsequent tokens will
1659: come from
1660: .I new_buffer.
1661: Note that
1662: .B yy_switch_to_buffer()
1663: may be used by yywrap() to set things up for continued scanning, instead
1664: of opening a new file and pointing
1665: .I yyin
1666: at it. Note also that switching input sources via either
1667: .B yy_switch_to_buffer()
1668: or
1669: .B yywrap()
1670: does
1671: .I not
1672: change the start condition.
1673: .nf
1674:
1675: void yy_delete_buffer( YY_BUFFER_STATE buffer )
1676:
1677: .fi
1678: is used to reclaim the storage associated with a buffer. (
1679: .B buffer
1680: can be nil, in which case the routine does nothing.)
1681: You can also clear the current contents of a buffer using:
1682: .nf
1683:
1684: void yy_flush_buffer( YY_BUFFER_STATE buffer )
1685:
1686: .fi
1687: This function discards the buffer's contents,
1688: so the next time the scanner attempts to match a token from the
1689: buffer, it will first fill the buffer anew using
1690: .B YY_INPUT.
1691: .PP
1692: .B yy_new_buffer()
1693: is an alias for
1694: .B yy_create_buffer(),
1695: provided for compatibility with the C++ use of
1696: .I new
1697: and
1698: .I delete
1699: for creating and destroying dynamic objects.
1700: .PP
1701: Finally, the
1702: .B YY_CURRENT_BUFFER
1703: macro returns a
1704: .B YY_BUFFER_STATE
1705: handle to the current buffer.
1706: .PP
1707: Here is an example of using these features for writing a scanner
1708: which expands include files (the
1709: .B <<EOF>>
1710: feature is discussed below):
1711: .nf
1712:
1713: /* the "incl" state is used for picking up the name
1714: * of an include file
1715: */
1716: %x incl
1717:
1718: %{
1719: #define MAX_INCLUDE_DEPTH 10
1720: YY_BUFFER_STATE include_stack[MAX_INCLUDE_DEPTH];
1721: int include_stack_ptr = 0;
1722: %}
1723:
1724: %%
1725: include BEGIN(incl);
1726:
1727: [a-z]+ ECHO;
1728: [^a-z\\n]*\\n? ECHO;
1729:
1730: <incl>[ \\t]* /* eat the whitespace */
1731: <incl>[^ \\t\\n]+ { /* got the include file name */
1732: if ( include_stack_ptr >= MAX_INCLUDE_DEPTH )
1733: {
1734: fprintf( stderr, "Includes nested too deeply" );
1735: exit( 1 );
1736: }
1737:
1738: include_stack[include_stack_ptr++] =
1739: YY_CURRENT_BUFFER;
1740:
1741: yyin = fopen( yytext, "r" );
1742:
1743: if ( ! yyin )
1744: error( ... );
1745:
1746: yy_switch_to_buffer(
1747: yy_create_buffer( yyin, YY_BUF_SIZE ) );
1748:
1749: BEGIN(INITIAL);
1750: }
1751:
1752: <<EOF>> {
1753: if ( --include_stack_ptr < 0 )
1754: {
1755: yyterminate();
1756: }
1757:
1758: else
1759: {
1760: yy_delete_buffer( YY_CURRENT_BUFFER );
1761: yy_switch_to_buffer(
1762: include_stack[include_stack_ptr] );
1763: }
1764: }
1765:
1766: .fi
1767: Three routines are available for setting up input buffers for
1768: scanning in-memory strings instead of files. All of them create
1769: a new input buffer for scanning the string, and return a corresponding
1770: .B YY_BUFFER_STATE
1771: handle (which you should delete with
1772: .B yy_delete_buffer()
1773: when done with it). They also switch to the new buffer using
1774: .B yy_switch_to_buffer(),
1775: so the next call to
1776: .B yylex()
1777: will start scanning the string.
1778: .TP
1779: .B yy_scan_string(const char *str)
1780: scans a NUL-terminated string.
1781: .TP
1782: .B yy_scan_bytes(const char *bytes, int len)
1783: scans
1784: .I len
1785: bytes (including possibly NUL's)
1786: starting at location
1787: .I bytes.
1788: .PP
1789: Note that both of these functions create and scan a
1790: .I copy
1791: of the string or bytes. (This may be desirable, since
1792: .B yylex()
1793: modifies the contents of the buffer it is scanning.) You can avoid the
1794: copy by using:
1795: .TP
1796: .B yy_scan_buffer(char *base, yy_size_t size)
1797: which scans in place the buffer starting at
1798: .I base,
1799: consisting of
1800: .I size
1801: bytes, the last two bytes of which
1802: .I must
1803: be
1804: .B YY_END_OF_BUFFER_CHAR
1805: (ASCII NUL).
1806: These last two bytes are not scanned; thus, scanning
1807: consists of
1808: .B base[0]
1809: through
1810: .B base[size-3],
1811: inclusive.
1812: .IP
1813: If you fail to set up
1814: .I base
1815: in this manner (i.e., forget the final two
1816: .B YY_END_OF_BUFFER_CHAR
1817: bytes), then
1818: .B yy_scan_buffer()
1819: returns a nil pointer instead of creating a new input buffer.
1820: .IP
1821: The type
1822: .B yy_size_t
1823: is an integral type to which you can cast an integer expression
1824: reflecting the size of the buffer.
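.PP
Preparing memory in the layout that
.B yy_scan_buffer()
expects might be done along the following lines. Both helpers are
hypothetical and exist only for this sketch; they rely on the statement
above that
.B YY_END_OF_BUFFER_CHAR
is ASCII NUL.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy len bytes of text into a fresh buffer followed by the two
 * required NUL bytes. Stores the total size (len + 2) in *size_out.
 * Returns NULL if allocation fails; the caller frees the result.
 */
static char *make_scan_buffer(const char *text, size_t len, size_t *size_out)
{
    char *base;

    assert(text != NULL && size_out != NULL);
    base = malloc(len + 2);
    if (base == NULL)
        return NULL;
    memcpy(base, text, len);
    base[len] = 0;              /* first end-of-buffer byte */
    base[len + 1] = 0;          /* second end-of-buffer byte */
    *size_out = len + 2;
    return base;
}

/* Check the documented precondition: at least 2 bytes, last two NUL. */
static int buffer_is_scannable(const char *base, size_t size)
{
    return base != NULL && size >= 2 &&
           base[size - 2] == 0 && base[size - 1] == 0;
}
```

The pointer and size would then be handed to
.B yy_scan_buffer().
Since the scanner works on that memory in place, it presumably has to
stay allocated for as long as the resulting buffer is in use.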
1825: .SH END-OF-FILE RULES
1826: The special rule "<<EOF>>" indicates
1827: actions which are to be taken when an end-of-file is
1828: encountered and yywrap() returns non-zero (i.e., indicates
1829: no further files to process). The action must finish
1830: by doing one of four things:
1831: .IP -
1832: assigning
1833: .I yyin
1834: to a new input file (in previous versions of flex, after doing the
1835: assignment you had to call the special action
1836: .B YY_NEW_FILE;
1837: this is no longer necessary);
1838: .IP -
1839: executing a
1840: .I return
1841: statement;
1842: .IP -
1843: executing the special
1844: .B yyterminate()
1845: action;
1846: .IP -
1847: or, switching to a new buffer using
1848: .B yy_switch_to_buffer()
1849: as shown in the example above.
1850: .PP
1851: <<EOF>> rules may not be used with other
1852: patterns; they may only be qualified with a list of start
1853: conditions. If an unqualified <<EOF>> rule is given, it
1854: applies to
1855: .I all
1856: start conditions which do not already have <<EOF>> actions. To
1857: specify an <<EOF>> rule for only the initial start condition, use
1858: .nf
1859:
1860: <INITIAL><<EOF>>
1861:
1862: .fi
1863: .PP
1864: These rules are useful for catching things like unclosed comments.
1865: An example:
1866: .nf
1867:
1868: %x quote
1869: %%
1870:
1871: ...other rules for dealing with quotes...
1872:
1873: <quote><<EOF>> {
1874: error( "unterminated quote" );
1875: yyterminate();
1876: }
1877: <<EOF>> {
1878: if ( *++filelist )
1879: yyin = fopen( *filelist, "r" );
1880: else
1881: yyterminate();
1882: }
1883:
1884: .fi
1885: .SH MISCELLANEOUS MACROS
1886: The macro
1887: .B YY_USER_ACTION
1888: can be defined to provide an action
1889: which is always executed prior to the matched rule's action. For example,
1890: it could be #define'd to call a routine to convert yytext to lower-case.
1891: When
1892: .B YY_USER_ACTION
1893: is invoked, the variable
1894: .I yy_act
1895: gives the number of the matched rule (rules are numbered starting with 1).
1896: Suppose you want to profile how often each of your rules is matched. The
1897: following would do the trick:
1898: .nf
1899:
1900: #define YY_USER_ACTION ++ctr[yy_act]
1901:
1902: .fi
1903: where
1904: .I ctr
1905: is an array to hold the counts for the different rules. Note that
1906: the macro
1907: .B YY_NUM_RULES
1908: gives the total number of rules (including the default rule, even if
1909: you use
1910: .B \-s),
1911: so a correct declaration for
1912: .I ctr
1913: is:
1914: .nf
1915:
1916: int ctr[YY_NUM_RULES + 1]; /* rule numbers start at 1 */
1917:
1918: .fi
1919: .PP
1920: The macro
1921: .B YY_USER_INIT
1922: may be defined to provide an action which is always executed before
1923: the first scan (and before the scanner's internal initializations are done).
1924: For example, it could be used to call a routine to read
1925: in a data table or open a logging file.
1926: .PP
1927: The macro
1928: .B yy_set_interactive(is_interactive)
1929: can be used to control whether the current buffer is considered
1930: .I interactive.
1931: An interactive buffer is processed more slowly,
1932: but must be used when the scanner's input source is indeed
1933: interactive to avoid problems due to waiting to fill buffers
1934: (see the discussion of the
1935: .B \-I
1936: flag below). A non-zero value
1.7 aaron 1937: in the macro invocation marks the buffer as interactive, a zero
1.1 deraadt 1938: value as non-interactive. Note that use of this macro overrides
1939: .B %option always-interactive
1940: or
1941: .B %option never-interactive
1942: (see Options below).
1943: .B yy_set_interactive()
1944: must be invoked prior to beginning to scan the buffer that is
1945: (or is not) to be considered interactive.
1946: .PP
1947: The macro
1948: .B yy_set_bol(at_bol)
1949: can be used to control whether the current buffer's scanning
1950: context for the next token match is done as though at the
1951: beginning of a line. A non-zero macro argument makes rules anchored with
1.10 deraadt 1952: \'^' active, while a zero argument makes '^' rules inactive.
1.1 deraadt 1953: .PP
1954: The macro
1955: .B YY_AT_BOL()
1956: returns true if the next token scanned from the current buffer
1957: will have '^' rules active, false otherwise.
1958: .PP
1959: In the generated scanner, the actions are all gathered in one large
1960: switch statement and separated using
1961: .B YY_BREAK,
1962: which may be redefined. By default, it is simply a "break", to separate
1.10 deraadt 1963: each rule's action from the following rules.
1.1 deraadt 1964: Redefining
1965: .B YY_BREAK
1966: allows, for example, C++ users to
1967: #define YY_BREAK to do nothing (while being very careful that every
1968: rule ends with a "break" or a "return"!) to avoid suffering from
1969: unreachable statement warnings where, because a rule's action ends with
1970: "return", the
1971: .B YY_BREAK
1972: is inaccessible.
1973: .SH VALUES AVAILABLE TO THE USER
1974: This section summarizes the various values available to the user
1975: in the rule actions.
1976: .IP -
1977: .B char *yytext
1978: holds the text of the current token. It may be modified but not lengthened
1979: (you cannot append characters to the end).
1980: .IP
1981: If the special directive
1982: .B %array
1983: appears in the first section of the scanner description, then
1984: .B yytext
1985: is instead declared
1986: .B char yytext[YYLMAX],
1987: where
1988: .B YYLMAX
1989: is a macro definition that you can redefine in the first section
1990: if you don't like the default value (generally 8KB). Using
1991: .B %array
1992: results in somewhat slower scanners, but the value of
1993: .B yytext
1994: becomes immune to calls to
1995: .I input()
1996: and
1997: .I unput(),
1998: which potentially destroy its value when
1999: .B yytext
2000: is a character pointer. The opposite of
2001: .B %array
2002: is
2003: .B %pointer,
2004: which is the default.
2005: .IP
2006: You cannot use
2007: .B %array
2008: when generating C++ scanner classes
2009: (the
2010: .B \-+
2011: flag).
2012: .IP -
2013: .B int yyleng
2014: holds the length of the current token.
2015: .IP -
2016: .B FILE *yyin
2017: is the file which by default
2018: .I flex
2019: reads from. It may be redefined but doing so only makes sense before
2020: scanning begins or after an EOF has been encountered. Changing it in
2021: the midst of scanning will have unexpected results since
2022: .I flex
2023: buffers its input; use
2024: .B yyrestart()
2025: instead.
2026: Once scanning terminates because an end-of-file
2027: has been seen, you can point
2028: .I yyin
2029: at the new input file and then call the scanner again to continue scanning.
2030: .IP -
2031: .B void yyrestart( FILE *new_file )
2032: may be called to point
2033: .I yyin
2034: at the new input file. The switch-over to the new file is immediate
2035: (any previously buffered-up input is lost). Note that calling
2036: .B yyrestart()
2037: with
2038: .I yyin
2039: as an argument thus throws away the current input buffer and continues
2040: scanning the same input file.
2041: .IP -
2042: .B FILE *yyout
2043: is the file to which
2044: .B ECHO
2045: actions are done. It can be reassigned by the user.
2046: .IP -
2047: .B YY_CURRENT_BUFFER
2048: returns a
2049: .B YY_BUFFER_STATE
2050: handle to the current buffer.
2051: .IP -
2052: .B YY_START
2053: returns an integer value corresponding to the current start
2054: condition. You can subsequently use this value with
2055: .B BEGIN
2056: to return to that start condition.
2057: .SH INTERFACING WITH YACC
2058: One of the main uses of
2059: .I flex
2060: is as a companion to the
2061: .I yacc
2062: parser-generator.
2063: .I yacc
2064: parsers expect to call a routine named
2065: .B yylex()
2066: to find the next input token. The routine is supposed to
2067: return the type of the next token as well as putting any associated
2068: value in the global
2069: .B yylval.
2070: To use
2071: .I flex
2072: with
2073: .I yacc,
2074: one specifies the
2075: .B \-d
2076: option to
2077: .I yacc
2078: to instruct it to generate the file
2079: .B y.tab.h
2080: containing definitions of all the
2081: .B %tokens
2082: appearing in the
2083: .I yacc
2084: input. This file is then included in the
2085: .I flex
2086: scanner. For example, if one of the tokens is "TOK_NUMBER",
2087: part of the scanner might look like:
2088: .nf
2089:
2090: %{
2091: #include "y.tab.h"
2092: %}
2093:
2094: %%
2095:
2096: [0-9]+ yylval = atoi( yytext ); return TOK_NUMBER;
2097:
2098: .fi
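.PP
The contract between parser and scanner can be seen in isolation in the
hand-written imitation below, which does by hand what the rule above
asks of the generated scanner. Everything in it is a stand-in: the
value given to TOK_NUMBER is arbitrary (the real one comes from
.B y.tab.h),
.B yylval
is declared locally, and
.B next_token()
and
.B set_input()
are hypothetical names used instead of
.B yylex().

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Stand-ins for what y.tab.h and the parser would provide. */
#define TOK_NUMBER 258          /* arbitrary value chosen for this sketch */
static int yylval;

/* Hypothetical input cursor; a flex scanner would read from yyin. */
static const char *cursor = "";

static void set_input(const char *text)
{
    assert(text != NULL);
    cursor = text;
}

/*
 * Return the type of the next token, leaving its value in yylval,
 * or 0 once the input is exhausted.
 */
static int next_token(void)
{
    char *end;

    while (*cursor != 0 && !isdigit((unsigned char)*cursor))
        cursor++;               /* skip anything that is not a number */
    if (*cursor == 0)
        return 0;               /* end of input */
    yylval = (int)strtol(cursor, &end, 10);
    cursor = end;
    return TOK_NUMBER;
}
```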
2099: .SH OPTIONS
2100: .I flex
2101: has the following options:
2102: .TP
2103: .B \-b
2104: Generate backing-up information to
2105: .I lex.backup.
2106: This is a list of scanner states which require backing up
2107: and the input characters on which they do so. By adding rules one
2108: can remove backing-up states. If
2109: .I all
2110: backing-up states are eliminated and
2111: .B \-Cf
2112: or
2113: .B \-CF
2114: is used, the generated scanner will run faster (see the
2115: .B \-p
2116: flag). Only users who wish to squeeze every last cycle out of their
2117: scanners need worry about this option. (See the section on Performance
2118: Considerations below.)
2119: .TP
2120: .B \-c
2121: is a do-nothing, deprecated option included for POSIX compliance.
2122: .TP
2123: .B \-d
2124: makes the generated scanner run in
2125: .I debug
2126: mode. Whenever a pattern is recognized and the global
2127: .B yy_flex_debug
2128: is non-zero (which is the default),
2129: the scanner will write to
2130: .I stderr
2131: a line of the form:
2132: .nf
2133:
2134: --accepting rule at line 53 ("the matched text")
2135:
2136: .fi
2137: The line number refers to the location of the rule in the file
2138: defining the scanner (i.e., the file that was fed to flex). Messages
2139: are also generated when the scanner backs up, accepts the
2140: default rule, reaches the end of its input buffer (or encounters
2141: a NUL; at this point, the two look the same as far as the scanner's concerned),
2142: or reaches an end-of-file.
2143: .TP
2144: .B \-f
2145: specifies
2146: .I fast scanner.
2147: No table compression is done and stdio is bypassed.
2148: The result is large but fast. This option is equivalent to
2149: .B \-Cfr
2150: (see below).
2151: .TP
2152: .B \-h
2153: generates a "help" summary of
2154: .I flex's
2155: options to
1.7 aaron 2156: .I stdout
1.1 deraadt 2157: and then exits.
2158: .B \-?
2159: and
2160: .B \-\-help
2161: are synonyms for
2162: .B \-h.
2163: .TP
2164: .B \-i
2165: instructs
2166: .I flex
2167: to generate a
2168: .I case-insensitive
2169: scanner. The case of letters given in the
2170: .I flex
2171: input patterns will
2172: be ignored, and tokens in the input will be matched regardless of case. The
2173: matched text given in
2174: .I yytext
2175: will have its case preserved (i.e., it will not be folded).
2176: .TP
2177: .B \-l
2178: turns on maximum compatibility with the original AT&T
2179: .I lex
2180: implementation. Note that this does not mean
2181: .I full
2182: compatibility. Use of this option costs a considerable amount of
2183: performance, and it cannot be used with the
2184: .B \-+, \-f, \-F, \-Cf,
2185: or
2186: .B \-CF
2187: options. For details on the compatibilities it provides, see the section
2188: "Incompatibilities With Lex And POSIX" below. This option also results
2189: in the name
2190: .B YY_FLEX_LEX_COMPAT
2191: being #define'd in the generated scanner.
2192: .TP
2193: .B \-n
2194: is another do-nothing, deprecated option included only for
2195: POSIX compliance.
2196: .TP
2197: .B \-p
2198: generates a performance report to stderr. The report
2199: consists of comments regarding features of the
2200: .I flex
2201: input file which will cause a serious loss of performance in the resulting
2202: scanner. If you give the flag twice, you will also get comments regarding
2203: features that lead to minor performance losses.
2204: .IP
2205: Note that the use of
2206: .B REJECT,
2207: .B %option yylineno,
2208: and variable trailing context (see the Deficiencies / Bugs section below)
2209: entails a substantial performance penalty; use of
2210: .I yymore(),
2211: the
2212: .B ^
2213: operator,
2214: and the
2215: .B \-I
2216: flag entail minor performance penalties.
2217: .TP
2218: .B \-s
2219: causes the
2220: .I default rule
2221: (that unmatched scanner input is echoed to
2222: .I stdout)
2223: to be suppressed. If the scanner encounters input that does not
2224: match any of its rules, it aborts with an error. This option is
2225: useful for finding holes in a scanner's rule set.
2226: .TP
2227: .B \-t
2228: instructs
2229: .I flex
2230: to write the scanner it generates to standard output instead
2231: of
2232: .B lex.yy.c.
2233: .TP
2234: .B \-v
2235: specifies that
2236: .I flex
2237: should write to
2238: .I stderr
2239: a summary of statistics regarding the scanner it generates.
2240: Most of the statistics are meaningless to the casual
2241: .I flex
2242: user, but the first line identifies the version of
2243: .I flex
2244: (same as reported by
2245: .B \-V),
2246: and the next line the flags used when generating the scanner, including
2247: those that are on by default.
2248: .TP
2249: .B \-w
2250: suppresses warning messages.
2251: .TP
2252: .B \-B
2253: instructs
2254: .I flex
2255: to generate a
2256: .I batch
2257: scanner, the opposite of
2258: .I interactive
2259: scanners generated by
2260: .B \-I
2261: (see below). In general, you use
2262: .B \-B
2263: when you are
2264: .I certain
2265: that your scanner will never be used interactively, and you want to
2266: squeeze a
2267: .I little
2268: more performance out of it. If your goal is instead to squeeze out a
2269: .I lot
2270: more performance, you should be using the
2271: .B \-Cf
2272: or
2273: .B \-CF
2274: options (discussed below), which turn on
2275: .B \-B
2276: automatically anyway.
2277: .TP
2278: .B \-F
2279: specifies that the
2280: .ul
2281: fast
2282: scanner table representation should be used (and stdio
2283: bypassed). This representation is
2284: about as fast as the full table representation
2285: .B (-f),
2286: and for some sets of patterns will be considerably smaller (and for
2287: others, larger). In general, if the pattern set contains both "keywords"
2288: and a catch-all, "identifier" rule, such as in the set:
2289: .nf
2290:
2291: "case" return TOK_CASE;
2292: "switch" return TOK_SWITCH;
2293: ...
2294: "default" return TOK_DEFAULT;
2295: [a-z]+ return TOK_ID;
2296:
2297: .fi
2298: then you're better off using the full table representation. If only
2299: the "identifier" rule is present and you then use a hash table or some such
2300: to detect the keywords, you're better off using
2301: .B -F.
2302: .IP
2303: This option is equivalent to
2304: .B \-CFr
2305: (see below). It cannot be used with
2306: .B \-+.
2307: .TP
2308: .B \-I
2309: instructs
2310: .I flex
2311: to generate an
2312: .I interactive
2313: scanner. An interactive scanner is one that only looks ahead to decide
2314: what token has been matched if it absolutely must. It turns out that
2315: always looking one extra character ahead, even if the scanner has already
2316: seen enough text to disambiguate the current token, is a bit faster than
2317: only looking ahead when necessary. But scanners that always look ahead
2318: give dreadful interactive performance; for example, when a user types
2319: a newline, it is not recognized as a newline token until they enter
2320: .I another
2321: token, which often means typing in another whole line.
2322: .IP
2323: .I Flex
2324: scanners default to
2325: .I interactive
2326: unless you use the
2327: .B \-Cf
2328: or
2329: .B \-CF
2330: table-compression options (see below). That's because if you're looking
2331: for high-performance you should be using one of these options, so if you
2332: didn't,
2333: .I flex
2334: assumes you'd rather trade off a bit of run-time performance for intuitive
2335: interactive behavior. Note also that you
2336: .I cannot
2337: use
2338: .B \-I
2339: in conjunction with
2340: .B \-Cf
2341: or
2342: .B \-CF.
2343: Thus, this option is not really needed; it is on by default for all those
2344: cases in which it is allowed.
2345: .IP
2346: You can force a scanner to
2347: .I not
2348: be interactive by using
2349: .B \-B
2350: (see above).
2351: .TP
2352: .B \-L
2353: instructs
2354: .I flex
2355: not to generate
2356: .B #line
2357: directives. Without this option,
2358: .I flex
2359: peppers the generated scanner
2360: with #line directives so error messages in the actions will be correctly
2361: located with respect to either the original
2362: .I flex
2363: input file (if the errors are due to code in the input file), or
2364: .B lex.yy.c
2365: (if the errors are
2366: .I flex's
2367: fault -- you should report these sorts of errors to the email address
2368: given below).
2369: .TP
2370: .B \-T
2371: makes
2372: .I flex
2373: run in
2374: .I trace
2375: mode. It will generate a lot of messages to
2376: .I stderr
2377: concerning
2378: the form of the input and the resultant non-deterministic and deterministic
2379: finite automata. This option is mostly for use in maintaining
2380: .I flex.
2381: .TP
2382: .B \-V
2383: prints the version number to
2384: .I stdout
2385: and exits.
2386: .B \-\-version
2387: is a synonym for
2388: .B \-V.
2389: .TP
2390: .B \-7
2391: instructs
2392: .I flex
2393: to generate a 7-bit scanner, i.e., one which can only recognize 7-bit
2394: characters in its input. The advantage of using
2395: .B \-7
2396: is that the scanner's tables can be up to half the size of those generated
2397: using the
2398: .B \-8
2399: option (see below). The disadvantage is that such scanners often hang
2400: or crash if their input contains an 8-bit character.
2401: .IP
2402: Note, however, that unless you generate your scanner using the
2403: .B \-Cf
2404: or
2405: .B \-CF
2406: table compression options, use of
2407: .B \-7
2408: will save only a small amount of table space, and make your scanner
2409: considerably less portable.
2410: .I Flex's
2411: default behavior is to generate an 8-bit scanner unless you use the
2412: .B \-Cf
2413: or
2414: .B \-CF,
2415: in which case
2416: .I flex
2417: defaults to generating 7-bit scanners unless your site was always
2418: configured to generate 8-bit scanners (as will often be the case
2419: with non-USA sites). You can tell whether flex generated a 7-bit
2420: or an 8-bit scanner by inspecting the flag summary in the
2421: .B \-v
2422: output as described above.
2423: .IP
2424: Note that if you use
2425: .B \-Cfe
2426: or
2427: .B \-CFe
2428: (those table compression options, but also using equivalence classes as
2429: discussed below), flex still defaults to generating an 8-bit
2430: scanner, since usually with these compression options full 8-bit tables
2431: are not much more expensive than 7-bit tables.
2432: .TP
2433: .B \-8
2434: instructs
2435: .I flex
2436: to generate an 8-bit scanner, i.e., one which can recognize 8-bit
2437: characters. This flag is only needed for scanners generated using
2438: .B \-Cf
2439: or
2440: .B \-CF,
2441: as otherwise flex defaults to generating an 8-bit scanner anyway.
2442: .IP
2443: See the discussion of
2444: .B \-7
2445: above for flex's default behavior and the tradeoffs between 7-bit
2446: and 8-bit scanners.
2447: .TP
2448: .B \-+
2449: specifies that you want flex to generate a C++
2450: scanner class. See the section on Generating C++ Scanners below for
2451: details.
1.7 aaron 2452: .TP
1.1 deraadt 2453: .B \-C[aefFmr]
2454: controls the degree of table compression and, more generally, trade-offs
2455: between small scanners and fast scanners.
2456: .IP
2457: .B \-Ca
2458: ("align") instructs flex to trade off larger tables in the
2459: generated scanner for faster performance because the elements of
2460: the tables are better aligned for memory access and computation. On some
2461: RISC architectures, fetching and manipulating longwords is more efficient
2462: than with smaller-sized units such as shortwords. This option can
2463: double the size of the tables used by your scanner.
2464: .IP
2465: .B \-Ce
2466: directs
2467: .I flex
2468: to construct
2469: .I equivalence classes,
2470: i.e., sets of characters
2471: which have identical lexical properties (for example, if the only
2472: appearance of digits in the
2473: .I flex
2474: input is in the character class
2475: "[0-9]" then the digits '0', '1', ..., '9' will all be put
2476: in the same equivalence class). Equivalence classes usually give
2477: dramatic reductions in the final table/object file sizes (typically
2478: a factor of 2-5) and are pretty cheap performance-wise (one array
2479: look-up per character scanned).
2480: .IP
2481: .B \-Cf
2482: specifies that the
2483: .I full
2484: scanner tables should be generated -
2485: .I flex
2486: should not compress the
1.10 deraadt 2487: tables by taking advantage of similar transition functions for
1.1 deraadt 2488: different states.
2489: .IP
2490: .B \-CF
2491: specifies that the alternate fast scanner representation (described
2492: above under the
2493: .B \-F
2494: flag)
2495: should be used. This option cannot be used with
2496: .B \-+.
2497: .IP
2498: .B \-Cm
2499: directs
2500: .I flex
2501: to construct
2502: .I meta-equivalence classes,
2503: which are sets of equivalence classes (or characters, if equivalence
2504: classes are not being used) that are commonly used together. Meta-equivalence
2505: classes are often a big win when using compressed tables, but they
2506: have a moderate performance impact (one or two "if" tests and one
2507: array look-up per character scanned).
2508: .IP
2509: .B \-Cr
2510: causes the generated scanner to
2511: .I bypass
2512: use of the standard I/O library (stdio) for input. Instead of calling
2513: .B fread()
2514: or
2515: .B getc(),
2516: the scanner will use the
2517: .B read()
2518: system call, resulting in a performance gain which varies from system
2519: to system, but in general is probably negligible unless you are also using
2520: .B \-Cf
2521: or
2522: .B \-CF.
2523: Using
2524: .B \-Cr
2525: can cause strange behavior if, for example, you read from
2526: .I yyin
2527: using stdio prior to calling the scanner (because the scanner will miss
2528: whatever text your previous reads left in the stdio input buffer).
2529: .IP
2530: .B \-Cr
2531: has no effect if you define
2532: .B YY_INPUT
2533: (see The Generated Scanner above).
2534: .IP
2535: A lone
2536: .B \-C
2537: specifies that the scanner tables should be compressed but neither
2538: equivalence classes nor meta-equivalence classes should be used.
2539: .IP
2540: The options
2541: .B \-Cf
2542: or
2543: .B \-CF
2544: and
2545: .B \-Cm
2546: do not make sense together - there is no opportunity for meta-equivalence
2547: classes if the table is not being compressed. Otherwise the options
2548: may be freely mixed, and are cumulative.
2549: .IP
2550: The default setting is
2551: .B \-Cem,
2552: which specifies that
2553: .I flex
2554: should generate equivalence classes
2555: and meta-equivalence classes. This setting provides the highest
2556: degree of table compression. You can obtain
2557: faster-executing scanners at the cost of larger tables, with
2558: the following generally being true:
2559: .nf
2560:
2561: slowest & smallest
2562: -Cem
2563: -Cm
2564: -Ce
2565: -C
2566: -C{f,F}e
2567: -C{f,F}
2568: -C{f,F}a
2569: fastest & largest
2570:
2571: .fi
2572: Note that scanners with the smallest tables are usually generated and
2573: compiled the quickest, so
2574: during development you will usually want to use the default, maximal
2575: compression.
2576: .IP
2577: .B \-Cfe
2578: is often a good compromise between speed and size for production
2579: scanners.
2580: .TP
2581: .B \-ooutput
2582: directs flex to write the scanner to the file
2583: .B output
2584: instead of
2585: .B lex.yy.c.
2586: If you combine
2587: .B \-o
2588: with the
2589: .B \-t
2590: option, then the scanner is written to
2591: .I stdout
2592: but its
2593: .B #line
2594: directives (see the
2595: .B \-L
2596: option above) refer to the file
2597: .B output.
2598: .TP
2599: .B \-Pprefix
2600: changes the default
2601: .I "yy"
2602: prefix used by
2603: .I flex
1.6 aaron 2604: for all globally visible variable and function names to instead be
1.1 deraadt 2605: .I prefix.
2606: For example,
2607: .B \-Pfoo
2608: changes the name of
2609: .B yytext
2610: to
2611: .B footext.
2612: It also changes the name of the default output file from
2613: .B lex.yy.c
2614: to
2615: .B lex.foo.c.
2616: Here are all of the names affected:
2617: .nf
2618:
2619: yy_create_buffer
2620: yy_delete_buffer
2621: yy_flex_debug
2622: yy_init_buffer
2623: yy_flush_buffer
2624: yy_load_buffer_state
2625: yy_switch_to_buffer
2626: yyin
2627: yyleng
2628: yylex
2629: yylineno
2630: yyout
2631: yyrestart
2632: yytext
2633: yywrap
2634:
2635: .fi
2636: (If you are using a C++ scanner, then only
2637: .B yywrap
2638: and
2639: .B yyFlexLexer
2640: are affected.)
2641: Within your scanner itself, you can still refer to the global variables
2642: and functions using either version of their name; but externally, they
2643: have the modified name.
2644: .IP
2645: This option lets you easily link together multiple
2646: .I flex
2647: programs into the same executable. Note, though, that using this
2648: option also renames
2649: .B yywrap(),
2650: so you now
2651: .I must
2652: either
1.6 aaron 2653: provide your own (appropriately named) version of the routine for your
1.1 deraadt 2654: scanner, or use
2655: .B %option noyywrap,
2656: as linking with
2657: .B \-lfl
2658: no longer provides one for you by default.
2659: .TP
2660: .B \-Sskeleton_file
2661: overrides the default skeleton file from which
2662: .I flex
2663: constructs its scanners. You'll never need this option unless you are doing
2664: .I flex
2665: maintenance or development.
2666: .PP
2667: .I flex
2668: also provides a mechanism for controlling options within the
2669: scanner specification itself, rather than from the flex command-line.
2670: This is done by including
2671: .B %option
2672: directives in the first section of the scanner specification.
2673: You can specify multiple options with a single
2674: .B %option
2675: directive, and multiple directives in the first section of your flex input
2676: file.
2677: .PP
2678: Most options are given simply as names, optionally preceded by the
2679: word "no" (with no intervening whitespace) to negate their meaning.
2680: A number are equivalent to flex flags or their negation:
2681: .nf
2682:
2683: 7bit -7 option
2684: 8bit -8 option
2685: align -Ca option
2686: backup -b option
2687: batch -B option
2688: c++ -+ option
2689:
2690: caseful or
2691: case-sensitive opposite of -i (default)
2692:
2693: case-insensitive or
2694: caseless -i option
2695:
2696: debug -d option
2697: default opposite of -s option
2698: ecs -Ce option
2699: fast -F option
2700: full -f option
2701: interactive -I option
2702: lex-compat -l option
2703: meta-ecs -Cm option
2704: perf-report -p option
2705: read -Cr option
2706: stdout -t option
2707: verbose -v option
2708: warn opposite of -w option
2709: (use "%option nowarn" for -w)
2710:
2711: array equivalent to "%array"
2712: pointer equivalent to "%pointer" (default)
2713:
2714: .fi
2715: Some
2716: .B %option's
2717: provide features otherwise not available:
2718: .TP
2719: .B always-interactive
2720: instructs flex to generate a scanner which always considers its input
2721: "interactive". Normally, on each new input file the scanner calls
2722: .B isatty()
2723: in an attempt to determine whether
2724: the scanner's input source is interactive and thus should be read a
2725: character at a time. When this option is used, however, then no
2726: such call is made.
2727: .TP
2728: .B main
2729: directs flex to provide a default
2730: .B main()
2731: program for the scanner, which simply calls
2732: .B yylex().
2733: This option implies
2734: .B noyywrap
2735: (see below).
2736: .TP
2737: .B never-interactive
2738: instructs flex to generate a scanner which never considers its input
2739: "interactive" (again, no call made to
2740: .B isatty()).
2741: This is the opposite of
2742: .B always-interactive.
2743: .TP
2744: .B stack
2745: enables the use of start condition stacks (see Start Conditions above).
2746: .TP
2747: .B stdinit
2748: if set (i.e.,
2749: .B %option stdinit)
2750: initializes
2751: .I yyin
2752: and
2753: .I yyout
2754: to
2755: .I stdin
2756: and
2757: .I stdout,
2758: instead of the default of
2759: .I nil.
2760: Some existing
2761: .I lex
2762: programs depend on this behavior, even though it is not compliant with
2763: ANSI C, which does not require
2764: .I stdin
2765: and
2766: .I stdout
2767: to be compile-time constant.
2768: .TP
2769: .B yylineno
2770: directs
2771: .I flex
2772: to generate a scanner that maintains the number of the current line
2773: read from its input in the global variable
2774: .B yylineno.
2775: This option is implied by
2776: .B %option lex-compat.
2777: .TP
2778: .B yywrap
2779: if unset (i.e.,
2780: .B %option noyywrap),
2781: makes the scanner not call
2782: .B yywrap()
2783: upon an end-of-file, but simply assume that there are no more
2784: files to scan (until the user points
2785: .I yyin
2786: at a new file and calls
2787: .B yylex()
2788: again).
2789: .PP
2790: .I flex
2791: scans your rule actions to determine whether you use the
2792: .B REJECT
2793: or
2794: .B yymore()
2795: features. The
2796: .B reject
2797: and
2798: .B yymore
2799: options are available to override its decision as to whether you use the
2800: options, either by setting them (e.g.,
2801: .B %option reject)
2802: to indicate the feature is indeed used, or
2803: unsetting them to indicate it actually is not used
2804: (e.g.,
2805: .B %option noyymore).
2806: .PP
2807: Three options take string-delimited values, offset with '=':
2808: .nf
2809:
2810: %option outfile="ABC"
2811:
2812: .fi
2813: is equivalent to
2814: .B -oABC,
2815: and
2816: .nf
2817:
2818: %option prefix="XYZ"
2819:
2820: .fi
2821: is equivalent to
2822: .B -PXYZ.
2823: Finally,
2824: .nf
2825:
2826: %option yyclass="foo"
2827:
2828: .fi
2829: only applies when generating a C++ scanner (
2830: .B \-+
2831: option). It informs
2832: .I flex
2833: that you have derived
2834: .B foo
2835: as a subclass of
2836: .B yyFlexLexer,
2837: so
2838: .I flex
2839: will place your actions in the member function
2840: .B foo::yylex()
2841: instead of
2842: .B yyFlexLexer::yylex().
2843: It also generates a
2844: .B yyFlexLexer::yylex()
2845: member function that emits a run-time error (by invoking
2846: .B yyFlexLexer::LexerError())
2847: if called.
2848: See Generating C++ Scanners, below, for additional information.
2849: .PP
2850: A number of options are available for lint purists who want to suppress
2851: the appearance of unneeded routines in the generated scanner. Each of the
2852: following, if unset
2853: (e.g.,
2854: .B %option nounput
2855: ), results in the corresponding routine not appearing in
2856: the generated scanner:
2857: .nf
2858:
2859: input, unput
2860: yy_push_state, yy_pop_state, yy_top_state
2861: yy_scan_buffer, yy_scan_bytes, yy_scan_string
2862:
2863: .fi
2864: (though
2865: .B yy_push_state()
2866: and friends won't appear anyway unless you use
2867: .B %option stack).
2868: .SH PERFORMANCE CONSIDERATIONS
2869: The main design goal of
2870: .I flex
2871: is that it generate high-performance scanners. It has been optimized
2872: for dealing well with large sets of rules. Aside from the effects on
2873: scanner speed of the table compression
2874: .B \-C
2875: options outlined above,
2876: there are a number of options/actions which degrade performance. These
2877: are, from most expensive to least:
2878: .nf
2879:
2880: REJECT
2881: %option yylineno
2882: arbitrary trailing context
2883:
2884: pattern sets that require backing up
2885: %array
2886: %option interactive
2887: %option always-interactive
2888:
2889: '^' beginning-of-line operator
2890: yymore()
2891:
2892: .fi
2893: with the first three all being quite expensive and the last two
2894: being quite cheap. Note also that
2895: .B unput()
2896: is implemented as a routine call that potentially does quite a bit of
2897: work, while
2898: .B yyless()
2899: is a quite-cheap macro; so if just putting back some excess text you
2900: scanned, use
2901: .B yyless().
2902: .PP
2903: .B REJECT
2904: should be avoided at all costs when performance is important.
2905: It is a particularly expensive option.
2906: .PP
2907: Getting rid of backing up is messy and often may be an enormous
2908: amount of work for a complicated scanner. In principle, one begins
2909: by using the
1.7 aaron 2910: .B \-b
1.1 deraadt 2911: flag to generate a
2912: .I lex.backup
2913: file. For example, on the input
2914: .nf
2915:
2916: %%
2917: foo return TOK_KEYWORD;
2918: foobar return TOK_KEYWORD;
2919:
2920: .fi
2921: the file looks like:
2922: .nf
2923:
2924: State #6 is non-accepting -
2925: associated rule line numbers:
2926: 2 3
2927: out-transitions: [ o ]
2928: jam-transitions: EOF [ \\001-n p-\\177 ]
2929:
2930: State #8 is non-accepting -
2931: associated rule line numbers:
2932: 3
2933: out-transitions: [ a ]
2934: jam-transitions: EOF [ \\001-` b-\\177 ]
2935:
2936: State #9 is non-accepting -
2937: associated rule line numbers:
2938: 3
2939: out-transitions: [ r ]
2940: jam-transitions: EOF [ \\001-q s-\\177 ]
2941:
2942: Compressed tables always back up.
2943:
2944: .fi
2945: The first few lines tell us that there's a scanner state in
2946: which it can make a transition on an 'o' but not on any other
2947: character, and that in that state the currently scanned text does not match
2948: any rule. The state occurs when trying to match the rules found
2949: at lines 2 and 3 in the input file.
2950: If the scanner is in that state and then reads
2951: something other than an 'o', it will have to back up to find
2952: a rule which is matched. With
2953: a bit of headscratching one can see that this must be the
2954: state it's in when it has seen "fo". When this has happened,
2955: if anything other than another 'o' is seen, the scanner will
2956: have to back up to simply match the 'f' (by the default rule).
2957: .PP
2958: The comment regarding State #8 indicates there's a problem
2959: when "foob" has been scanned. Indeed, on any character other
2960: than an 'a', the scanner will have to back up to accept "foo".
2961: Similarly, the comment for State #9 concerns when "fooba" has
2962: been scanned and an 'r' does not follow.
2963: .PP
2964: The final comment reminds us that there's no point going to
2965: all the trouble of removing backing up from the rules unless
2966: we're using
2967: .B \-Cf
2968: or
2969: .B \-CF,
2970: since there's no performance gain doing so with compressed scanners.
2971: .PP
2972: The way to remove the backing up is to add "error" rules:
2973: .nf
2974:
2975: %%
2976: foo return TOK_KEYWORD;
2977: foobar return TOK_KEYWORD;
2978:
2979: fooba |
2980: foob |
2981: fo {
2982: /* false alarm, not really a keyword */
2983: return TOK_ID;
2984: }
2985:
2986: .fi
2987: .PP
2988: Eliminating backing up among a list of keywords can also be
2989: done using a "catch-all" rule:
2990: .nf
2991:
2992: %%
2993: foo return TOK_KEYWORD;
2994: foobar return TOK_KEYWORD;
2995:
2996: [a-z]+ return TOK_ID;
2997:
2998: .fi
2999: This is usually the best solution when appropriate.
3000: .PP
3001: Backing up messages tend to cascade.
3002: With a complicated set of rules it's not uncommon to get hundreds
3003: of messages. If one can decipher them, though, it often
3004: only takes a dozen or so rules to eliminate the backing up (though
3005: it's easy to make a mistake and have an error rule accidentally match
3006: a valid token). A possible future
3007: .I flex
3008: feature will be to automatically add rules to eliminate backing up.
3009: .PP
3010: It's important to keep in mind that you gain the benefits of eliminating
3011: backing up only if you eliminate
3012: .I every
3013: instance of backing up. Leaving just one means you gain nothing.
3014: .PP
3015: .I Variable
3016: trailing context (where both the leading and trailing parts do not have
3017: a fixed length) entails almost the same performance loss as
3018: .B REJECT
3019: (i.e., substantial). So when possible a rule like:
3020: .nf
3021:
3022: %%
3023: mouse|rat/(cat|dog) run();
3024:
3025: .fi
3026: is better written:
3027: .nf
3028:
3029: %%
3030: mouse/cat|dog run();
3031: rat/cat|dog run();
3032:
3033: .fi
3034: or as
3035: .nf
3036:
3037: %%
3038: mouse|rat/cat run();
3039: mouse|rat/dog run();
3040:
3041: .fi
3042: Note that here the special '|' action does
3043: .I not
3044: provide any savings, and can even make things worse (see
3045: Deficiencies / Bugs below).
3046: .LP
3047: Another area where the user can increase a scanner's performance
3048: (and one that's easier to implement) arises from the fact that
3049: the longer the tokens matched, the faster the scanner will run.
3050: This is because with long tokens the processing of most input
3051: characters takes place in the (short) inner scanning loop, and
3052: does not often have to go through the additional work of setting up
3053: the scanning environment (e.g.,
3054: .B yytext)
3055: for the action. Recall the scanner for C comments:
3056: .nf
3057:
3058: %x comment
3059: %%
3060: int line_num = 1;
3061:
3062: "/*" BEGIN(comment);
3063:
3064: <comment>[^*\\n]*
3065: <comment>"*"+[^*/\\n]*
3066: <comment>\\n ++line_num;
3067: <comment>"*"+"/" BEGIN(INITIAL);
3068:
3069: .fi
3070: This could be sped up by writing it as:
3071: .nf
3072:
3073: %x comment
3074: %%
3075: int line_num = 1;
3076:
3077: "/*" BEGIN(comment);
3078:
3079: <comment>[^*\\n]*
3080: <comment>[^*\\n]*\\n ++line_num;
3081: <comment>"*"+[^*/\\n]*
3082: <comment>"*"+[^*/\\n]*\\n ++line_num;
3083: <comment>"*"+"/" BEGIN(INITIAL);
3084:
3085: .fi
3086: Now instead of each newline requiring the processing of another
3087: action, recognizing the newlines is "distributed" over the other rules
3088: to keep the matched text as long as possible. Note that
3089: .I adding
3090: rules does
3091: .I not
3092: slow down the scanner! The speed of the scanner is independent
3093: of the number of rules or (modulo the considerations given at the
3094: beginning of this section) how complicated the rules are with
3095: regard to operators such as '*' and '|'.
3096: .PP
3097: A final example in speeding up a scanner: suppose you want to scan
3098: through a file containing identifiers and keywords, one per line
3099: and with no other extraneous characters, and recognize all the
3100: keywords. A natural first approach is:
3101: .nf
3102:
3103: %%
3104: asm |
3105: auto |
3106: break |
3107: ... etc ...
3108: volatile |
3109: while /* it's a keyword */
3110:
3111: .|\\n /* it's not a keyword */
3112:
3113: .fi
3114: To eliminate the backing up, introduce a catch-all rule:
3115: .nf
3116:
3117: %%
3118: asm |
3119: auto |
3120: break |
3121: ... etc ...
3122: volatile |
3123: while /* it's a keyword */
3124:
3125: [a-z]+ |
3126: .|\\n /* it's not a keyword */
3127:
3128: .fi
3129: Now, if it's guaranteed that there's exactly one word per line,
3130: then we can reduce the total number of matches by a half by
3131: merging in the recognition of newlines with that of the other
3132: tokens:
3133: .nf
3134:
3135: %%
3136: asm\\n |
3137: auto\\n |
3138: break\\n |
3139: ... etc ...
3140: volatile\\n |
3141: while\\n /* it's a keyword */
3142:
3143: [a-z]+\\n |
3144: .|\\n /* it's not a keyword */
3145:
3146: .fi
3147: One has to be careful here, as we have now reintroduced backing up
3148: into the scanner. In particular, while
3149: .I we
3150: know that there will never be any characters in the input stream
3151: other than letters or newlines,
3152: .I flex
3153: can't figure this out, and it will plan for possibly needing to back up
3154: when it has scanned a token like "auto" and then the next character
3155: is something other than a newline or a letter. Previously it would
3156: then just match the "auto" rule and be done, but now it has no "auto"
1.10 deraadt 3157: rule, only an "auto\\n" rule. To eliminate the possibility of backing up,
1.1 deraadt 3158: we could either duplicate all rules but without final newlines, or,
3159: since we never expect to encounter such an input and therefore don't
3160: care how it's classified, we can introduce one more catch-all rule, this
3161: one which doesn't include a newline:
3162: .nf
3163:
3164: %%
3165: asm\\n |
3166: auto\\n |
3167: break\\n |
3168: ... etc ...
3169: volatile\\n |
3170: while\\n /* it's a keyword */
3171:
3172: [a-z]+\\n |
3173: [a-z]+ |
3174: .|\\n /* it's not a keyword */
3175:
3176: .fi
3177: Compiled with
3178: .B \-Cf,
3179: this is about as fast as one can get a
1.7 aaron 3180: .I flex
1.1 deraadt 3181: scanner to go for this particular problem.
3182: .PP
3183: A final note:
3184: .I flex
3185: is slow when matching NUL's, particularly when a token contains
3186: multiple NUL's.
3187: It's best to write rules which match
3188: .I short
3189: amounts of text if it's anticipated that the text will often include NUL's.
3190: .PP
3191: Another final note regarding performance: as mentioned above in the section
3192: How the Input is Matched, dynamically resizing
3193: .B yytext
3194: to accommodate huge tokens is a slow process because it presently requires that
3195: the (huge) token be rescanned from the beginning. Thus if performance is
3196: vital, you should attempt to match "large" quantities of text but not
3197: "huge" quantities, where the cutoff between the two is at about 8K
3198: characters/token.
3199: .SH GENERATING C++ SCANNERS
3200: .I flex
3201: provides two different ways to generate scanners for use with C++. The
3202: first way is to simply compile a scanner generated by
3203: .I flex
3204: using a C++ compiler instead of a C compiler. You should not encounter
1.10 deraadt 3205: any compilation errors (please report any you find to the email address
1.1 deraadt 3206: given in the Author section below). You can then use C++ code in your
3207: rule actions instead of C code. Note that the default input source for
3208: your scanner remains
3209: .I yyin,
3210: and default echoing is still done to
3211: .I yyout.
3212: Both of these remain
3213: .I FILE *
3214: variables and not C++
3215: .I streams.
3216: .PP
3217: You can also use
3218: .I flex
3219: to generate a C++ scanner class, using the
3220: .B \-+
3221: option (or, equivalently,
3222: .B %option c++),
3223: which is automatically specified if the name of the flex
3224: executable ends in a '+', such as
3225: .I flex++.
3226: When using this option, flex defaults to generating the scanner to the file
3227: .B lex.yy.cc
3228: instead of
3229: .B lex.yy.c.
3230: The generated scanner includes the header file
1.5 deraadt 3231: .I g++/FlexLexer.h,
1.1 deraadt 3232: which defines the interface to two C++ classes.
3233: .PP
3234: The first class,
3235: .B FlexLexer,
3236: provides an abstract base class defining the general scanner class
3237: interface. It provides the following member functions:
3238: .TP
3239: .B const char* YYText()
3240: returns the text of the most recently matched token, the equivalent of
3241: .B yytext.
3242: .TP
3243: .B int YYLeng()
3244: returns the length of the most recently matched token, the equivalent of
3245: .B yyleng.
3246: .TP
3247: .B int lineno() const
3248: returns the current input line number
3249: (see
3250: .B %option yylineno),
3251: or
3252: .B 1
3253: if
3254: .B %option yylineno
3255: was not used.
3256: .TP
3257: .B void set_debug( int flag )
3258: sets the debugging flag for the scanner, equivalent to assigning to
3259: .B yy_flex_debug
3260: (see the Options section above). Note that you must build the scanner
3261: using
3262: .B %option debug
3263: to include debugging information in it.
3264: .TP
3265: .B int debug() const
3266: returns the current setting of the debugging flag.
3267: .PP
3268: Also provided are member functions equivalent to
3269: .B yy_switch_to_buffer(),
3270: .B yy_create_buffer()
3271: (though the first argument is an
3272: .B istream*
3273: object pointer and not a
3274: .B FILE*),
3275: .B yy_flush_buffer(),
3276: .B yy_delete_buffer(),
3277: and
3278: .B yyrestart()
1.10 deraadt 3279: (again, the first argument is an
1.1 deraadt 3280: .B istream*
3281: object pointer).
3282: .PP
3283: The second class defined in
1.5 deraadt 3284: .I g++/FlexLexer.h
1.1 deraadt 3285: is
3286: .B yyFlexLexer,
3287: which is derived from
3288: .B FlexLexer.
3289: It defines the following additional member functions:
3290: .TP
3291: .B
3292: yyFlexLexer( istream* arg_yyin = 0, ostream* arg_yyout = 0 )
3293: constructs a
3294: .B yyFlexLexer
3295: object using the given streams for input and output. If not specified,
3296: the streams default to
3297: .B cin
3298: and
3299: .B cout,
3300: respectively.
3301: .TP
3302: .B virtual int yylex()
1.10 deraadt 3303: performs the same role as
1.1 deraadt 3304: .B yylex()
3305: does for ordinary flex scanners: it scans the input stream, consuming
3306: tokens, until a rule's action returns a value. If you derive a subclass
3307: .B S
3308: from
3309: .B yyFlexLexer
3310: and want to access the member functions and variables of
3311: .B S
3312: inside
3313: .B yylex(),
3314: then you need to use
3315: .B %option yyclass="S"
3316: to inform
3317: .I flex
3318: that you will be using that subclass instead of
3319: .B yyFlexLexer.
3320: In this case, rather than generating
3321: .B yyFlexLexer::yylex(),
3322: .I flex
3323: generates
3324: .B S::yylex()
3325: (and also generates a dummy
3326: .B yyFlexLexer::yylex()
3327: that calls
3328: .B yyFlexLexer::LexerError()
3329: if called).
3330: .TP
3331: .B
3332: virtual void switch_streams(istream* new_in = 0,
3333: .B
3334: ostream* new_out = 0)
3335: reassigns
3336: .B yyin
3337: to
3338: .B new_in
3339: (if non-nil)
3340: and
3341: .B yyout
3342: to
3343: .B new_out
3344: (ditto), deleting the previous input buffer if
3345: .B yyin
3346: is reassigned.
3347: .TP
3348: .B
3349: int yylex( istream* new_in, ostream* new_out = 0 )
3350: first switches the input streams via
3351: .B switch_streams( new_in, new_out )
3352: and then returns the value of
3353: .B yylex().
3354: .PP
3355: In addition,
3356: .B yyFlexLexer
3357: defines the following protected virtual functions which you can redefine
3358: in derived classes to tailor the scanner:
3359: .TP
3360: .B
3361: virtual int LexerInput( char* buf, int max_size )
3362: reads up to
3363: .B max_size
3364: characters into
3365: .B buf
3366: and returns the number of characters read. To indicate end-of-input,
3367: return 0 characters. Note that "interactive" scanners (see the
3368: .B \-B
3369: and
3370: .B \-I
3371: flags) define the macro
3372: .B YY_INTERACTIVE.
3373: If you redefine
3374: .B LexerInput()
3375: and need to take different actions depending on whether or not
3376: the scanner might be scanning an interactive input source, you can
3377: test for the presence of this name via
3378: .B #ifdef.
3379: .TP
3380: .B
3381: virtual void LexerOutput( const char* buf, int size )
3382: writes out
3383: .B size
3384: characters from the buffer
3385: .B buf,
3386: which, while NUL-terminated, may also contain "internal" NUL's if
3387: the scanner's rules can match text with NUL's in them.
3388: .TP
3389: .B
3390: virtual void LexerError( const char* msg )
3391: reports a fatal error message. The default version of this function
3392: writes the message to the stream
3393: .B cerr
3394: and exits.
3395: .PP
3396: Note that a
3397: .B yyFlexLexer
3398: object contains its
3399: .I entire
3400: scanning state. Thus you can use such objects to create reentrant
3401: scanners. You can instantiate multiple instances of the same
3402: .B yyFlexLexer
3403: class, and you can also combine multiple C++ scanner classes together
3404: in the same program using the
3405: .B \-P
3406: option discussed above.
3407: .PP
3408: Finally, note that the
3409: .B %array
3410: feature is not available to C++ scanner classes; you must use
3411: .B %pointer
3412: (the default).
3413: .PP
3414: Here is an example of a simple C++ scanner:
3415: .nf
3416:
3417: // An example of using the flex C++ scanner class.
3418:
3419: %{
3420: int mylineno = 0;
3421: %}
3422:
3423: string \\"[^\\n"]+\\"
3424:
3425: ws [ \\t]+
3426:
3427: alpha [A-Za-z]
3428: dig [0-9]
3429: name ({alpha}|{dig}|\\$)({alpha}|{dig}|[_.\\-/$])*
3430: num1 [-+]?{dig}+\\.?([eE][-+]?{dig}+)?
3431: num2 [-+]?{dig}*\\.{dig}+([eE][-+]?{dig}+)?
3432: number {num1}|{num2}
3433:
3434: %%
3435:
3436: {ws} /* skip blanks and tabs */
3437:
3438: "/*" {
3439: int c;
3440:
3441: while((c = yyinput()) != 0)
3442: {
3443: if(c == '\\n')
3444: ++mylineno;
3445:
3446: else if(c == '*')
3447: {
3448: if((c = yyinput()) == '/')
3449: break;
3450: else
3451: unput(c);
3452: }
3453: }
3454: }
3455:
3456: {number} cout << "number " << YYText() << '\\n';
3457:
3458: \\n mylineno++;
3459:
3460: {name} cout << "name " << YYText() << '\\n';
3461:
3462: {string} cout << "string " << YYText() << '\\n';
3463:
3464: %%
3465:
3466: int main( int /* argc */, char** /* argv */ )
3467: {
3468: FlexLexer* lexer = new yyFlexLexer;
3469: while(lexer->yylex() != 0)
3470: ;
3471: return 0;
3472: }
3473: .fi
3474: If you want to create multiple (different) lexer classes, you use the
3475: .B \-P
3476: flag (or the
3477: .B prefix=
3478: option) to rename each
3479: .B yyFlexLexer
3480: to some other
3481: .B xxFlexLexer.
3482: You can then include
1.5 deraadt 3483: .B <g++/FlexLexer.h>
1.1 deraadt 3484: in your other sources once per lexer class, first renaming
3485: .B yyFlexLexer
3486: as follows:
3487: .nf
3488:
3489: #undef yyFlexLexer
3490: #define yyFlexLexer xxFlexLexer
1.5 deraadt 3491: #include <g++/FlexLexer.h>
1.1 deraadt 3492:
3493: #undef yyFlexLexer
3494: #define yyFlexLexer zzFlexLexer
1.5 deraadt 3495: #include <g++/FlexLexer.h>
1.1 deraadt 3496:
3497: .fi
3498: if, for example, you used
3499: .B %option prefix="xx"
3500: for one of your scanners and
3501: .B %option prefix="zz"
3502: for the other.
3503: .PP
3504: IMPORTANT: the present form of the scanning class is
3505: .I experimental
1.7 aaron 3506: and may change considerably between major releases.
1.1 deraadt 3507: .SH INCOMPATIBILITIES WITH LEX AND POSIX
3508: .I flex
3509: is a rewrite of the AT&T Unix
3510: .I lex
3511: tool (the two implementations do not share any code, though),
3512: with some extensions and incompatibilities, both of which
3513: are of concern to those who wish to write scanners acceptable
3514: to either implementation. Flex is fully compliant with the POSIX
3515: .I lex
3516: specification, except that when using
3517: .B %pointer
3518: (the default), a call to
3519: .B unput()
3520: destroys the contents of
3521: .B yytext,
3522: which is counter to the POSIX specification.
3523: .PP
3524: In this section we discuss all of the known areas of incompatibility
3525: between flex, AT&T lex, and the POSIX specification.
3526: .PP
3527: .I flex's
3528: .B \-l
3529: option turns on maximum compatibility with the original AT&T
3530: .I lex
3531: implementation, at the cost of a major loss in the generated scanner's
3532: performance. We note below which incompatibilities can be overcome
3533: using the
3534: .B \-l
3535: option.
3536: .PP
3537: .I flex
3538: is fully compatible with
3539: .I lex
3540: with the following exceptions:
3541: .IP -
3542: The undocumented
3543: .I lex
3544: scanner internal variable
3545: .B yylineno
3546: is not supported unless
3547: .B \-l
3548: or
3549: .B %option yylineno
3550: is used.
3551: .IP
3552: .B yylineno
3553: should be maintained on a per-buffer basis, rather than a per-scanner
3554: (single global variable) basis.
3555: .IP
3556: .B yylineno
3557: is not part of the POSIX specification.
3558: .IP -
3559: The
3560: .B input()
3561: routine is not redefinable, though it may be called to read characters
3562: following whatever has been matched by a rule. If
3563: .B input()
3564: encounters an end-of-file the normal
3565: .B yywrap()
3566: processing is done. A ``real'' end-of-file is returned by
3567: .B input()
3568: as
3569: .I EOF.
3570: .IP
3571: Input is instead controlled by defining the
3572: .B YY_INPUT
3573: macro.
3574: .IP
3575: The
3576: .I flex
3577: restriction that
3578: .B input()
3579: cannot be redefined is in accordance with the POSIX specification,
3580: which simply does not specify any way of controlling the
3581: scanner's input other than by making an initial assignment to
3582: .I yyin.
3583: .IP -
3584: The
3585: .B unput()
3586: routine is not redefinable. This restriction is in accordance with POSIX.
3587: .IP -
3588: .I flex
3589: scanners are not as reentrant as
3590: .I lex
3591: scanners. In particular, if you have an interactive scanner and
3592: an interrupt handler which long-jumps out of the scanner, and
3593: the scanner is subsequently called again, you may get the following
3594: message:
3595: .nf
3596:
3597: fatal flex scanner internal error--end of buffer missed
3598:
3599: .fi
3600: To reenter the scanner, first use
3601: .nf
3602:
3603: yyrestart( yyin );
3604:
3605: .fi
3606: Note that this call will throw away any buffered input; usually this
3607: isn't a problem with an interactive scanner.
3608: .IP
3609: Also note that flex C++ scanner classes
3610: .I are
3611: reentrant, so if using C++ is an option for you, you should use
3612: them instead. See "Generating C++ Scanners" above for details.
3613: .IP -
3614: .B output()
3615: is not supported.
3616: Output from the
3617: .B ECHO
3618: macro is done to the file-pointer
3619: .I yyout
3620: (default
3621: .I stdout).
3622: .IP
3623: .B output()
3624: is not part of the POSIX specification.
3625: .IP -
3626: .I lex
3627: does not support exclusive start conditions (%x), though they
3628: are in the POSIX specification.
3629: .IP -
3630: When definitions are expanded,
3631: .I flex
3632: encloses them in parentheses.
3633: With lex, the following:
3634: .nf
3635:
3636: NAME [A-Z][A-Z0-9]*
3637: %%
3638: foo{NAME}? printf( "Found it\\n" );
3639: %%
3640:
3641: .fi
3642: will not match the string "foo" because when the macro
3643: is expanded the rule is equivalent to "foo[A-Z][A-Z0-9]*?"
3644: and the precedence is such that the '?' is associated with
3645: "[A-Z0-9]*". With
3646: .I flex,
3647: the rule will be expanded to
3648: "foo([A-Z][A-Z0-9]*)?" and so the string "foo" will match.
3649: .IP
3650: Note that if the definition begins with
3651: .B ^
3652: or ends with
3653: .B $
3654: then it is
3655: .I not
3656: expanded with parentheses, to allow these operators to appear in
3657: definitions without losing their special meanings. But the
3658: .B <s>, /,
3659: and
3660: .B <<EOF>>
3661: operators cannot be used in a
3662: .I flex
3663: definition.
3664: .IP
3665: Using
3666: .B \-l
3667: results in the
3668: .I lex
3669: behavior of no parentheses around the definition.
3670: .IP
3671: The POSIX specification is that the definition be enclosed in parentheses.
3672: .IP -
3673: Some implementations of
3674: .I lex
3675: allow a rule's action to begin on a separate line, if the rule's pattern
3676: has trailing whitespace:
3677: .nf
3678:
3679: %%
3680: foo|bar<space here>
3681: { foobar_action(); }
3682:
3683: .fi
3684: .I flex
3685: does not support this feature.
3686: .IP -
3687: The
3688: .I lex
3689: .B %r
3690: (generate a Ratfor scanner) option is not supported. It is not part
3691: of the POSIX specification.
3692: .IP -
3693: After a call to
3694: .B unput(),
3695: .I yytext
3696: is undefined until the next token is matched, unless the scanner
3697: was built using
3698: .B %array.
3699: This is not the case with
3700: .I lex
3701: or the POSIX specification. The
3702: .B \-l
3703: option does away with this incompatibility.
3704: .IP -
3705: The precedence of the
3706: .B {}
3707: (numeric range) operator is different.
3708: .I lex
3709: interprets "abc{1,3}" as "match one, two, or
3710: three occurrences of 'abc'", whereas
3711: .I flex
3712: interprets it as "match 'ab'
3713: followed by one, two, or three occurrences of 'c'". The latter is
3714: in agreement with the POSIX specification.
3715: .IP -
3716: The precedence of the
3717: .B ^
3718: operator is different.
3719: .I lex
3720: interprets "^foo|bar" as "match either 'foo' at the beginning of a line,
3721: or 'bar' anywhere", whereas
3722: .I flex
3723: interprets it as "match either 'foo' or 'bar' if they come at the beginning
3724: of a line". The latter is in agreement with the POSIX specification.
3725: .IP -
3726: The special table-size declarations such as
3727: .B %a
3728: supported by
3729: .I lex
3730: are not required by
3731: .I flex
3732: scanners;
3733: .I flex
3734: ignores them.
3735: .IP -
3736: The name
3737: .B
3738: FLEX_SCANNER
3739: is #define'd so scanners may be written for use with either
3740: .I flex
3741: or
3742: .I lex.
3743: Scanners also include
3744: .B YY_FLEX_MAJOR_VERSION
3745: and
3746: .B YY_FLEX_MINOR_VERSION
3747: indicating which version of
3748: .I flex
3749: generated the scanner
3750: (for example, for the 2.5 release, these defines would be 2 and 5
3751: respectively).
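.PP
The definition-expansion and
.B {}
precedence differences listed above can be tried out with a small
stand-alone sketch. It is an illustration only: the helper names
.B expand_definition()
and
.B matches()
are hypothetical, and C++
.B std::regex
(ECMAScript grammar) stands in for a scanner. In that grammar a trailing
"*?" is a non-greedy star rather than an optional star, but in both readings
the leading [A-Z] remains mandatory, so the outcome for "foo" is the same.
.nf

```cpp
#include <regex>
#include <string>

// Hypothetical helper: replace each {name} in pattern with the definition
// body. With parenthesize = true the body is wrapped in parentheses (the
// flex and POSIX behavior) unless it begins with ^ or ends with $.
// With parenthesize = false the body is pasted in as-is (lex, or flex -l).
std::string expand_definition(const std::string& pattern,
                              const std::string& name,
                              const std::string& body,
                              bool parenthesize)
{
    const std::string ref = "{" + name + "}";
    const bool anchored = !body.empty() &&
        (body.front() == '^' || body.back() == '$');
    const std::string text =
        (parenthesize && !anchored) ? "(" + body + ")" : body;

    std::string out = pattern;
    std::string::size_type pos = 0;
    while ((pos = out.find(ref, pos)) != std::string::npos) {
        out.replace(pos, ref.size(), text);
        pos += text.size();  // skip past the text just inserted
    }
    return out;
}

// True if the whole input is matched by the given regular expression.
// std::regex is only a stand-in for a generated scanner here.
bool matches(const std::string& regex_text, const std::string& input)
{
    return std::regex_match(input, std::regex(regex_text));
}
```

.fi
Expanding "foo{NAME}?" with the body "[A-Z][A-Z0-9]*" gives
"foo([A-Z][A-Z0-9]*)?" with parentheses and "foo[A-Z][A-Z0-9]*?" without;
only the first matches the string "foo". Likewise "abc{1,3}" matches
"abccc" but not "abcabc", which needs the explicitly grouped "(abc){1,3}".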
3752: .PP
3753: The following
3754: .I flex
3755: features are not included in
3756: .I lex
3757: or the POSIX specification:
3758: .nf
3759:
3760: C++ scanners
3761: %option
3762: start condition scopes
3763: start condition stacks
3764: interactive/non-interactive scanners
3765: yy_scan_string() and friends
3766: yyterminate()
3767: yy_set_interactive()
3768: yy_set_bol()
3769: YY_AT_BOL()
3770: <<EOF>>
3771: <*>
3772: YY_DECL
3773: YY_START
3774: YY_USER_ACTION
3775: YY_USER_INIT
3776: #line directives
3777: %{}'s around actions
3778: multiple actions on a line
3779:
3780: .fi
3781: plus almost all of the flex flags.
3782: The last feature in the list refers to the fact that with
3783: .I flex
3784: you can put multiple actions on the same line, separated with
3785: semi-colons, while with
3786: .I lex,
3787: the following
3788: .nf
3789:
3790: foo handle_foo(); ++num_foos_seen;
3791:
3792: .fi
3793: is (rather surprisingly) truncated to
3794: .nf
3795:
3796: foo handle_foo();
3797:
3798: .fi
3799: .I flex
3800: does not truncate the action. Actions that are not enclosed in
3801: braces are simply terminated at the end of the line.
3802: .SH DIAGNOSTICS
3803: .PP
3804: .I warning, rule cannot be matched
3805: indicates that the given rule
3806: cannot be matched because it follows other rules that will
3807: always match the same text as it. For
3808: example, in the following "foo" cannot be matched because it comes after
3809: an identifier "catch-all" rule:
3810: .nf
3811:
3812: [a-z]+ got_identifier();
3813: foo got_foo();
3814:
3815: .fi
3816: Using
3817: .B REJECT
3818: in a scanner suppresses this warning.
3819: .PP
3820: .I warning,
3821: .B \-s
3822: .I
3823: option given but default rule can be matched
3824: means that it is possible (perhaps only in a particular start condition)
3825: that the default rule (match any single character) is the only one
3826: that will match a particular input. Since
3827: .B \-s
3828: was given, presumably this is not intended.
3829: .PP
3830: .I reject_used_but_not_detected undefined
3831: or
3832: .I yymore_used_but_not_detected undefined -
3833: These errors can occur at compile time. They indicate that the
3834: scanner uses
3835: .B REJECT
3836: or
3837: .B yymore()
3838: but that
3839: .I flex
3840: failed to notice the fact, meaning that
3841: .I flex
3842: scanned the first two sections looking for occurrences of these actions
1.10 deraadt 3843: and failed to find any, but somehow you snuck some in (via an #include
1.1 deraadt 3844: file, for example). Use
3845: .B %option reject
3846: or
3847: .B %option yymore
3848: to indicate to flex that you really do use these features.
3849: .PP
3850: .I flex scanner jammed -
3851: a scanner compiled with
3852: .B \-s
3853: has encountered an input string which wasn't matched by
3854: any of its rules. This error can also occur due to internal problems.
3855: .PP
3856: .I token too large, exceeds YYLMAX -
3857: your scanner uses
3858: .B %array
3859: and one of its rules matched a string longer than the
3860: .B YYLMAX
3861: constant (8K bytes by default). You can increase the value by
3862: #define'ing
3863: .B YYLMAX
3864: in the definitions section of your
3865: .I flex
3866: input.
3867: .PP
3868: .I scanner requires \-8 flag to
3869: .I use the character 'x' -
3870: Your scanner specification includes recognizing the 8-bit character
3871: .I 'x'
3872: and you did not specify the \-8 flag, and your scanner defaulted to 7-bit
3873: because you used the
3874: .B \-Cf
3875: or
3876: .B \-CF
3877: table compression options. See the discussion of the
3878: .B \-7
3879: flag for details.
3880: .PP
3881: .I flex scanner push-back overflow -
3882: you used
3883: .B unput()
3884: to push back so much text that the scanner's buffer could not hold
3885: both the pushed-back text and the current token in
3886: .B yytext.
3887: Ideally the scanner should dynamically resize the buffer in this case, but at
3888: present it does not.
3889: .PP
3890: .I
3891: input buffer overflow, can't enlarge buffer because scanner uses REJECT -
3892: the scanner was working on matching an extremely large token and needed
3893: to expand the input buffer. This doesn't work with scanners that use
3894: .B
3895: REJECT.
3896: .PP
3897: .I
3898: fatal flex scanner internal error--end of buffer missed -
3899: This can occur in a scanner which is reentered after a long-jump
3900: has jumped out (or over) the scanner's activation frame. Before
3901: reentering the scanner, use:
3902: .nf
3903:
3904: yyrestart( yyin );
3905:
3906: .fi
3907: or, as noted above, switch to using the C++ scanner class.
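.PP
The recovery sequence just described might be arranged as in the sketch
below. It is an assumption-laden illustration, not code taken from a
generated scanner: the definitions of
.B yyin,
.B yyrestart()
and
.B yylex()
are stubs written for this example so that it compiles on its own, and the
function
.B scan_with_recovery()
is hypothetical. In a real program the first three come from the generated
scanner and the long-jump would come from an interrupt handler.
.nf

```cpp
#include <csetjmp>
#include <cstdio>

// Stubs standing in for the generated scanner (assumptions for this sketch).
static FILE* yyin = stdin;
static int restart_calls = 0;
static void yyrestart(FILE*) { ++restart_calls; }

static std::jmp_buf interrupted;
static int yylex_calls = 0;

// Pretend scanner: the first call is "interrupted" and long-jumps out of
// its own activation frame; later calls report end of input.
static int yylex()
{
    if (++yylex_calls == 1)
        std::longjmp(interrupted, 1);
    return 0;
}

// Hypothetical driver: after a long-jump lands here, discard the buffered
// input with yyrestart( yyin ) before calling the scanner again.
int scan_with_recovery()
{
    if (setjmp(interrupted) != 0)
        yyrestart(yyin);
    return yylex();
}
```

.fi
Here the first call to the stub scanner jumps back to the
.B setjmp
point,
.B yyrestart()
runs once, and the second call returns normally.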
3908: .PP
3909: .I too many start conditions in <> construct! -
3910: you listed more start conditions in a <> construct than exist (so
3911: you must have listed at least one of them twice).
3912: .SH FILES
3913: .TP
3914: .B \-lfl
3915: library with which scanners must be linked.
3916: .TP
3917: .I lex.yy.c
3918: generated scanner (called
3919: .I lexyy.c
3920: on some systems).
3921: .TP
3922: .I lex.yy.cc
3923: generated C++ scanner class, when using
3924: .B \-+.
3925: .TP
1.5 deraadt 3926: .I <g++/FlexLexer.h>
1.1 deraadt 3927: header file defining the C++ scanner base class,
3928: .B FlexLexer,
3929: and its derived class,
3930: .B yyFlexLexer.
3931: .TP
3932: .I flex.skl
3933: skeleton scanner. This file is only used when building flex, not when
3934: flex executes.
3935: .TP
3936: .I lex.backup
3937: backing-up information for
3938: .B \-b
3939: flag (called
3940: .I lex.bck
3941: on some systems).
3942: .SH DEFICIENCIES / BUGS
3943: .PP
3944: Some trailing context
3945: patterns cannot be properly matched and generate
3946: warning messages ("dangerous trailing context"). These are
3947: patterns where the ending of the
3948: first part of the rule matches the beginning of the second
3949: part, such as "zx*/xy*", where the 'x*' matches the 'x' at
3950: the beginning of the trailing context. (Note that the POSIX draft
3951: states that the text matched by such patterns is undefined.)
3952: .PP
3953: For some trailing context rules, parts which are actually fixed-length are
1.3 deraadt 3954: not recognized as such, leading to the above-mentioned performance loss.
1.1 deraadt 3955: In particular, parts using '|' or {n} (such as "foo{3}") are always
3956: considered variable-length.
3957: .PP
3958: Combining trailing context with the special '|' action can result in
3959: .I fixed
3960: trailing context being turned into the more expensive
3961: .I variable
3962: trailing context. This happens, for example, in the following:
3963: .nf
3964:
3965: %%
3966: abc |
3967: xyz/def
3968:
3969: .fi
3970: .PP
3971: Use of
3972: .B unput()
3973: invalidates yytext and yyleng, unless the
3974: .B %array
3975: directive
3976: or the
3977: .B \-l
3978: option has been used.
3979: .PP
3980: Pattern-matching of NUL's is substantially slower than matching other
3981: characters.
3982: .PP
3983: Dynamic resizing of the input buffer is slow, as it entails rescanning
3984: all the text matched so far by the current (generally huge) token.
3985: .PP
3986: Due to both buffering of input and read-ahead, you cannot intermix
3987: calls to <stdio.h> routines, such as
3988: .B getchar(),
3989: with
3990: .I flex
3991: rules and expect it to work. Call
3992: .B input()
3993: instead.
3994: .PP
3995: The total number of table entries listed by the
3996: .B \-v
3997: flag excludes the table entries needed to determine
3998: what rule has been matched. The number of such entries is equal
3999: to the number of DFA states if the scanner does not use
4000: .B REJECT,
4001: and somewhat greater than the number of states if it does.
4002: .PP
4003: .B REJECT
4004: cannot be used with the
4005: .B \-f
4006: or
4007: .B \-F
4008: options.
4009: .PP
4010: The
4011: .I flex
4012: internal algorithms need documentation.
4013: .SH SEE ALSO
4014: .PP
4015: lex(1), yacc(1), sed(1), awk(1).
4016: .PP
4017: John Levine, Tony Mason, and Doug Brown,
4018: .I Lex & Yacc,
4019: O'Reilly and Associates. Be sure to get the 2nd edition.
4020: .PP
4021: M. E. Lesk and E. Schmidt,
4022: .I LEX \- Lexical Analyzer Generator
4023: .PP
4024: Alfred Aho, Ravi Sethi and Jeffrey Ullman,
4025: .I Compilers: Principles, Techniques and Tools,
4026: Addison-Wesley (1986). Describes the pattern-matching techniques used by
4027: .I flex
4028: (deterministic finite automata).
4029: .SH AUTHOR
4030: Vern Paxson, with the help of many ideas and much inspiration from
4031: Van Jacobson. Original version by Jef Poskanzer. The fast table
4032: representation is a partial implementation of a design done by Van
4033: Jacobson. The implementation was done by Kevin Gong and Vern Paxson.
4034: .PP
4035: Thanks to the many
4036: .I flex
4037: beta-testers, feedbackers, and contributors, especially Francois Pinard,
4038: Casey Leedom,
4039: Robert Abramovitz,
4040: Stan Adermann, Terry Allen, David Barker-Plummer, John Basrai,
4041: Neal Becker, Nelson H.F. Beebe, benson@odi.com,
4042: Karl Berry, Peter A. Bigot, Simon Blanchard,
4043: Keith Bostic, Frederic Brehm, Ian Brockbank, Kin Cho, Nick Christopher,
4044: Brian Clapper, J.T. Conklin,
4045: Jason Coughlin, Bill Cox, Nick Cropper, Dave Curtis, Scott David
1.11 deraadt 4046: Daniels, Chris G. Demetriou, Theo de Raadt,
1.1 deraadt 4047: Mike Donahue, Chuck Doucette, Tom Epperly, Leo Eskin,
4048: Chris Faylor, Chris Flatters, Jon Forrest, Jeffrey Friedl,
4049: Joe Gayda, Kaveh R. Ghazi, Wolfgang Glunz,
4050: Eric Goldman, Christopher M. Gould, Ulrich Grepel, Peer Griebel,
4051: Jan Hajic, Charles Hemphill, NORO Hideo,
4052: Jarkko Hietaniemi, Scott Hofmann,
4053: Jeff Honig, Dana Hudes, Eric Hughes, John Interrante,
4054: Ceriel Jacobs, Michal Jaegermann, Sakari Jalovaara, Jeffrey R. Jones,
4055: Henry Juengst, Klaus Kaempf, Jonathan I. Kamens, Terrence O Kane,
4056: Amir Katz, ken@ken.hilco.com, Kevin B. Kenny,
4057: Steve Kirsch, Winfried Koenig, Marq Kole, Ronald Lamprecht,
4058: Greg Lee, Rohan Lenard, Craig Leres, John Levine, Steve Liddle,
4059: David Loffredo, Mike Long,
4060: Mohamed el Lozy, Brian Madsen, Malte, Joe Marshall,
4061: Bengt Martensson, Chris Metcalf,
4062: Luke Mewburn, Jim Meyering, R. Alexander Milowski, Erik Naggum,
4063: G.T. Nicol, Landon Noll, James Nordby, Marc Nozell,
4064: Richard Ohnemus, Karsten Pahnke,
4065: Sven Panne, Roland Pesch, Walter Pelissero, Gaumond
4066: Pierre, Esmond Pitt, Jef Poskanzer, Joe Rahmeh, Jarmo Raiha,
4067: Frederic Raimbault, Pat Rankin, Rick Richardson,
4068: Kevin Rodgers, Kai Uwe Rommel, Jim Roskind, Alberto Santini,
4069: Andreas Scherer, Darrell Schiebel, Raf Schietekat,
4070: Doug Schmidt, Philippe Schnoebelen, Andreas Schwab,
4071: Larry Schwimmer, Alex Siegel, Eckehard Stolz, Jan-Erik Strömquist,
4072: Mike Stump, Paul Stuart, Dave Tallman, Ian Lance Taylor,
4073: Chris Thewalt, Richard M. Timoney, Jodi Tsai,
4074: Paul Tuinenga, Gary Weik, Frank Whaley, Gerhard Wilhelms, Kent Williams, Ken
4075: Yap, Ron Zellar, Nathan Zelle, David Zuhn,
4076: and those whose names have slipped my marginal
4077: mail-archiving skills but whose contributions are appreciated all the
4078: same.
4079: .PP
4080: Thanks to Keith Bostic, Jon Forrest, Noah Friedman,
4081: John Gilmore, Craig Leres, John Levine, Bob Mulcahy, G.T.
4082: Nicol, Francois Pinard, Rich Salz, and Richard Stallman for help with various
4083: distribution headaches.
4084: .PP
4085: Thanks to Esmond Pitt and Earle Horton for 8-bit character support; to
4086: Benson Margulies and Fred Burke for C++ support; to Kent Williams and Tom
4087: Epperly for C++ class support; to Ove Ewerlid for support of NUL's; and to
4088: Eric Hughes for support of multiple buffers.
4089: .PP
4090: This work was primarily done when I was with the Real Time Systems Group
4091: at the Lawrence Berkeley Laboratory in Berkeley, CA. Many thanks to all there
4092: for the support I received.
4093: .PP
4094: Send comments to vern@ee.lbl.gov.