Annotation of src/usr.bin/lex/flex.1, Revision 1.12
1.12 ! jmc 1: .\" $OpenBSD: flex.1,v 1.11 2003/01/04 22:36:13 deraadt Exp $
! 2: .\"
! 3: .\" Copyright (c) 1990 The Regents of the University of California.
! 4: .\" All rights reserved.
1.2 deraadt 5: .\"
1.12 ! jmc 6: .\" This code is derived from software contributed to Berkeley by
! 7: .\" Vern Paxson.
! 8: .\"
! 9: .\" The United States Government has rights in this work pursuant
! 10: .\" to contract no. DE-AC03-76SF00098 between the United States
! 11: .\" Department of Energy and the University of California.
! 12: .\"
! 13: .\" Redistribution and use in source and binary forms, with or without
! 14: .\" modification, are permitted provided that: (1) source distributions
! 15: .\" retain this entire copyright notice and comment, and (2) distributions
! 16: .\" including binaries display the following acknowledgement: ``This product
! 17: .\" includes software developed by the University of California, Berkeley
! 18: .\" and its contributors'' in the documentation or other materials provided
! 19: .\" with the distribution and in all advertising materials mentioning
! 20: .\" features or use of this software. Neither the name of the University nor
! 21: .\" the names of its contributors may be used to endorse or promote products
! 22: .\" derived from this software without specific prior written permission.
! 23: .\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED
! 24: .\" WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
! 25: .\" MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
! 26: .\"
1.1 deraadt 27: .TH FLEX 1 "April 1995" "Version 2.5"
28: .SH NAME
29: flex \- fast lexical analyzer generator
30: .SH SYNOPSIS
31: .B flex
32: .B [\-bcdfhilnpstvwBFILTV78+? \-C[aefFmr] \-ooutput \-Pprefix \-Sskeleton]
33: .B [\-\-help \-\-version]
34: .I [filename ...]
35: .SH OVERVIEW
36: This manual describes
37: .I flex,
38: a tool for generating programs that perform pattern-matching on text. The
39: manual includes both tutorial and reference sections:
40: .nf
41:
42: Description
43: a brief overview of the tool
44:
45: Some Simple Examples
46:
47: Format Of The Input File
48:
49: Patterns
50: the extended regular expressions used by flex
51:
52: How The Input Is Matched
53: the rules for determining what has been matched
54:
55: Actions
56: how to specify what to do when a pattern is matched
57:
58: The Generated Scanner
59: details regarding the scanner that flex produces;
60: how to control the input source
61:
62: Start Conditions
63: introducing context into your scanners, and
64: managing "mini-scanners"
65:
66: Multiple Input Buffers
67: how to manipulate multiple input sources; how to
68: scan from strings instead of files
69:
70: End-of-file Rules
71: special rules for matching the end of the input
72:
73: Miscellaneous Macros
74: a summary of macros available to the actions
75:
76: Values Available To The User
77: a summary of values available to the actions
78:
79: Interfacing With Yacc
80: connecting flex scanners together with yacc parsers
81:
82: Options
83: flex command-line options, and the "%option"
84: directive
85:
86: Performance Considerations
87: how to make your scanner go as fast as possible
88:
89: Generating C++ Scanners
90: the (experimental) facility for generating C++
91: scanner classes
92:
93: Incompatibilities With Lex And POSIX
94: how flex differs from AT&T lex and the POSIX lex
95: standard
96:
97: Diagnostics
98: those error messages produced by flex (or scanners
99: it generates) whose meanings might not be apparent
100:
101: Files
102: files used by flex
103:
104: Deficiencies / Bugs
105: known problems with flex
106:
107: See Also
108: other documentation, related tools
109:
110: Author
111: includes contact information
112:
113: .fi
114: .SH DESCRIPTION
115: .I flex
116: is a tool for generating
117: .I scanners:
1.9 millert 118: programs which recognize lexical patterns in text.
1.1 deraadt 119: .I flex
120: reads
121: the given input files, or its standard input if no file names are given,
122: for a description of a scanner to generate. The description is in
123: the form of pairs
124: of regular expressions and C code, called
125: .I rules. flex
126: generates as output a C source file,
127: .B lex.yy.c,
128: which defines a routine
129: .B yylex().
130: This file is compiled and linked with the
131: .B \-lfl
132: library to produce an executable. When the executable is run,
133: it analyzes its input for occurrences
134: of the regular expressions. Whenever it finds one, it executes
135: the corresponding C code.
136: .SH SOME SIMPLE EXAMPLES
137: .PP
138: First some simple examples to get the flavor of how one uses
139: .I flex.
140: The following
141: .I flex
142: input specifies a scanner which whenever it encounters the string
143: "username" will replace it with the user's login name:
144: .nf
145:
146: %%
147: username printf( "%s", getlogin() );
148:
149: .fi
150: By default, any text not matched by a
151: .I flex
152: scanner
153: is copied to the output, so the net effect of this scanner is
154: to copy its input file to its output with each occurrence
155: of "username" expanded.
156: In this input, there is just one rule. "username" is the
157: .I pattern
158: and the "printf" is the
159: .I action.
160: The "%%" marks the beginning of the rules.
161: .PP
162: Here's another simple example:
163: .nf
164:
165: int num_lines = 0, num_chars = 0;
166:
167: %%
168: \\n ++num_lines; ++num_chars;
169: . ++num_chars;
170:
171: %%
172: main()
173: {
174: yylex();
175: printf( "# of lines = %d, # of chars = %d\\n",
176: num_lines, num_chars );
177: }
178:
179: .fi
180: This scanner counts the number of characters and the number
181: of lines in its input (it produces no output other than the
182: final report on the counts). The first line
183: declares two globals, "num_lines" and "num_chars", which are accessible
184: both inside
185: .B yylex()
186: and in the
187: .B main()
188: routine declared after the second "%%". There are two rules, one
189: which matches a newline ("\\n") and increments both the line count and
190: the character count, and one which matches any character other than
191: a newline (indicated by the "." regular expression).
192: .PP
193: A somewhat more complicated example:
194: .nf
195:
196: /* scanner for a toy Pascal-like language */
197:
198: %{
199: /* need this for the call to atof() below */
200: #include <math.h>
201: %}
202:
203: DIGIT [0-9]
204: ID [a-z][a-z0-9]*
205:
206: %%
207:
208: {DIGIT}+ {
209: printf( "An integer: %s (%d)\\n", yytext,
210: atoi( yytext ) );
211: }
212:
213: {DIGIT}+"."{DIGIT}* {
214: printf( "A float: %s (%g)\\n", yytext,
215: atof( yytext ) );
216: }
217:
218: if|then|begin|end|procedure|function {
219: printf( "A keyword: %s\\n", yytext );
220: }
221:
222: {ID} printf( "An identifier: %s\\n", yytext );
223:
224: "+"|"-"|"*"|"/" printf( "An operator: %s\\n", yytext );
225:
226: "{"[^}\\n]*"}" /* eat up one-line comments */
227:
228: [ \\t\\n]+ /* eat up whitespace */
229:
230: . printf( "Unrecognized character: %s\\n", yytext );
231:
232: %%
233:
234: main( argc, argv )
235: int argc;
236: char **argv;
237: {
238: ++argv, --argc; /* skip over program name */
239: if ( argc > 0 )
240: yyin = fopen( argv[0], "r" );
241: else
242: yyin = stdin;
1.7 aaron 243:
1.1 deraadt 244: yylex();
245: }
246:
247: .fi
248: This is the beginnings of a simple scanner for a language like
249: Pascal. It identifies different types of
250: .I tokens
251: and reports on what it has seen.
252: .PP
253: The details of this example will be explained in the following
254: sections.
255: .SH FORMAT OF THE INPUT FILE
256: The
257: .I flex
258: input file consists of three sections, separated by a line with just
259: .B %%
260: in it:
261: .nf
262:
263: definitions
264: %%
265: rules
266: %%
267: user code
268:
269: .fi
270: The
271: .I definitions
272: section contains declarations of simple
273: .I name
274: definitions to simplify the scanner specification, and declarations of
275: .I start conditions,
276: which are explained in a later section.
277: .PP
278: Name definitions have the form:
279: .nf
280:
281: name definition
282:
283: .fi
284: The "name" is a word beginning with a letter or an underscore ('_')
285: followed by zero or more letters, digits, '_', or '-' (dash).
1.8 aaron 286: The definition is taken to begin at the first non-whitespace character
1.1 deraadt 287: following the name and continuing to the end of the line.
288: The definition can subsequently be referred to using "{name}", which
289: will expand to "(definition)". For example,
290: .nf
291:
292: DIGIT [0-9]
293: ID [a-z][a-z0-9]*
294:
295: .fi
296: defines "DIGIT" to be a regular expression which matches a
297: single digit, and
298: "ID" to be a regular expression which matches a letter
299: followed by zero-or-more letters-or-digits.
300: A subsequent reference to
301: .nf
302:
303: {DIGIT}+"."{DIGIT}*
304:
305: .fi
306: is identical to
307: .nf
308:
309: ([0-9])+"."([0-9])*
310:
311: .fi
312: and matches one-or-more digits followed by a '.' followed
313: by zero-or-more digits.
314: .PP
315: The
316: .I rules
317: section of the
318: .I flex
319: input contains a series of rules of the form:
320: .nf
321:
322: pattern action
323:
324: .fi
325: where the pattern must be unindented and the action must begin
326: on the same line.
327: .PP
328: See below for a further description of patterns and actions.
329: .PP
330: Finally, the user code section is simply copied to
331: .B lex.yy.c
332: verbatim.
333: It is used for companion routines which call or are called
334: by the scanner. The presence of this section is optional;
335: if it is missing, the second
336: .B %%
337: in the input file may be skipped, too.
338: .PP
339: In the definitions and rules sections, any
340: .I indented
341: text or text enclosed in
342: .B %{
343: and
344: .B %}
345: is copied verbatim to the output (with the %{}'s removed).
346: The %{}'s must appear unindented on lines by themselves.
347: .PP
348: In the rules section,
349: any indented or %{} text appearing before the
350: first rule may be used to declare variables
351: which are local to the scanning routine and (after the declarations)
352: code which is to be executed whenever the scanning routine is entered.
353: Other indented or %{} text in the rule section is still copied to the output,
354: but its meaning is not well-defined and it may well cause compile-time
355: errors (this feature is present for
356: .I POSIX
357: compliance; see below for other such features).
358: .PP
359: In the definitions section (but not in the rules section),
360: an unindented comment (i.e., a line
361: beginning with "/*") is also copied verbatim to the output up
362: to the next "*/".
363: .SH PATTERNS
364: The patterns in the input are written using an extended set of regular
365: expressions. These are:
366: .nf
367:
368: x match the character 'x'
369: . any character (byte) except newline
370: [xyz] a "character class"; in this case, the pattern
371: matches either an 'x', a 'y', or a 'z'
372: [abj-oZ] a "character class" with a range in it; matches
373: an 'a', a 'b', any letter from 'j' through 'o',
374: or a 'Z'
375: [^A-Z] a "negated character class", i.e., any character
376: but those in the class. In this case, any
377: character EXCEPT an uppercase letter.
378: [^A-Z\\n] any character EXCEPT an uppercase letter or
379: a newline
380: r* zero or more r's, where r is any regular expression
381: r+ one or more r's
382: r? zero or one r's (that is, "an optional r")
383: r{2,5} anywhere from two to five r's
384: r{2,} two or more r's
385: r{4} exactly 4 r's
386: {name} the expansion of the "name" definition
387: (see above)
388: "[xyz]\\"foo"
389: the literal string: [xyz]"foo
390: \\X if X is an 'a', 'b', 'f', 'n', 'r', 't', or 'v',
391: then the ANSI-C interpretation of \\x.
392: Otherwise, a literal 'X' (used to escape
393: operators such as '*')
394: \\0 a NUL character (ASCII code 0)
395: \\123 the character with octal value 123
396: \\x2a the character with hexadecimal value 2a
397: (r) match an r; parentheses are used to override
398: precedence (see below)
399:
400:
401: rs the regular expression r followed by the
402: regular expression s; called "concatenation"
403:
404:
405: r|s either an r or an s
406:
407:
408: r/s an r but only if it is followed by an s. The
409: text matched by s is included when determining
410: whether this rule is the "longest match",
411: but is then returned to the input before
412: the action is executed. So the action only
413: sees the text matched by r. This type
414: of pattern is called trailing context".
415: (There are some combinations of r/s that flex
416: cannot match correctly; see notes in the
417: Deficiencies / Bugs section below regarding
418: "dangerous trailing context".)
419: ^r an r, but only at the beginning of a line (i.e.,
1.10 deraadt 420: just starting to scan, or right after a
1.1 deraadt 421: newline has been scanned).
422: r$ an r, but only at the end of a line (i.e., just
423: before a newline). Equivalent to "r/\\n".
424:
425: Note that flex's notion of "newline" is exactly
426: whatever the C compiler used to compile flex
427: interprets '\\n' as; in particular, on some DOS
428: systems you must either filter out \\r's in the
429: input yourself, or explicitly use r/\\r\\n for "r$".
430:
431:
432: <s>r an r, but only in start condition s (see
433: below for discussion of start conditions)
434: <s1,s2,s3>r
435: same, but in any of start conditions s1,
436: s2, or s3
437: <*>r an r in any start condition, even an exclusive one.
438:
439:
440: <<EOF>> an end-of-file
441: <s1,s2><<EOF>>
442: an end-of-file when in start condition s1 or s2
443:
444: .fi
445: Note that inside of a character class, all regular expression operators
446: lose their special meaning except escape ('\\') and the character class
447: operators, '-', ']', and, at the beginning of the class, '^'.
448: .PP
449: The regular expressions listed above are grouped according to
450: precedence, from highest precedence at the top to lowest at the bottom.
451: Those grouped together have equal precedence. For example,
452: .nf
453:
454: foo|bar*
455:
456: .fi
457: is the same as
458: .nf
459:
460: (foo)|(ba(r*))
461:
462: .fi
463: since the '*' operator has higher precedence than concatenation,
464: and concatenation higher than alternation ('|'). This pattern
465: therefore matches
466: .I either
467: the string "foo"
468: .I or
469: the string "ba" followed by zero-or-more r's.
470: To match "foo" or zero-or-more "bar"'s, use:
471: .nf
472:
473: foo|(bar)*
474:
475: .fi
476: and to match zero-or-more "foo"'s-or-"bar"'s:
477: .nf
478:
479: (foo|bar)*
480:
481: .fi
482: .PP
483: In addition to characters and ranges of characters, character classes
484: can also contain character class
485: .I expressions.
486: These are expressions enclosed inside
487: .B [:
488: and
489: .B :]
490: delimiters (which themselves must appear between the '[' and ']' of the
491: character class; other elements may occur inside the character class, too).
492: The valid expressions are:
493: .nf
494:
495: [:alnum:] [:alpha:] [:blank:]
496: [:cntrl:] [:digit:] [:graph:]
497: [:lower:] [:print:] [:punct:]
498: [:space:] [:upper:] [:xdigit:]
499:
500: .fi
501: These expressions all designate a set of characters equivalent to
502: the corresponding standard C
503: .B isXXX
504: function. For example,
505: .B [:alnum:]
506: designates those characters for which
507: .B isalnum()
508: returns true - i.e., any alphabetic or numeric.
509: Some systems don't provide
510: .B isblank(),
511: so flex defines
512: .B [:blank:]
513: as a blank or a tab.
514: .PP
515: For example, the following character classes are all equivalent:
516: .nf
517:
518: [[:alnum:]]
1.4 deraadt 519: [[:alpha:][:digit:]]
1.1 deraadt 520: [[:alpha:]0-9]
521: [a-zA-Z0-9]
522:
523: .fi
524: If your scanner is case-insensitive (the
525: .B \-i
526: flag), then
527: .B [:upper:]
528: and
529: .B [:lower:]
530: are equivalent to
531: .B [:alpha:].
532: .PP
533: Some notes on patterns:
534: .IP -
535: A negated character class such as the example "[^A-Z]"
536: above
537: .I will match a newline
538: unless "\\n" (or an equivalent escape sequence) is one of the
539: characters explicitly present in the negated character class
540: (e.g., "[^A-Z\\n]"). This is unlike how many other regular
541: expression tools treat negated character classes, but unfortunately
542: the inconsistency is historically entrenched.
543: Matching newlines means that a pattern like [^"]* can match the entire
544: input unless there's another quote in the input.
545: .IP -
546: A rule can have at most one instance of trailing context (the '/' operator
547: or the '$' operator). The start condition, '^', and "<<EOF>>" patterns
548: can only occur at the beginning of a pattern, and, as well as with '/' and '$',
549: cannot be grouped inside parentheses. A '^' which does not occur at
550: the beginning of a rule or a '$' which does not occur at the end of
551: a rule loses its special properties and is treated as a normal character.
552: .IP
553: The following are illegal:
554: .nf
555:
556: foo/bar$
557: <sc1>foo<sc2>bar
558:
559: .fi
560: Note that the first of these, can be written "foo/bar\\n".
561: .IP
562: The following will result in '$' or '^' being treated as a normal character:
563: .nf
564:
565: foo|(bar$)
566: foo|^bar
567:
568: .fi
569: If what's wanted is a "foo" or a bar-followed-by-a-newline, the following
570: could be used (the special '|' action is explained below):
571: .nf
572:
573: foo |
574: bar$ /* action goes here */
575:
576: .fi
577: A similar trick will work for matching a foo or a
578: bar-at-the-beginning-of-a-line.
579: .SH HOW THE INPUT IS MATCHED
580: When the generated scanner is run, it analyzes its input looking
581: for strings which match any of its patterns. If it finds more than
582: one match, it takes the one matching the most text (for trailing
583: context rules, this includes the length of the trailing part, even
584: though it will then be returned to the input). If it finds two
585: or more matches of the same length, the
586: rule listed first in the
587: .I flex
588: input file is chosen.
589: .PP
590: Once the match is determined, the text corresponding to the match
591: (called the
592: .I token)
593: is made available in the global character pointer
594: .B yytext,
595: and its length in the global integer
596: .B yyleng.
597: The
598: .I action
599: corresponding to the matched pattern is then executed (a more
600: detailed description of actions follows), and then the remaining
601: input is scanned for another match.
602: .PP
603: If no match is found, then the
604: .I default rule
605: is executed: the next character in the input is considered matched and
606: copied to the standard output. Thus, the simplest legal
607: .I flex
608: input is:
609: .nf
610:
611: %%
612:
613: .fi
614: which generates a scanner that simply copies its input (one character
615: at a time) to its output.
616: .PP
617: Note that
618: .B yytext
619: can be defined in two different ways: either as a character
620: .I pointer
621: or as a character
622: .I array.
623: You can control which definition
624: .I flex
625: uses by including one of the special directives
626: .B %pointer
627: or
628: .B %array
629: in the first (definitions) section of your flex input. The default is
630: .B %pointer,
631: unless you use the
632: .B -l
633: lex compatibility option, in which case
634: .B yytext
635: will be an array.
636: The advantage of using
637: .B %pointer
638: is substantially faster scanning and no buffer overflow when matching
639: very large tokens (unless you run out of dynamic memory). The disadvantage
640: is that you are restricted in how your actions can modify
641: .B yytext
642: (see the next section), and calls to the
643: .B unput()
1.10 deraadt 644: function destroy the present contents of
1.1 deraadt 645: .B yytext,
646: which can be a considerable porting headache when moving between different
647: .I lex
648: versions.
649: .PP
650: The advantage of
651: .B %array
652: is that you can then modify
653: .B yytext
654: to your heart's content, and calls to
655: .B unput()
656: do not destroy
657: .B yytext
658: (see below). Furthermore, existing
659: .I lex
660: programs sometimes access
661: .B yytext
662: externally using declarations of the form:
663: .nf
664: extern char yytext[];
665: .fi
666: This definition is erroneous when used with
667: .B %pointer,
668: but correct for
669: .B %array.
670: .PP
671: .B %array
672: defines
673: .B yytext
674: to be an array of
675: .B YYLMAX
676: characters, which defaults to a fairly large value. You can change
677: the size by simply #define'ing
678: .B YYLMAX
679: to a different value in the first section of your
680: .I flex
681: input. As mentioned above, with
682: .B %pointer
683: yytext grows dynamically to accommodate large tokens. While this means your
684: .B %pointer
685: scanner can accommodate very large tokens (such as matching entire blocks
686: of comments), bear in mind that each time the scanner must resize
687: .B yytext
688: it also must rescan the entire token from the beginning, so matching such
689: tokens can prove slow.
690: .B yytext
691: presently does
692: .I not
693: dynamically grow if a call to
694: .B unput()
695: results in too much text being pushed back; instead, a run-time error results.
696: .PP
697: Also note that you cannot use
698: .B %array
699: with C++ scanner classes
700: (the
701: .B c++
702: option; see below).
703: .SH ACTIONS
704: Each pattern in a rule has a corresponding action, which can be any
705: arbitrary C statement. The pattern ends at the first non-escaped
706: whitespace character; the remainder of the line is its action. If the
707: action is empty, then when the pattern is matched the input token
708: is simply discarded. For example, here is the specification for a program
709: which deletes all occurrences of "zap me" from its input:
710: .nf
711:
712: %%
713: "zap me"
714:
715: .fi
716: (It will copy all other characters in the input to the output since
717: they will be matched by the default rule.)
718: .PP
719: Here is a program which compresses multiple blanks and tabs down to
720: a single blank, and throws away whitespace found at the end of a line:
721: .nf
722:
723: %%
724: [ \\t]+ putchar( ' ' );
725: [ \\t]+$ /* ignore this token */
726:
727: .fi
728: .PP
729: If the action contains a '{', then the action spans till the balancing '}'
730: is found, and the action may cross multiple lines.
1.7 aaron 731: .I flex
1.1 deraadt 732: knows about C strings and comments and won't be fooled by braces found
733: within them, but also allows actions to begin with
734: .B %{
735: and will consider the action to be all the text up to the next
736: .B %}
737: (regardless of ordinary braces inside the action).
738: .PP
739: An action consisting solely of a vertical bar ('|') means "same as
740: the action for the next rule." See below for an illustration.
741: .PP
742: Actions can include arbitrary C code, including
743: .B return
744: statements to return a value to whatever routine called
745: .B yylex().
746: Each time
747: .B yylex()
748: is called it continues processing tokens from where it last left
749: off until it either reaches
750: the end of the file or executes a return.
751: .PP
752: Actions are free to modify
753: .B yytext
754: except for lengthening it (adding
755: characters to its end--these will overwrite later characters in the
756: input stream). This however does not apply when using
757: .B %array
758: (see above); in that case,
759: .B yytext
760: may be freely modified in any way.
761: .PP
762: Actions are free to modify
763: .B yyleng
764: except they should not do so if the action also includes use of
765: .B yymore()
766: (see below).
767: .PP
768: There are a number of special directives which can be included within
769: an action:
770: .IP -
771: .B ECHO
772: copies yytext to the scanner's output.
773: .IP -
774: .B BEGIN
775: followed by the name of a start condition places the scanner in the
776: corresponding start condition (see below).
777: .IP -
778: .B REJECT
779: directs the scanner to proceed on to the "second best" rule which matched the
780: input (or a prefix of the input). The rule is chosen as described
781: above in "How the Input is Matched", and
782: .B yytext
783: and
784: .B yyleng
785: set up appropriately.
786: It may either be one which matched as much text
787: as the originally chosen rule but came later in the
788: .I flex
789: input file, or one which matched less text.
790: For example, the following will both count the
791: words in the input and call the routine special() whenever "frob" is seen:
792: .nf
793:
794: int word_count = 0;
795: %%
796:
797: frob special(); REJECT;
798: [^ \\t\\n]+ ++word_count;
799:
800: .fi
801: Without the
802: .B REJECT,
803: any "frob"'s in the input would not be counted as words, since the
804: scanner normally executes only one action per token.
805: Multiple
806: .B REJECT's
807: are allowed, each one finding the next best choice to the currently
808: active rule. For example, when the following scanner scans the token
809: "abcd", it will write "abcdabcaba" to the output:
810: .nf
811:
812: %%
813: a |
814: ab |
815: abc |
816: abcd ECHO; REJECT;
817: .|\\n /* eat up any unmatched character */
818:
819: .fi
820: (The first three rules share the fourth's action since they use
821: the special '|' action.)
822: .B REJECT
823: is a particularly expensive feature in terms of scanner performance;
824: if it is used in
825: .I any
826: of the scanner's actions it will slow down
827: .I all
828: of the scanner's matching. Furthermore,
829: .B REJECT
830: cannot be used with the
831: .I -Cf
832: or
833: .I -CF
834: options (see below).
835: .IP
836: Note also that unlike the other special actions,
837: .B REJECT
838: is a
839: .I branch;
840: code immediately following it in the action will
841: .I not
842: be executed.
843: .IP -
844: .B yymore()
845: tells the scanner that the next time it matches a rule, the corresponding
846: token should be
847: .I appended
848: onto the current value of
849: .B yytext
850: rather than replacing it. For example, given the input "mega-kludge"
851: the following will write "mega-mega-kludge" to the output:
852: .nf
853:
854: %%
855: mega- ECHO; yymore();
856: kludge ECHO;
857:
858: .fi
859: First "mega-" is matched and echoed to the output. Then "kludge"
860: is matched, but the previous "mega-" is still hanging around at the
861: beginning of
862: .B yytext
863: so the
864: .B ECHO
865: for the "kludge" rule will actually write "mega-kludge".
866: .PP
867: Two notes regarding use of
868: .B yymore().
869: First,
870: .B yymore()
871: depends on the value of
872: .I yyleng
873: correctly reflecting the size of the current token, so you must not
874: modify
875: .I yyleng
876: if you are using
877: .B yymore().
878: Second, the presence of
879: .B yymore()
880: in the scanner's action entails a minor performance penalty in the
881: scanner's matching speed.
882: .IP -
883: .B yyless(n)
884: returns all but the first
885: .I n
886: characters of the current token back to the input stream, where they
887: will be rescanned when the scanner looks for the next match.
888: .B yytext
889: and
890: .B yyleng
891: are adjusted appropriately (e.g.,
892: .B yyleng
893: will now be equal to
894: .I n
895: ). For example, on the input "foobar" the following will write out
896: "foobarbar":
897: .nf
898:
899: %%
900: foobar ECHO; yyless(3);
901: [a-z]+ ECHO;
902:
903: .fi
904: An argument of 0 to
905: .B yyless
906: will cause the entire current input string to be scanned again. Unless you've
907: changed how the scanner will subsequently process its input (using
908: .B BEGIN,
909: for example), this will result in an endless loop.
910: .PP
911: Note that
912: .B yyless
913: is a macro and can only be used in the flex input file, not from
914: other source files.
915: .IP -
916: .B unput(c)
917: puts the character
918: .I c
919: back onto the input stream. It will be the next character scanned.
920: The following action will take the current token and cause it
921: to be rescanned enclosed in parentheses.
922: .nf
923:
924: {
925: int i;
926: /* Copy yytext because unput() trashes yytext */
927: char *yycopy = strdup( yytext );
928: unput( ')' );
929: for ( i = yyleng - 1; i >= 0; --i )
930: unput( yycopy[i] );
931: unput( '(' );
932: free( yycopy );
933: }
934:
935: .fi
936: Note that since each
937: .B unput()
938: puts the given character back at the
939: .I beginning
940: of the input stream, pushing back strings must be done back-to-front.
941: .PP
942: An important potential problem when using
943: .B unput()
944: is that if you are using
945: .B %pointer
946: (the default), a call to
947: .B unput()
948: .I destroys
949: the contents of
950: .I yytext,
951: starting with its rightmost character and devouring one character to
952: the left with each call. If you need the value of yytext preserved
953: after a call to
954: .B unput()
955: (as in the above example),
956: you must either first copy it elsewhere, or build your scanner using
957: .B %array
958: instead (see How The Input Is Matched).
959: .PP
960: Finally, note that you cannot put back
961: .B EOF
962: to attempt to mark the input stream with an end-of-file.
963: .IP -
964: .B input()
965: reads the next character from the input stream. For example,
966: the following is one way to eat up C comments:
967: .nf
968:
969: %%
970: "/*" {
971: register int c;
972:
973: for ( ; ; )
974: {
975: while ( (c = input()) != '*' &&
976: c != EOF )
977: ; /* eat up text of comment */
978:
979: if ( c == '*' )
980: {
981: while ( (c = input()) == '*' )
982: ;
983: if ( c == '/' )
984: break; /* found the end */
985: }
986:
987: if ( c == EOF )
988: {
989: error( "EOF in comment" );
990: break;
991: }
992: }
993: }
994:
995: .fi
996: (Note that if the scanner is compiled using
997: .B C++,
998: then
999: .B input()
1000: is instead referred to as
1001: .B yyinput(),
1002: in order to avoid a name clash with the
1003: .B C++
1004: stream by the name of
1005: .I input.)
1006: .IP -
1007: .B YY_FLUSH_BUFFER
1008: flushes the scanner's internal buffer
1009: so that the next time the scanner attempts to match a token, it will
1010: first refill the buffer using
1011: .B YY_INPUT
1012: (see The Generated Scanner, below). This action is a special case
1013: of the more general
1014: .B yy_flush_buffer()
1015: function, described below in the section Multiple Input Buffers.
1016: .IP -
1017: .B yyterminate()
1018: can be used in lieu of a return statement in an action. It terminates
1019: the scanner and returns a 0 to the scanner's caller, indicating "all done".
1020: By default,
1021: .B yyterminate()
1022: is also called when an end-of-file is encountered. It is a macro and
1023: may be redefined.
1024: .SH THE GENERATED SCANNER
1025: The output of
1026: .I flex
1027: is the file
1028: .B lex.yy.c,
1029: which contains the scanning routine
1030: .B yylex(),
1031: a number of tables used by it for matching tokens, and a number
1032: of auxiliary routines and macros. By default,
1033: .B yylex()
1034: is declared as follows:
1035: .nf
1036:
1037: int yylex()
1038: {
1039: ... various definitions and the actions in here ...
1040: }
1041:
1042: .fi
1043: (If your environment supports function prototypes, then it will
1044: be "int yylex( void )".) This definition may be changed by defining
1045: the "YY_DECL" macro. For example, you could use:
1046: .nf
1047:
1048: #define YY_DECL float lexscan( a, b ) float a, b;
1049:
1050: .fi
1051: to give the scanning routine the name
1052: .I lexscan,
1053: returning a float, and taking two floats as arguments. Note that
1054: if you give arguments to the scanning routine using a
1055: K&R-style/non-prototyped function declaration, you must terminate
1056: the definition with a semicolon (;).
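.PP
With a prototyped declaration no terminating semicolon is needed, because the generated scanner places the text of
.B YY_DECL
directly in front of the body of the scanning routine. The standalone sketch below shows a prototyped equivalent of the
.I lexscan
example. Because it is compiled without
.I flex,
the function body is only a placeholder for the code that
.I flex
would generate.
.nf

```c
/* Prototyped replacement for the default "int yylex( void )".
 * No trailing semicolon: the macro text becomes the function header. */
#define YY_DECL float lexscan( float a, float b )

/* In a real scanner flex supplies the body. This placeholder body
 * just combines the two arguments so that the expansion compiles. */
YY_DECL
{
    return a * b;
}
```

.fi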
1057: .PP
1058: Whenever
1059: .B yylex()
1060: is called, it scans tokens from the global input file
1061: .I yyin
1062: (which defaults to stdin). It continues until it either reaches
1063: an end-of-file (at which point it returns the value 0) or
1064: one of its actions executes a
1065: .I return
1066: statement.
1067: .PP
1068: If the scanner reaches an end-of-file, subsequent calls are undefined
1069: unless either
1070: .I yyin
1071: is pointed at a new input file (in which case scanning continues from
1072: that file), or
1073: .B yyrestart()
1074: is called.
1075: .B yyrestart()
1076: takes one argument, a
1077: .B FILE *
1078: pointer (which can be nil, if you've set up
1079: .B YY_INPUT
1080: to scan from a source other than
1081: .I yyin),
1082: and initializes
1083: .I yyin
1084: for scanning from that file. Essentially there is no difference between
1085: just assigning
1086: .I yyin
1087: to a new input file and using
1088: .B yyrestart()
1089: to do so; the latter is available for compatibility with previous versions
1090: of
1091: .I flex,
1092: and because it can be used to switch input files in the middle of scanning.
1093: It can also be used to throw away the current input buffer, by calling
1094: it with an argument of
1095: .I yyin;
1096: but it is better to use
1097: .B YY_FLUSH_BUFFER
1098: (see above).
1099: Note that
1100: .B yyrestart()
1101: does
1102: .I not
1103: reset the start condition to
1104: .B INITIAL
1105: (see Start Conditions, below).
1106: .PP
1107: If
1108: .B yylex()
1109: stops scanning due to executing a
1110: .I return
1111: statement in one of the actions, the scanner may then be called again and it
1112: will resume scanning where it left off.
1113: .PP
1114: By default (and for purposes of efficiency), the scanner uses
1115: block-reads rather than simple
1116: .I getc()
1117: calls to read characters from
1118: .I yyin.
1119: How the scanner obtains its input can be controlled by defining the
1120: .B YY_INPUT
1121: macro.
1122: YY_INPUT's calling sequence is "YY_INPUT(buf,result,max_size)". Its
1123: action is to place up to
1124: .I max_size
1125: characters in the character array
1126: .I buf
1127: and return in the integer variable
1128: .I result
1129: either the
1130: number of characters read or the constant YY_NULL (0 on Unix systems)
1131: to indicate EOF. The default YY_INPUT reads from the
1132: global file-pointer "yyin".
1133: .PP
1134: A sample definition of YY_INPUT (in the definitions
1135: section of the input file):
1136: .nf
1137:
1138: %{
1139: #define YY_INPUT(buf,result,max_size) \\
1140: { \\
1141: int c = getchar(); \\
1142: result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \\
1143: }
1144: %}
1145:
1146: .fi
1147: This definition will change the input processing to occur
1148: one character at a time.
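.PP
A YY_INPUT definition need not read from a file at all. The standalone sketch below, which is not part of
.I flex,
shows a hypothetical helper
.I string_input()
with the same contract as YY_INPUT: it copies up to
.I max_size
characters from an in-memory string into
.I buf
and returns the number copied, or 0 (the value of YY_NULL on Unix systems) at end of input. The names
.I input_text,
.I set_string_input,
and
.I string_input
are assumptions.
.nf

```c
#include <string.h>

/* Hypothetical in-memory input source (not part of flex). */
static const char *input_text = "";
static size_t input_len = 0;
static size_t input_pos = 0;

/* Select the string that string_input() will deliver. */
void set_string_input( const char *text )
{
    input_text = text;
    input_len = strlen( text );
    input_pos = 0;
}

/* Same contract as YY_INPUT: copy at most max_size characters into buf
 * and return the number copied, or 0 (YY_NULL) once input is exhausted. */
int string_input( char *buf, int max_size )
{
    size_t remaining = input_len - input_pos;
    size_t n = remaining < (size_t) max_size ? remaining : (size_t) max_size;

    memcpy( buf, input_text + input_pos, n );
    input_pos += n;
    return (int) n;
}
```

.fi
In a scanner, the definitions section might then contain a prototype for
.I string_input()
together with "#define YY_INPUT(buf,result,max_size) result = string_input( buf, max_size );".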
1149: .PP
1150: When the scanner receives an end-of-file indication from YY_INPUT,
1151: it then calls the
1152: .B yywrap()
1153: function. If
1154: .B yywrap()
1155: returns false (zero), then it is assumed that the
1156: function has gone ahead and set up
1157: .I yyin
1158: to point to another input file, and scanning continues. If it returns
1159: true (non-zero), then the scanner terminates, returning 0 to its
1160: caller. Note that in either case, the start condition remains unchanged;
1161: it does
1162: .I not
1163: revert to
1164: .B INITIAL.
1165: .PP
1166: If you do not supply your own version of
1167: .B yywrap(),
1168: then you must either use
1169: .B %option noyywrap
1170: (in which case the scanner behaves as though
1171: .B yywrap()
1172: returned 1), or you must link with
1173: .B \-lfl
1174: to obtain the default version of the routine, which always returns 1.
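.PP
As an illustration of supplying your own routine, the sketch below is a
.B yywrap()
that works through a list of input files. It points
.I yyin
at the next file that can be opened and returns 0, and it returns 1 once the list is exhausted. It is written as standalone C: the definition of
.I yyin
here only stands in for the global that
.I flex
provides (omit it in a real scanner), and
.I file_list
and
.I file_index
are hypothetical names.
.nf

```c
#include <stdio.h>

FILE *yyin;                  /* stand-in: flex defines yyin itself */

static char **file_list;     /* hypothetical NULL-terminated list of names */
static int file_index;       /* hypothetical: next entry of file_list */

/* Called at end of file. Returns 0 after pointing yyin at the next
 * file that opens successfully, or 1 when no files remain. */
int yywrap( void )
{
    if ( yyin != NULL && yyin != stdin )
        fclose( yyin );
    yyin = NULL;

    while ( file_list != NULL && file_list[file_index] != NULL )
    {
        FILE *f = fopen( file_list[file_index++], "r" );

        if ( f != NULL )
        {
            yyin = f;
            return 0;   /* more input: scanning continues */
        }
    }

    return 1;           /* no more files: the scanner returns 0 */
}
```

.fi
Files that cannot be opened are skipped silently here. A real scanner would probably report them.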
1175: .PP
1176: Three routines are available for scanning from in-memory buffers rather
1177: than files:
1178: .B yy_scan_string(), yy_scan_bytes(),
1179: and
1180: .B yy_scan_buffer().
1181: See the discussion of them below in the section Multiple Input Buffers.
1182: .PP
1183: The scanner writes its
1184: .B ECHO
1185: output to the
1186: .I yyout
1187: global (default, stdout), which may be redefined by the user simply
1188: by assigning it to some other
1189: .B FILE
1190: pointer.
1191: .SH START CONDITIONS
1192: .I flex
1193: provides a mechanism for conditionally activating rules. Any rule
1194: whose pattern is prefixed with "<sc>" will only be active when
1195: the scanner is in the start condition named "sc". For example,
1196: .nf
1197:
1198: <STRING>[^"]* { /* eat up the string body ... */
1199: ...
1200: }
1201:
1202: .fi
1203: will be active only when the scanner is in the "STRING" start
1204: condition, and
1205: .nf
1206:
1207: <INITIAL,STRING,QUOTE>\\. { /* handle an escape ... */
1208: ...
1209: }
1210:
1211: .fi
1212: will be active only when the current start condition is
1213: either "INITIAL", "STRING", or "QUOTE".
1214: .PP
1215: Start conditions
1216: are declared in the definitions (first) section of the input
1217: using unindented lines beginning with either
1218: .B %s
1219: or
1220: .B %x
1221: followed by a list of names.
1222: The former declares
1223: .I inclusive
1224: start conditions, the latter
1225: .I exclusive
1226: start conditions. A start condition is activated using the
1227: .B BEGIN
1228: action. Until the next
1229: .B BEGIN
1230: action is executed, rules with the given start
1231: condition will be active and
1232: rules with other start conditions will be inactive.
1233: If the start condition is
1234: .I inclusive,
1235: then rules with no start conditions at all will also be active.
1236: If it is
1237: .I exclusive,
1238: then
1239: .I only
1240: rules qualified with the start condition will be active.
1241: A set of rules contingent on the same exclusive start condition
1242: describes a scanner which is independent of any of the other rules in the
1243: .I flex
1244: input. Because of this,
1245: exclusive start conditions make it easy to specify "mini-scanners"
1246: which scan portions of the input that are syntactically different
1247: from the rest (e.g., comments).
1248: .PP
1249: If the distinction between inclusive and exclusive start conditions
1250: is still a little vague, here's a simple example illustrating the
1251: connection between the two. The set of rules:
1252: .nf
1253:
1254: %s example
1255: %%
1256:
1257: <example>foo do_something();
1258:
1259: bar something_else();
1260:
1261: .fi
1262: is equivalent to
1263: .nf
1264:
1265: %x example
1266: %%
1267:
1268: <example>foo do_something();
1269:
1270: <INITIAL,example>bar something_else();
1271:
1272: .fi
1273: Without the
1274: .B <INITIAL,example>
1275: qualifier, the
1276: .I bar
1277: pattern in the second example wouldn't be active (i.e., couldn't match)
1278: when in start condition
1279: .B example.
1280: If we just used
1281: .B <example>
1282: to qualify
1283: .I bar,
1284: though, then it would only be active in
1285: .B example
1286: and not in
1287: .B INITIAL,
1288: while in the first example it's active in both, because in the first
1289: example the
1290: .B example
1.10 deraadt 1291: start condition is an
1.1 deraadt 1292: .I inclusive
1293: .B (%s)
1294: start condition.
1295: .PP
1296: Also note that the special start-condition specifier
1297: .B <*>
1298: matches every start condition. Thus, the above example could also
1299: have been written:
1300: .nf
1301:
1302: %x example
1303: %%
1304:
1305: <example>foo do_something();
1306:
1307: <*>bar something_else();
1308:
1309: .fi
1310: .PP
1311: The default rule (to
1312: .B ECHO
1313: any unmatched character) remains active in start conditions. It
1314: is equivalent to:
1315: .nf
1316:
1317: <*>.|\\n ECHO;
1318:
1319: .fi
1320: .PP
1321: .B BEGIN(0)
1322: returns to the original state where only the rules with
1323: no start conditions are active. This state can also be
1324: referred to as the start-condition "INITIAL", so
1325: .B BEGIN(INITIAL)
1326: is equivalent to
1327: .B BEGIN(0).
1328: (The parentheses around the start condition name are not required but
1329: are considered good style.)
1330: .PP
1331: .B BEGIN
1332: actions can also be given as indented code at the beginning
1333: of the rules section. For example, the following will cause
1334: the scanner to enter the "SPECIAL" start condition whenever
1335: .B yylex()
1336: is called and the global variable
1337: .I enter_special
1338: is true:
1339: .nf
1340:
1341: int enter_special;
1342:
1343: %x SPECIAL
1344: %%
1345: if ( enter_special )
1346: BEGIN(SPECIAL);
1347:
1348: <SPECIAL>blahblahblah
1349: ...more rules follow...
1350:
1351: .fi
1352: .PP
1353: To illustrate the uses of start conditions,
1354: here is a scanner which provides two different interpretations
1355: of a string like "123.456". By default it will treat it as
1356: three tokens, the integer "123", a dot ('.'), and the integer "456".
1357: But if the string is preceded earlier in the line by the string
1358: "expect-floats"
1359: it will treat it as a single token, the floating-point number
1360: 123.456:
1361: .nf
1362:
1363: %{
1364: #include <stdlib.h>
1365: %}
1366: %s expect
1367:
1368: %%
1369: expect-floats BEGIN(expect);
1370:
1371: <expect>[0-9]+"."[0-9]+ {
1372: printf( "found a float, = %f\\n",
1373: atof( yytext ) );
1374: }
1375: <expect>\\n {
1376: /* that's the end of the line, so
1377: * we need another "expect-floats"
1378: * before we'll recognize any more
1379: * numbers
1380: */
1381: BEGIN(INITIAL);
1382: }
1383:
1384: [0-9]+ {
1385: printf( "found an integer, = %d\\n",
1386: atoi( yytext ) );
1387: }
1388:
1389: "." printf( "found a dot\\n" );
1390:
1391: .fi
1392: Here is a scanner which recognizes (and discards) C comments while
1393: maintaining a count of the current input line.
1394: .nf
1395:
1396: %x comment
1397: %%
1398: int line_num = 1;
1399:
1400: "/*" BEGIN(comment);
1401:
1402: <comment>[^*\\n]* /* eat anything that's not a '*' */
1403: <comment>"*"+[^*/\\n]* /* eat up '*'s not followed by '/'s */
1404: <comment>\\n ++line_num;
1405: <comment>"*"+"/" BEGIN(INITIAL);
1406:
1407: .fi
1408: This scanner goes to a bit of trouble to match as much
1409: text as possible with each rule. In general, when attempting to write
1.10 deraadt 1410: a high-speed scanner, try to match as much as possible in each rule, as
1.1 deraadt 1411: it's a big win.
1412: .PP
1.10 deraadt 1413: Note that start-condition names are really integer values and
1.1 deraadt 1414: can be stored as such. Thus, the above could be extended in the
1415: following fashion:
1416: .nf
1417:
1418: %x comment foo
1419: %%
1420: int line_num = 1;
1421: int comment_caller;
1422:
1423: "/*" {
1424: comment_caller = INITIAL;
1425: BEGIN(comment);
1426: }
1427:
1428: ...
1429:
1430: <foo>"/*" {
1431: comment_caller = foo;
1432: BEGIN(comment);
1433: }
1434:
1435: <comment>[^*\\n]* /* eat anything that's not a '*' */
1436: <comment>"*"+[^*/\\n]* /* eat up '*'s not followed by '/'s */
1437: <comment>\\n ++line_num;
1438: <comment>"*"+"/" BEGIN(comment_caller);
1439:
1440: .fi
1441: Furthermore, you can access the current start condition using
1442: the integer-valued
1443: .B YY_START
1444: macro. For example, the above assignments to
1445: .I comment_caller
1446: could instead be written
1447: .nf
1448:
1449: comment_caller = YY_START;
1450:
1451: .fi
1452: Flex provides
1453: .B YYSTATE
1454: as an alias for
1455: .B YY_START
1456: (since that is what's used by AT&T
1457: .I lex).
1458: .PP
1459: Note that start conditions do not have their own name-space; %s's and %x's
1460: declare names in the same fashion as #define's.
1461: .PP
1462: Finally, here's an example of how to match C-style quoted strings using
1463: exclusive start conditions, including expanded escape sequences (but
1464: not including checking for a string that's too long):
1465: .nf
1466:
1467: %x str
1468:
1469: %%
1470: char string_buf[MAX_STR_CONST];
1471: char *string_buf_ptr;
1472:
1473:
1474: \\" string_buf_ptr = string_buf; BEGIN(str);
1475:
1476: <str>\\" { /* saw closing quote - all done */
1477: BEGIN(INITIAL);
1478: *string_buf_ptr = '\\0';
1479: /* return string constant token type and
1480: * value to parser
1481: */
1482: }
1483:
1484: <str>\\n {
1485: /* error - unterminated string constant */
1486: /* generate error message */
1487: }
1488:
1489: <str>\\\\[0-7]{1,3} {
1490: /* octal escape sequence */
1491: int result;
1492:
1493: (void) sscanf( yytext + 1, "%o", &result );
1494:
1495: if ( result > 0xff )
1496: { /* error, constant is out-of-bounds */ }
1497: else
1498: *string_buf_ptr++ = result;
1499: }
1500:
1501: <str>\\\\[0-9]+ {
1502: /* generate error - bad escape sequence; something
1503: * like '\\48' or '\\0777777'
1504: */
1505: }
1506:
1507: <str>\\\\n *string_buf_ptr++ = '\\n';
1508: <str>\\\\t *string_buf_ptr++ = '\\t';
1509: <str>\\\\r *string_buf_ptr++ = '\\r';
1510: <str>\\\\b *string_buf_ptr++ = '\\b';
1511: <str>\\\\f *string_buf_ptr++ = '\\f';
1512:
1513: <str>\\\\(.|\\n) *string_buf_ptr++ = yytext[1];
1514:
1515: <str>[^\\\\\\n\\"]+ {
1516: char *yptr = yytext;
1517:
1518: while ( *yptr )
1519: *string_buf_ptr++ = *yptr++;
1520: }
1521:
1522: .fi
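.PP
The octal-escape rule above leaves its error handling as a comment. A standalone helper along the following lines could be called with
.B "yytext + 1"
to obtain the character value. It uses
.B strtol()
instead of
.B sscanf()
and returns \-1 for a value above 0xff. The name
.I octal_escape_value
is hypothetical.
.nf

```c
#include <stdlib.h>

/* Hypothetical helper (not part of flex): convert the one to three
 * octal digits that follow the backslash of an escape (for example
 * yytext + 1) into a character value. Returns -1 if it exceeds 0xff. */
int octal_escape_value( const char *digits )
{
    long value = strtol( digits, NULL, 8 );

    if ( value > 0xff )
        return -1;    /* error, constant is out-of-bounds */

    return (int) value;
}
```

.fi
The rule's action could then store the result only when it is non-negative and report an error otherwise.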
1523: .PP
1524: Often, such as in some of the examples above, you wind up writing a
1525: whole bunch of rules all preceded by the same start condition(s). Flex
1526: makes this a little easier and cleaner by introducing a notion of
1527: start condition
1528: .I scope.
1529: A start condition scope is begun with:
1530: .nf
1531:
1532: <SCs>{
1533:
1534: .fi
1535: where
1536: .I SCs
1537: is a list of one or more start conditions. Inside the start condition
1538: scope, every rule automatically has the prefix
1539: .I <SCs>
1540: applied to it, until a
1541: .I '}'
1542: which matches the initial
1543: .I '{'.
1544: So, for example,
1545: .nf
1546:
1547: <ESC>{
1548: "\\\\n" return '\\n';
1549: "\\\\r" return '\\r';
1550: "\\\\f" return '\\f';
1551: "\\\\0" return '\\0';
1552: }
1553:
1554: .fi
1555: is equivalent to:
1556: .nf
1557:
1558: <ESC>"\\\\n" return '\\n';
1559: <ESC>"\\\\r" return '\\r';
1560: <ESC>"\\\\f" return '\\f';
1561: <ESC>"\\\\0" return '\\0';
1562:
1563: .fi
1564: Start condition scopes may be nested.
1565: .PP
1566: Three routines are available for manipulating stacks of start conditions:
1567: .TP
1568: .B void yy_push_state(int new_state)
1569: pushes the current start condition onto the top of the start condition
1570: stack and switches to
1571: .I new_state
1572: as though you had used
1573: .B BEGIN new_state
1574: (recall that start condition names are also integers).
1575: .TP
1576: .B void yy_pop_state()
1577: pops the top of the stack and switches to it via
1578: .B BEGIN.
1579: .TP
1580: .B int yy_top_state()
1581: returns the top of the stack without altering the stack's contents.
1582: .PP
1583: The start condition stack grows dynamically and so has no built-in
1584: size limitation. If memory is exhausted, program execution aborts.
1585: .PP
1586: To use start condition stacks, your scanner must include a
1587: .B %option stack
1588: directive (see Options below).
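.PP
The behavior of these routines can be modeled in standalone C as shown below. This is not
.I flex's
implementation.
.I current_sc
stands in for the start condition that
.B BEGIN
sets, and
.I sc_push,
.I sc_pop,
and
.I sc_top
are hypothetical counterparts of
.B yy_push_state(),
.B yy_pop_state(),
and
.B yy_top_state().
As described above, the stack grows on demand and the program aborts if memory is exhausted. This model also aborts when popping an empty stack, and as its own convention it makes
.I sc_top
return \-1 when nothing has been pushed.
.nf

```c
#include <stdio.h>
#include <stdlib.h>

static int current_sc = 0;       /* 0 plays the role of INITIAL */
static int *sc_stack = NULL;     /* saved start conditions */
static size_t sc_depth = 0;      /* entries in use */
static size_t sc_capacity = 0;   /* entries allocated */

/* Like yy_push_state(): save the current condition, then switch. */
void sc_push( int new_state )
{
    if ( sc_depth == sc_capacity )
    {
        size_t new_cap = sc_capacity ? 2 * sc_capacity : 8;
        int *p = realloc( sc_stack, new_cap * sizeof *sc_stack );

        if ( p == NULL )
        {
            fputs( "start condition stack: out of memory", stderr );
            abort();
        }
        sc_stack = p;
        sc_capacity = new_cap;
    }
    sc_stack[sc_depth++] = current_sc;
    current_sc = new_state;       /* as BEGIN(new_state) would */
}

/* Like yy_pop_state(): return to the most recently saved condition. */
void sc_pop( void )
{
    if ( sc_depth == 0 )
    {
        fputs( "start condition stack underflow", stderr );
        abort();
    }
    current_sc = sc_stack[--sc_depth];
}

/* Like yy_top_state(): peek without popping (-1 if the stack is empty). */
int sc_top( void )
{
    return sc_depth > 0 ? sc_stack[sc_depth - 1] : -1;
}
```

.fi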
1589: .SH MULTIPLE INPUT BUFFERS
1590: Some scanners (such as those which support "include" files)
1591: require reading from several input streams. As
1592: .I flex
1593: scanners do a large amount of buffering, one cannot control
1594: where the next input will be read from by simply writing a
1595: .B YY_INPUT
1596: which is sensitive to the scanning context.
1597: .B YY_INPUT
1598: is only called when the scanner reaches the end of its buffer, which
1599: may be a long time after scanning a statement such as an "include"
1600: which requires switching the input source.
1601: .PP
1602: To negotiate these sorts of problems,
1603: .I flex
1604: provides a mechanism for creating and switching between multiple
1605: input buffers. An input buffer is created by using:
1606: .nf
1607:
1608: YY_BUFFER_STATE yy_create_buffer( FILE *file, int size )
1609:
1610: .fi
1611: which takes a
1612: .I FILE
1613: pointer and a size and creates a buffer associated with the given
1614: file and large enough to hold
1615: .I size
1616: characters (when in doubt, use
1617: .B YY_BUF_SIZE
1618: for the size). It returns a
1619: .B YY_BUFFER_STATE
1620: handle, which may then be passed to other routines (see below). The
1621: .B YY_BUFFER_STATE
1622: type is a pointer to an opaque
1623: .B struct yy_buffer_state
1624: structure, so you may safely initialize YY_BUFFER_STATE variables to
1625: .B ((YY_BUFFER_STATE) 0)
1626: if you wish, and also refer to the opaque structure in order to
1627: correctly declare input buffers in source files other than that
1628: of your scanner. Note that the
1629: .I FILE
1630: pointer in the call to
1631: .B yy_create_buffer
1632: is only used as the value of
1633: .I yyin
1634: seen by
1635: .B YY_INPUT;
1636: if you redefine
1637: .B YY_INPUT
1638: so it no longer uses
1639: .I yyin,
1640: then you can safely pass a nil
1641: .I FILE
1642: pointer to
1643: .B yy_create_buffer.
1644: You select a particular buffer to scan from using:
1645: .nf
1646:
1647: void yy_switch_to_buffer( YY_BUFFER_STATE new_buffer )
1648:
1649: .fi
1650: which switches the scanner's input buffer so subsequent tokens will
1651: come from
1652: .I new_buffer.
1653: Note that
1654: .B yy_switch_to_buffer()
1655: may be used by yywrap() to set things up for continued scanning, instead
1656: of opening a new file and pointing
1657: .I yyin
1658: at it. Note also that switching input sources via either
1659: .B yy_switch_to_buffer()
1660: or
1661: .B yywrap()
1662: does
1663: .I not
1664: change the start condition.
1665: .nf
1666:
1667: void yy_delete_buffer( YY_BUFFER_STATE buffer )
1668:
1669: .fi
1670: is used to reclaim the storage associated with a buffer.
1671: .RB ( buffer
1672: can be nil, in which case the routine does nothing.)
1673: You can also clear the current contents of a buffer using:
1674: .nf
1675:
1676: void yy_flush_buffer( YY_BUFFER_STATE buffer )
1677:
1678: .fi
1679: This function discards the buffer's contents,
1680: so the next time the scanner attempts to match a token from the
1681: buffer, it will first fill the buffer anew using
1682: .B YY_INPUT.
1683: .PP
1684: .B yy_new_buffer()
1685: is an alias for
1686: .B yy_create_buffer(),
1687: provided for compatibility with the C++ use of
1688: .I new
1689: and
1690: .I delete
1691: for creating and destroying dynamic objects.
1692: .PP
1693: Finally, the
1694: .B YY_CURRENT_BUFFER
1695: macro returns a
1696: .B YY_BUFFER_STATE
1697: handle to the current buffer.
1698: .PP
1699: Here is an example of using these features for writing a scanner
1700: which expands include files (the
1701: .B <<EOF>>
1702: feature is discussed below):
1703: .nf
1704:
1705: /* the "incl" state is used for picking up the name
1706: * of an include file
1707: */
1708: %x incl
1709:
1710: %{
1711: #define MAX_INCLUDE_DEPTH 10
1712: YY_BUFFER_STATE include_stack[MAX_INCLUDE_DEPTH];
1713: int include_stack_ptr = 0;
1714: %}
1715:
1716: %%
1717: include BEGIN(incl);
1718:
1719: [a-z]+ ECHO;
1720: [^a-z\\n]*\\n? ECHO;
1721:
1722: <incl>[ \\t]* /* eat the whitespace */
1723: <incl>[^ \\t\\n]+ { /* got the include file name */
1724: if ( include_stack_ptr >= MAX_INCLUDE_DEPTH )
1725: {
1726: fprintf( stderr, "Includes nested too deeply\\n" );
1727: exit( 1 );
1728: }
1729:
1730: include_stack[include_stack_ptr++] =
1731: YY_CURRENT_BUFFER;
1732:
1733: yyin = fopen( yytext, "r" );
1734:
1735: if ( ! yyin )
1736: error( ... );
1737:
1738: yy_switch_to_buffer(
1739: yy_create_buffer( yyin, YY_BUF_SIZE ) );
1740:
1741: BEGIN(INITIAL);
1742: }
1743:
1744: <<EOF>> {
1745: if ( --include_stack_ptr < 0 )
1746: {
1747: yyterminate();
1748: }
1749:
1750: else
1751: {
1752: yy_delete_buffer( YY_CURRENT_BUFFER );
1753: yy_switch_to_buffer(
1754: include_stack[include_stack_ptr] );
1755: }
1756: }
1757:
1758: .fi
1759: Three routines are available for setting up input buffers for
1760: scanning in-memory strings instead of files. All of them create
1761: a new input buffer for scanning the string, and return a corresponding
1762: .B YY_BUFFER_STATE
1763: handle (which you should delete with
1764: .B yy_delete_buffer()
1765: when done with it). They also switch to the new buffer using
1766: .B yy_switch_to_buffer(),
1767: so the next call to
1768: .B yylex()
1769: will start scanning the string.
1770: .TP
1771: .B yy_scan_string(const char *str)
1772: scans a NUL-terminated string.
1773: .TP
1774: .B yy_scan_bytes(const char *bytes, int len)
1775: scans
1776: .I len
1777: bytes (possibly including NULs)
1778: starting at location
1779: .I bytes.
1780: .PP
1781: Note that both of these functions create and scan a
1782: .I copy
1783: of the string or bytes. (This may be desirable, since
1784: .B yylex()
1785: modifies the contents of the buffer it is scanning.) You can avoid the
1786: copy by using:
1787: .TP
1788: .B yy_scan_buffer(char *base, yy_size_t size)
1789: which scans in place the buffer starting at
1790: .I base,
1791: consisting of
1792: .I size
1793: bytes, the last two bytes of which
1794: .I must
1795: be
1796: .B YY_END_OF_BUFFER_CHAR
1797: (ASCII NUL).
1798: These last two bytes are not scanned; thus, scanning
1799: consists of
1800: .B base[0]
1801: through
1802: .B base[size-2],
1803: inclusive.
1804: .IP
1805: If you fail to set up
1806: .I base
1807: in this manner (i.e., forget the final two
1808: .B YY_END_OF_BUFFER_CHAR
1809: bytes), then
1810: .B yy_scan_buffer()
1811: returns a nil pointer instead of creating a new input buffer.
1812: .IP
1813: The type
1814: .B yy_size_t
1815: is an integral type to which you can cast an integer expression
1816: reflecting the size of the buffer.
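.PP
Because
.B yy_scan_buffer()
scans in place, the caller must allocate the two extra bytes. The helper below is a hypothetical sketch, not a
.I flex
routine. It copies
.I len
bytes into a new block and appends the two terminating bytes, on the assumption stated above that
.B YY_END_OF_BUFFER_CHAR
is ASCII NUL.
.nf

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of flex): copy len bytes of text into
 * a new block followed by the two NUL bytes that yy_scan_buffer()
 * requires. On success *size_out is len + 2; returns NULL on failure. */
char *make_scan_buffer( const char *text, size_t len, size_t *size_out )
{
    char *base = malloc( len + 2 );

    if ( base == NULL )
        return NULL;

    memcpy( base, text, len );
    base[len] = 0;        /* YY_END_OF_BUFFER_CHAR (ASCII NUL) */
    base[len + 1] = 0;    /* second terminating byte */
    *size_out = len + 2;
    return base;
}
```

.fi
The returned block and size would then be passed as the
.I base
and
.I size
arguments of
.B yy_scan_buffer(),
with the size cast to
.B yy_size_t.
The block must stay allocated for as long as it is being scanned.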
1817: .SH END-OF-FILE RULES
1818: The special rule "<<EOF>>" indicates
1819: actions which are to be taken when an end-of-file is
1820: encountered and yywrap() returns non-zero (i.e., indicates
1821: no further files to process). The action must finish
1822: by doing one of four things:
1823: .IP -
1824: assigning
1825: .I yyin
1826: to a new input file (in previous versions of flex, after doing the
1827: assignment you had to call the special action
1828: .B YY_NEW_FILE;
1829: this is no longer necessary);
1830: .IP -
1831: executing a
1832: .I return
1833: statement;
1834: .IP -
1835: executing the special
1836: .B yyterminate()
1837: action;
1838: .IP -
1839: or, switching to a new buffer using
1840: .B yy_switch_to_buffer()
1841: as shown in the example above.
1842: .PP
1843: <<EOF>> rules may not be used with other
1844: patterns; they may only be qualified with a list of start
1845: conditions. If an unqualified <<EOF>> rule is given, it
1846: applies to
1847: .I all
1848: start conditions which do not already have <<EOF>> actions. To
1849: specify an <<EOF>> rule for only the initial start condition, use
1850: .nf
1851:
1852: <INITIAL><<EOF>>
1853:
1854: .fi
1855: .PP
1856: These rules are useful for catching things like unclosed comments.
1857: An example:
1858: .nf
1859:
1860: %x quote
1861: %%
1862:
1863: ...other rules for dealing with quotes...
1864:
1865: <quote><<EOF>> {
1866: error( "unterminated quote" );
1867: yyterminate();
1868: }
1869: <<EOF>> {
1870: if ( *++filelist )
1871: yyin = fopen( *filelist, "r" );
1872: else
1873: yyterminate();
1874: }
1875:
1876: .fi
1877: .SH MISCELLANEOUS MACROS
1878: The macro
1879: .B YY_USER_ACTION
1880: can be defined to provide an action
1881: which is always executed prior to the matched rule's action. For example,
1882: it could be #define'd to call a routine to convert yytext to lower-case.
1883: When
1884: .B YY_USER_ACTION
1885: is invoked, the variable
1886: .I yy_act
1887: gives the number of the matched rule (rules are numbered starting with 1).
1888: Suppose you want to profile how often each of your rules is matched. The
1889: following would do the trick:
1890: .nf
1891:
1892: #define YY_USER_ACTION ++ctr[yy_act]
1893:
1894: .fi
1895: where
1896: .I ctr
1897: is an array to hold the counts for the different rules. Note that
1898: the macro
1899: .B YY_NUM_RULES
1900: gives the total number of rules (including the default rule, even if
1901: you use
1902: .B \-s),
1903: so a correct declaration for
1904: .I ctr
1905: is:
1906: .nf
1907:
1908: int ctr[YY_NUM_RULES];
1909:
1910: .fi
1911: .PP
1912: The macro
1913: .B YY_USER_INIT
1914: may be defined to provide an action which is always executed before
1915: the first scan (and before the scanner's internal initializations are done).
1916: For example, it could be used to call a routine to read
1917: in a data table or open a logging file.
1918: .PP
1919: The macro
1920: .B yy_set_interactive(is_interactive)
1921: can be used to control whether the current buffer is considered
1922: .I interactive.
1923: An interactive buffer is processed more slowly,
1924: but must be used when the scanner's input source is indeed
1925: interactive to avoid problems due to waiting to fill buffers
1926: (see the discussion of the
1927: .B \-I
1928: flag below). A non-zero value
1.7 aaron 1929: in the macro invocation marks the buffer as interactive, and a zero
1.1 deraadt 1930: value as non-interactive. Note that use of this macro overrides
1931: .B %option always-interactive
1932: or
1933: .B %option never-interactive
1934: (see Options below).
1935: .B yy_set_interactive()
1936: must be invoked prior to beginning to scan the buffer that is
1937: (or is not) to be considered interactive.
1938: .PP
1939: The macro
1940: .B yy_set_bol(at_bol)
1941: can be used to control whether the current buffer's scanning
1942: context for the next token match is done as though at the
1943: beginning of a line. A non-zero macro argument makes rules anchored with
1.10 deraadt 1944: \'^' active, while a zero argument makes '^' rules inactive.
1.1 deraadt 1945: .PP
1946: The macro
1947: .B YY_AT_BOL()
1948: returns true if the next token scanned from the current buffer
1949: will have '^' rules active, false otherwise.
1950: .PP
1951: In the generated scanner, the actions are all gathered in one large
1952: switch statement and separated using
1953: .B YY_BREAK,
1954: which may be redefined. By default, it is simply a "break", to separate
1.10 deraadt 1955: each rule's action from the following rules.
1.1 deraadt 1956: Redefining
1957: .B YY_BREAK
1958: allows, for example, C++ users to
1959: #define YY_BREAK to do nothing (while being very careful that every
1960: rule ends with a "break" or a "return"!) to avoid
1961: unreachable-statement warnings, which arise when a rule's action ends with
1962: "return" and the following
1963: .B YY_BREAK
1964: is therefore inaccessible.
1965: .SH VALUES AVAILABLE TO THE USER
1966: This section summarizes the various values available to the user
1967: in the rule actions.
1968: .IP -
1969: .B char *yytext
1970: holds the text of the current token. It may be modified but not lengthened
1971: (you cannot append characters to the end).
1972: .IP
1973: If the special directive
1974: .B %array
1975: appears in the first section of the scanner description, then
1976: .B yytext
1977: is instead declared
1978: .B char yytext[YYLMAX],
1979: where
1980: .B YYLMAX
1981: is a macro definition that you can redefine in the first section
1982: if you don't like the default value (generally 8KB). Using
1983: .B %array
1984: results in somewhat slower scanners, but the value of
1985: .B yytext
1986: becomes immune to calls to
1987: .I input()
1988: and
1989: .I unput(),
1990: which potentially destroy its value when
1991: .B yytext
1992: is a character pointer. The opposite of
1993: .B %array
1994: is
1995: .B %pointer,
1996: which is the default.
1997: .IP
1998: You cannot use
1999: .B %array
2000: when generating C++ scanner classes
2001: (the
2002: .B \-+
2003: flag).
2004: .IP -
2005: .B int yyleng
2006: holds the length of the current token.
2007: .IP -
2008: .B FILE *yyin
2009: is the file which by default
2010: .I flex
2011: reads from. It may be redefined but doing so only makes sense before
2012: scanning begins or after an EOF has been encountered. Changing it in
2013: the midst of scanning will have unexpected results since
2014: .I flex
2015: buffers its input; use
2016: .B yyrestart()
2017: instead.
2018: Once scanning terminates because an end-of-file
2019: has been seen, you can point
2020: .I yyin
2021: at the new input file and then call the scanner again to continue scanning.
2022: .IP -
2023: .B void yyrestart( FILE *new_file )
2024: may be called to point
2025: .I yyin
2026: at the new input file. The switch-over to the new file is immediate
2027: (any previously buffered-up input is lost). Note that calling
2028: .B yyrestart()
2029: with
2030: .I yyin
2031: as an argument thus throws away the current input buffer and continues
2032: scanning the same input file.
2033: .IP -
2034: .B FILE *yyout
2035: is the file to which
2036: .B ECHO
2037: actions are done. It can be reassigned by the user.
2038: .IP -
2039: .B YY_CURRENT_BUFFER
2040: returns a
2041: .B YY_BUFFER_STATE
2042: handle to the current buffer.
2043: .IP -
2044: .B YY_START
2045: returns an integer value corresponding to the current start
2046: condition. You can subsequently use this value with
2047: .B BEGIN
2048: to return to that start condition.
2049: .SH INTERFACING WITH YACC
2050: One of the main uses of
2051: .I flex
2052: is as a companion to the
2053: .I yacc
2054: parser-generator.
2055: .I yacc
2056: parsers expect to call a routine named
2057: .B yylex()
2058: to find the next input token. The routine is supposed to
2059: return the type of the next token as well as putting any associated
2060: value in the global
2061: .B yylval.
2062: To use
2063: .I flex
2064: with
2065: .I yacc,
2066: one specifies the
2067: .B \-d
2068: option to
2069: .I yacc
2070: to instruct it to generate the file
2071: .B y.tab.h
2072: containing definitions of all the
2073: .B %tokens
2074: appearing in the
2075: .I yacc
2076: input. This file is then included in the
2077: .I flex
2078: scanner. For example, if one of the tokens is "TOK_NUMBER",
2079: part of the scanner might look like:
2080: .nf
2081:
2082: %{
2083: #include "y.tab.h"
2084: %}
2085:
2086: %%
2087:
2088: [0-9]+ yylval = atoi( yytext ); return TOK_NUMBER;
2089:
2090: .fi
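.PP
The contract between the two is small.
.B yylex()
returns a token code (0 at end of input) and leaves the token's value in
.B yylval.
The hand-written standalone sketch below follows the same contract without using
.I flex.
The token value 257 for TOK_NUMBER, the
.B int
type of
.I yylval,
and the helper
.I scan_from
are assumptions. In practice the token codes come from
.B y.tab.h
and
.I yylval
is defined by the
.I yacc
parser.
.nf

```c
#include <ctype.h>
#include <stdlib.h>

#define TOK_NUMBER 257        /* hypothetical; yacc writes the real value */

int yylval;                   /* stand-in for the parser's yylval */

static const char *scan_text = "";   /* hypothetical input string */
static size_t scan_pos = 0;

/* Hypothetical: choose the string that yylex() will tokenize. */
void scan_from( const char *text )
{
    scan_text = text;
    scan_pos = 0;
}

/* Returns TOK_NUMBER with its value in yylval, any other character
 * as its own token code, or 0 at end of input. */
int yylex( void )
{
    while ( isspace( (unsigned char) scan_text[scan_pos] ) )
        scan_pos++;

    if ( scan_text[scan_pos] == 0 )
        return 0;

    if ( isdigit( (unsigned char) scan_text[scan_pos] ) )
    {
        char *end;

        yylval = (int) strtol( scan_text + scan_pos, &end, 10 );
        scan_pos = (size_t) ( end - scan_text );
        return TOK_NUMBER;
    }

    return (unsigned char) scan_text[scan_pos++];
}
```

.fi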
2091: .SH OPTIONS
2092: .I flex
2093: has the following options:
2094: .TP
2095: .B \-b
2096: Generate backing-up information to
2097: .I lex.backup.
2098: This is a list of scanner states which require backing up
2099: and the input characters on which they do so. By adding rules one
2100: can remove backing-up states. If
2101: .I all
2102: backing-up states are eliminated and
2103: .B \-Cf
2104: or
2105: .B \-CF
2106: is used, the generated scanner will run faster (see the
2107: .B \-p
2108: flag). Only users who wish to squeeze every last cycle out of their
2109: scanners need worry about this option. (See the section on Performance
2110: Considerations below.)
2111: .TP
2112: .B \-c
2113: is a do-nothing, deprecated option included for POSIX compliance.
2114: .TP
2115: .B \-d
2116: makes the generated scanner run in
2117: .I debug
2118: mode. Whenever a pattern is recognized and the global
2119: .B yy_flex_debug
2120: is non-zero (which is the default),
2121: the scanner will write to
2122: .I stderr
2123: a line of the form:
2124: .nf
2125:
2126: --accepting rule at line 53 ("the matched text")
2127:
2128: .fi
2129: The line number refers to the location of the rule in the file
2130: defining the scanner (i.e., the file that was fed to flex). Messages
2131: are also generated when the scanner backs up, accepts the
2132: default rule, reaches the end of its input buffer (or encounters
2133: a NUL; at this point, the two look the same as far as the scanner's concerned),
2134: or reaches an end-of-file.
2135: .TP
2136: .B \-f
2137: specifies
2138: .I fast scanner.
2139: No table compression is done and stdio is bypassed.
2140: The result is large but fast. This option is equivalent to
2141: .B \-Cfr
2142: (see below).
2143: .TP
2144: .B \-h
2145: generates a "help" summary of
2146: .I flex's
2147: options to
1.7 aaron 2148: .I stdout
1.1 deraadt 2149: and then exits.
2150: .B \-?
2151: and
2152: .B \-\-help
2153: are synonyms for
2154: .B \-h.
2155: .TP
2156: .B \-i
2157: instructs
2158: .I flex
2159: to generate a
2160: .I case-insensitive
2161: scanner. The case of letters given in the
2162: .I flex
2163: input patterns will
2164: be ignored, and tokens in the input will be matched regardless of case. The
2165: matched text given in
2166: .I yytext
will preserve the original case (i.e., it will not be folded).
2168: .TP
2169: .B \-l
2170: turns on maximum compatibility with the original AT&T
2171: .I lex
2172: implementation. Note that this does not mean
2173: .I full
2174: compatibility. Use of this option costs a considerable amount of
2175: performance, and it cannot be used with the
.B \-+, \-f, \-F, \-Cf,
or
.B \-CF
2179: options. For details on the compatibilities it provides, see the section
2180: "Incompatibilities With Lex And POSIX" below. This option also results
2181: in the name
2182: .B YY_FLEX_LEX_COMPAT
2183: being #define'd in the generated scanner.
2184: .TP
2185: .B \-n
2186: is another do-nothing, deprecated option included only for
2187: POSIX compliance.
2188: .TP
2189: .B \-p
2190: generates a performance report to stderr. The report
2191: consists of comments regarding features of the
2192: .I flex
2193: input file which will cause a serious loss of performance in the resulting
2194: scanner. If you give the flag twice, you will also get comments regarding
2195: features that lead to minor performance losses.
2196: .IP
2197: Note that the use of
2198: .B REJECT,
2199: .B %option yylineno,
2200: and variable trailing context (see the Deficiencies / Bugs section below)
2201: entails a substantial performance penalty; use of
2202: .I yymore(),
2203: the
2204: .B ^
2205: operator,
2206: and the
2207: .B \-I
2208: flag entail minor performance penalties.
2209: .TP
2210: .B \-s
2211: causes the
2212: .I default rule
2213: (that unmatched scanner input is echoed to
2214: .I stdout)
2215: to be suppressed. If the scanner encounters input that does not
2216: match any of its rules, it aborts with an error. This option is
2217: useful for finding holes in a scanner's rule set.
2218: .TP
2219: .B \-t
2220: instructs
2221: .I flex
2222: to write the scanner it generates to standard output instead
2223: of
2224: .B lex.yy.c.
2225: .TP
2226: .B \-v
2227: specifies that
2228: .I flex
2229: should write to
2230: .I stderr
2231: a summary of statistics regarding the scanner it generates.
2232: Most of the statistics are meaningless to the casual
2233: .I flex
2234: user, but the first line identifies the version of
2235: .I flex
2236: (same as reported by
2237: .B \-V),
2238: and the next line the flags used when generating the scanner, including
2239: those that are on by default.
2240: .TP
2241: .B \-w
2242: suppresses warning messages.
2243: .TP
2244: .B \-B
2245: instructs
2246: .I flex
2247: to generate a
2248: .I batch
2249: scanner, the opposite of
2250: .I interactive
2251: scanners generated by
2252: .B \-I
2253: (see below). In general, you use
2254: .B \-B
2255: when you are
2256: .I certain
2257: that your scanner will never be used interactively, and you want to
2258: squeeze a
2259: .I little
2260: more performance out of it. If your goal is instead to squeeze out a
2261: .I lot
2262: more performance, you should be using the
2263: .B \-Cf
2264: or
2265: .B \-CF
2266: options (discussed below), which turn on
2267: .B \-B
2268: automatically anyway.
2269: .TP
2270: .B \-F
2271: specifies that the
2272: .ul
2273: fast
2274: scanner table representation should be used (and stdio
2275: bypassed). This representation is
2276: about as fast as the full table representation
.B (\-f),
2278: and for some sets of patterns will be considerably smaller (and for
2279: others, larger). In general, if the pattern set contains both "keywords"
2280: and a catch-all, "identifier" rule, such as in the set:
2281: .nf
2282:
2283: "case" return TOK_CASE;
2284: "switch" return TOK_SWITCH;
2285: ...
2286: "default" return TOK_DEFAULT;
2287: [a-z]+ return TOK_ID;
2288:
2289: .fi
2290: then you're better off using the full table representation. If only
2291: the "identifier" rule is present and you then use a hash table or some such
2292: to detect the keywords, you're better off using
.B \-F.
2294: .IP
2295: This option is equivalent to
2296: .B \-CFr
2297: (see below). It cannot be used with
2298: .B \-+.
2299: .TP
2300: .B \-I
2301: instructs
2302: .I flex
2303: to generate an
2304: .I interactive
2305: scanner. An interactive scanner is one that only looks ahead to decide
2306: what token has been matched if it absolutely must. It turns out that
2307: always looking one extra character ahead, even if the scanner has already
2308: seen enough text to disambiguate the current token, is a bit faster than
2309: only looking ahead when necessary. But scanners that always look ahead
2310: give dreadful interactive performance; for example, when a user types
2311: a newline, it is not recognized as a newline token until they enter
2312: .I another
2313: token, which often means typing in another whole line.
2314: .IP
2315: .I Flex
2316: scanners default to
2317: .I interactive
2318: unless you use the
2319: .B \-Cf
2320: or
2321: .B \-CF
2322: table-compression options (see below). That's because if you're looking
2323: for high-performance you should be using one of these options, so if you
2324: didn't,
2325: .I flex
2326: assumes you'd rather trade off a bit of run-time performance for intuitive
2327: interactive behavior. Note also that you
2328: .I cannot
2329: use
2330: .B \-I
2331: in conjunction with
2332: .B \-Cf
2333: or
2334: .B \-CF.
2335: Thus, this option is not really needed; it is on by default for all those
2336: cases in which it is allowed.
2337: .IP
2338: You can force a scanner to
2339: .I not
2340: be interactive by using
2341: .B \-B
2342: (see above).
2343: .TP
2344: .B \-L
2345: instructs
2346: .I flex
2347: not to generate
2348: .B #line
2349: directives. Without this option,
2350: .I flex
2351: peppers the generated scanner
2352: with #line directives so error messages in the actions will be correctly
2353: located with respect to either the original
2354: .I flex
2355: input file (if the errors are due to code in the input file), or
2356: .B lex.yy.c
2357: (if the errors are
2358: .I flex's
2359: fault -- you should report these sorts of errors to the email address
2360: given below).
2361: .TP
2362: .B \-T
2363: makes
2364: .I flex
2365: run in
2366: .I trace
2367: mode. It will generate a lot of messages to
2368: .I stderr
2369: concerning
2370: the form of the input and the resultant non-deterministic and deterministic
2371: finite automata. This option is mostly for use in maintaining
2372: .I flex.
2373: .TP
2374: .B \-V
2375: prints the version number to
2376: .I stdout
2377: and exits.
2378: .B \-\-version
2379: is a synonym for
2380: .B \-V.
2381: .TP
2382: .B \-7
2383: instructs
2384: .I flex
to generate a 7-bit scanner, i.e., one which can only recognize 7-bit
2386: characters in its input. The advantage of using
2387: .B \-7
2388: is that the scanner's tables can be up to half the size of those generated
2389: using the
2390: .B \-8
2391: option (see below). The disadvantage is that such scanners often hang
2392: or crash if their input contains an 8-bit character.
2393: .IP
2394: Note, however, that unless you generate your scanner using the
2395: .B \-Cf
2396: or
2397: .B \-CF
2398: table compression options, use of
2399: .B \-7
2400: will save only a small amount of table space, and make your scanner
2401: considerably less portable.
2402: .I Flex's
2403: default behavior is to generate an 8-bit scanner unless you use the
2404: .B \-Cf
2405: or
2406: .B \-CF,
2407: in which case
2408: .I flex
defaults to generating 7-bit scanners unless your site was configured
to always generate 8-bit scanners (as will often be the case
2411: with non-USA sites). You can tell whether flex generated a 7-bit
2412: or an 8-bit scanner by inspecting the flag summary in the
2413: .B \-v
2414: output as described above.
2415: .IP
2416: Note that if you use
2417: .B \-Cfe
2418: or
2419: .B \-CFe
2420: (those table compression options, but also using equivalence classes as
discussed below), flex still defaults to generating an 8-bit
2422: scanner, since usually with these compression options full 8-bit tables
2423: are not much more expensive than 7-bit tables.
2424: .TP
2425: .B \-8
2426: instructs
2427: .I flex
2428: to generate an 8-bit scanner, i.e., one which can recognize 8-bit
2429: characters. This flag is only needed for scanners generated using
2430: .B \-Cf
2431: or
2432: .B \-CF,
2433: as otherwise flex defaults to generating an 8-bit scanner anyway.
2434: .IP
2435: See the discussion of
2436: .B \-7
2437: above for flex's default behavior and the tradeoffs between 7-bit
2438: and 8-bit scanners.
2439: .TP
2440: .B \-+
2441: specifies that you want flex to generate a C++
2442: scanner class. See the section on Generating C++ Scanners below for
2443: details.
1.7 aaron 2444: .TP
1.1 deraadt 2445: .B \-C[aefFmr]
2446: controls the degree of table compression and, more generally, trade-offs
2447: between small scanners and fast scanners.
2448: .IP
2449: .B \-Ca
2450: ("align") instructs flex to trade off larger tables in the
2451: generated scanner for faster performance because the elements of
2452: the tables are better aligned for memory access and computation. On some
2453: RISC architectures, fetching and manipulating longwords is more efficient
2454: than with smaller-sized units such as shortwords. This option can
2455: double the size of the tables used by your scanner.
2456: .IP
2457: .B \-Ce
2458: directs
2459: .I flex
2460: to construct
2461: .I equivalence classes,
2462: i.e., sets of characters
2463: which have identical lexical properties (for example, if the only
2464: appearance of digits in the
2465: .I flex
2466: input is in the character class
2467: "[0-9]" then the digits '0', '1', ..., '9' will all be put
2468: in the same equivalence class). Equivalence classes usually give
2469: dramatic reductions in the final table/object file sizes (typically
2470: a factor of 2-5) and are pretty cheap performance-wise (one array
2471: look-up per character scanned).
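.IP
As a simplified sketch of the idea (hypothetical C code, not the code
.I flex
itself uses), the function
.B build_ecs()
below gives each character a class number, placing two characters in the
same class exactly when they belong to the same character sets.
It assumes a 7-bit character set, at most 32 sets, and that each set is
given as a string listing its members.
.nf

#include <stdint.h>
#include <string.h>

#define NCHARS 128  /* assumption: 7-bit characters */

/*
 * Hypothetical helper: sets[i] lists the members of the i-th character
 * set used in the patterns (nsets <= 32).  Fills ecs[] with a class
 * number for every character and returns the number of classes.
 */
int
build_ecs(const char *sets[], int nsets, int ecs[NCHARS])
{
    uint32_t sig[NCHARS];   /* bit i set: character is in sets[i] */
    uint32_t seen[NCHARS];  /* distinct signatures, in class order */
    int nclasses = 0;
    int c, i, k;

    /* strchr() would also "find" the terminating NUL, so skip c == 0 */
    for (c = 0; c < NCHARS; c++) {
        sig[c] = 0;
        for (i = 0; i < nsets; i++)
            if (c != 0 && strchr(sets[i], c) != NULL)
                sig[c] |= (uint32_t)1 << i;
    }

    /* Characters with identical signatures share a class. */
    for (c = 0; c < NCHARS; c++) {
        for (k = 0; k < nclasses; k++)
            if (seen[k] == sig[c])
                break;
        if (k == nclasses)          /* first time we see this signature */
            seen[nclasses++] = sig[c];
        ecs[c] = k;
    }
    return nclasses;
}

.fi
Given the three sets "0123456789" (i.e., [0-9]),
"abcdefghijklmnopqrstuvwxyz" (i.e., [a-z]) and "x", it finds four
classes: the digits, the letters other than x, x itself, and all
remaining characters.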
2472: .IP
2473: .B \-Cf
2474: specifies that the
2475: .I full
2476: scanner tables should be generated -
2477: .I flex
2478: should not compress the
1.10 deraadt 2479: tables by taking advantage of similar transition functions for
1.1 deraadt 2480: different states.
2481: .IP
2482: .B \-CF
2483: specifies that the alternate fast scanner representation (described
2484: above under the
2485: .B \-F
2486: flag)
2487: should be used. This option cannot be used with
2488: .B \-+.
2489: .IP
2490: .B \-Cm
2491: directs
2492: .I flex
2493: to construct
2494: .I meta-equivalence classes,
2495: which are sets of equivalence classes (or characters, if equivalence
2496: classes are not being used) that are commonly used together. Meta-equivalence
2497: classes are often a big win when using compressed tables, but they
2498: have a moderate performance impact (one or two "if" tests and one
2499: array look-up per character scanned).
2500: .IP
2501: .B \-Cr
2502: causes the generated scanner to
2503: .I bypass
2504: use of the standard I/O library (stdio) for input. Instead of calling
2505: .B fread()
2506: or
2507: .B getc(),
2508: the scanner will use the
2509: .B read()
2510: system call, resulting in a performance gain which varies from system
2511: to system, but in general is probably negligible unless you are also using
2512: .B \-Cf
2513: or
2514: .B \-CF.
2515: Using
2516: .B \-Cr
2517: can cause strange behavior if, for example, you read from
2518: .I yyin
2519: using stdio prior to calling the scanner (because the scanner will miss
2520: whatever text your previous reads left in the stdio input buffer).
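.IP
The effect can be sketched with the following hypothetical C function
(not part of
.I flex
or its generated scanners). It writes some bytes to a temporary file,
consumes one of them with
.B getc(),
and then counts how many bytes
.B read()
can still fetch from the same file descriptor. On typical stdio
implementations the first
.B getc()
fills a whole buffer, so the count is zero rather than the number of
bytes that remain unread from the program's point of view.
.nf

#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical demonstration: returns how many bytes read(2) can still
 * obtain after a single getc() on the same file, or -1 on error.
 */
long
bytes_left_for_read(const char *text, size_t len)
{
    char buf[64];
    long total = 0;
    ssize_t n;
    FILE *fp = tmpfile();

    if (fp == NULL)
        return -1;
    fwrite(text, 1, len, fp);
    rewind(fp);                 /* flush the data and seek back to 0 */
    (void)getc(fp);             /* stdio may buffer far more than 1 byte */
    while ((n = read(fileno(fp), buf, sizeof buf)) > 0)
        total += n;
    fclose(fp);
    return total;
}

.fi
With the six bytes "abcdef", such a call typically returns 0 rather than
5: the unread text is sitting in the stdio buffer, out of reach of
.B read().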
2521: .IP
2522: .B \-Cr
2523: has no effect if you define
2524: .B YY_INPUT
2525: (see The Generated Scanner above).
2526: .IP
2527: A lone
2528: .B \-C
2529: specifies that the scanner tables should be compressed but neither
2530: equivalence classes nor meta-equivalence classes should be used.
2531: .IP
2532: The options
2533: .B \-Cf
2534: or
2535: .B \-CF
2536: and
2537: .B \-Cm
2538: do not make sense together - there is no opportunity for meta-equivalence
2539: classes if the table is not being compressed. Otherwise the options
2540: may be freely mixed, and are cumulative.
2541: .IP
2542: The default setting is
2543: .B \-Cem,
2544: which specifies that
2545: .I flex
2546: should generate equivalence classes
2547: and meta-equivalence classes. This setting provides the highest
degree of table compression. You can obtain
faster-executing scanners at the cost of larger tables; the following
ordering generally holds:
2551: .nf
2552:
2553: slowest & smallest
2554: -Cem
2555: -Cm
2556: -Ce
2557: -C
2558: -C{f,F}e
2559: -C{f,F}
2560: -C{f,F}a
2561: fastest & largest
2562:
2563: .fi
2564: Note that scanners with the smallest tables are usually generated and
2565: compiled the quickest, so
2566: during development you will usually want to use the default, maximal
2567: compression.
2568: .IP
2569: .B \-Cfe
2570: is often a good compromise between speed and size for production
2571: scanners.
2572: .TP
2573: .B \-ooutput
2574: directs flex to write the scanner to the file
2575: .B output
2576: instead of
2577: .B lex.yy.c.
2578: If you combine
2579: .B \-o
2580: with the
2581: .B \-t
2582: option, then the scanner is written to
2583: .I stdout
2584: but its
2585: .B #line
2586: directives (see the
.B \-L
2588: option above) refer to the file
2589: .B output.
2590: .TP
2591: .B \-Pprefix
2592: changes the default
2593: .I "yy"
2594: prefix used by
2595: .I flex
1.6 aaron 2596: for all globally visible variable and function names to instead be
1.1 deraadt 2597: .I prefix.
2598: For example,
2599: .B \-Pfoo
2600: changes the name of
2601: .B yytext
2602: to
2603: .B footext.
2604: It also changes the name of the default output file from
2605: .B lex.yy.c
2606: to
2607: .B lex.foo.c.
2608: Here are all of the names affected:
2609: .nf
2610:
2611: yy_create_buffer
2612: yy_delete_buffer
2613: yy_flex_debug
2614: yy_init_buffer
2615: yy_flush_buffer
2616: yy_load_buffer_state
2617: yy_switch_to_buffer
2618: yyin
2619: yyleng
2620: yylex
2621: yylineno
2622: yyout
2623: yyrestart
2624: yytext
2625: yywrap
2626:
2627: .fi
2628: (If you are using a C++ scanner, then only
2629: .B yywrap
2630: and
2631: .B yyFlexLexer
2632: are affected.)
2633: Within your scanner itself, you can still refer to the global variables
2634: and functions using either version of their name; but externally, they
2635: have the modified name.
2636: .IP
2637: This option lets you easily link together multiple
2638: .I flex
2639: programs into the same executable. Note, though, that using this
2640: option also renames
2641: .B yywrap(),
2642: so you now
2643: .I must
2644: either
1.6 aaron 2645: provide your own (appropriately named) version of the routine for your
1.1 deraadt 2646: scanner, or use
2647: .B %option noyywrap,
2648: as linking with
2649: .B \-lfl
2650: no longer provides one for you by default.
2651: .TP
2652: .B \-Sskeleton_file
2653: overrides the default skeleton file from which
2654: .I flex
2655: constructs its scanners. You'll never need this option unless you are doing
2656: .I flex
2657: maintenance or development.
2658: .PP
2659: .I flex
2660: also provides a mechanism for controlling options within the
2661: scanner specification itself, rather than from the flex command-line.
2662: This is done by including
2663: .B %option
2664: directives in the first section of the scanner specification.
2665: You can specify multiple options with a single
2666: .B %option
2667: directive, and multiple directives in the first section of your flex input
2668: file.
2669: .PP
2670: Most options are given simply as names, optionally preceded by the
2671: word "no" (with no intervening whitespace) to negate their meaning.
2672: A number are equivalent to flex flags or their negation:
2673: .nf
2674:
2675: 7bit -7 option
2676: 8bit -8 option
2677: align -Ca option
2678: backup -b option
2679: batch -B option
2680: c++ -+ option
2681:
2682: caseful or
2683: case-sensitive opposite of -i (default)
2684:
2685: case-insensitive or
2686: caseless -i option
2687:
2688: debug -d option
2689: default opposite of -s option
2690: ecs -Ce option
2691: fast -F option
2692: full -f option
2693: interactive -I option
2694: lex-compat -l option
2695: meta-ecs -Cm option
2696: perf-report -p option
2697: read -Cr option
2698: stdout -t option
2699: verbose -v option
2700: warn opposite of -w option
2701: (use "%option nowarn" for -w)
2702:
2703: array equivalent to "%array"
2704: pointer equivalent to "%pointer" (default)
2705:
2706: .fi
Some
.B %option
directives provide features otherwise not available:
2710: .TP
2711: .B always-interactive
2712: instructs flex to generate a scanner which always considers its input
2713: "interactive". Normally, on each new input file the scanner calls
2714: .B isatty()
2715: in an attempt to determine whether
2716: the scanner's input source is interactive and thus should be read a
character at a time. When this option is used, however, no
such call is made.
2719: .TP
2720: .B main
2721: directs flex to provide a default
2722: .B main()
2723: program for the scanner, which simply calls
2724: .B yylex().
2725: This option implies
2726: .B noyywrap
2727: (see below).
2728: .TP
2729: .B never-interactive
2730: instructs flex to generate a scanner which never considers its input
2731: "interactive" (again, no call made to
2732: .B isatty()).
2733: This is the opposite of
2734: .B always-interactive.
2735: .TP
2736: .B stack
2737: enables the use of start condition stacks (see Start Conditions above).
2738: .TP
2739: .B stdinit
2740: if set (i.e.,
2741: .B %option stdinit)
2742: initializes
2743: .I yyin
2744: and
2745: .I yyout
2746: to
2747: .I stdin
2748: and
2749: .I stdout,
2750: instead of the default of
2751: .I nil.
2752: Some existing
2753: .I lex
2754: programs depend on this behavior, even though it is not compliant with
2755: ANSI C, which does not require
2756: .I stdin
2757: and
2758: .I stdout
2759: to be compile-time constant.
2760: .TP
2761: .B yylineno
2762: directs
2763: .I flex
2764: to generate a scanner that maintains the number of the current line
2765: read from its input in the global variable
2766: .B yylineno.
2767: This option is implied by
2768: .B %option lex-compat.
2769: .TP
2770: .B yywrap
2771: if unset (i.e.,
2772: .B %option noyywrap),
2773: makes the scanner not call
2774: .B yywrap()
2775: upon an end-of-file, but simply assume that there are no more
2776: files to scan (until the user points
2777: .I yyin
2778: at a new file and calls
2779: .B yylex()
2780: again).
2781: .PP
2782: .I flex
2783: scans your rule actions to determine whether you use the
2784: .B REJECT
2785: or
2786: .B yymore()
2787: features. The
2788: .B reject
2789: and
2790: .B yymore
options are available to override its decision as to whether you use these
features, either by setting them (e.g.,
2793: .B %option reject)
2794: to indicate the feature is indeed used, or
2795: unsetting them to indicate it actually is not used
2796: (e.g.,
2797: .B %option noyymore).
2798: .PP
2799: Three options take string-delimited values, offset with '=':
2800: .nf
2801:
2802: %option outfile="ABC"
2803:
2804: .fi
2805: is equivalent to
.B \-oABC,
2807: and
2808: .nf
2809:
2810: %option prefix="XYZ"
2811:
2812: .fi
2813: is equivalent to
.B \-PXYZ.
2815: Finally,
2816: .nf
2817:
2818: %option yyclass="foo"
2819:
2820: .fi
only applies when generating a C++ scanner (the
.B \-+
option). It informs
2824: .I flex
2825: that you have derived
2826: .B foo
2827: as a subclass of
2828: .B yyFlexLexer,
2829: so
2830: .I flex
2831: will place your actions in the member function
2832: .B foo::yylex()
2833: instead of
2834: .B yyFlexLexer::yylex().
2835: It also generates a
2836: .B yyFlexLexer::yylex()
2837: member function that emits a run-time error (by invoking
2838: .B yyFlexLexer::LexerError())
2839: if called.
2840: See Generating C++ Scanners, below, for additional information.
2841: .PP
2842: A number of options are available for lint purists who want to suppress
2843: the appearance of unneeded routines in the generated scanner. Each of the
2844: following, if unset
(e.g.,
.B %option nounput),
results in the corresponding routine not appearing in
2848: the generated scanner:
2849: .nf
2850:
2851: input, unput
2852: yy_push_state, yy_pop_state, yy_top_state
2853: yy_scan_buffer, yy_scan_bytes, yy_scan_string
2854:
2855: .fi
2856: (though
2857: .B yy_push_state()
2858: and friends won't appear anyway unless you use
2859: .B %option stack).
2860: .SH PERFORMANCE CONSIDERATIONS
2861: The main design goal of
2862: .I flex
2863: is that it generate high-performance scanners. It has been optimized
2864: for dealing well with large sets of rules. Aside from the effects on
2865: scanner speed of the table compression
2866: .B \-C
2867: options outlined above,
2868: there are a number of options/actions which degrade performance. These
2869: are, from most expensive to least:
2870: .nf
2871:
2872: REJECT
2873: %option yylineno
2874: arbitrary trailing context
2875:
2876: pattern sets that require backing up
2877: %array
2878: %option interactive
2879: %option always-interactive
2880:
2881: '^' beginning-of-line operator
2882: yymore()
2883:
2884: .fi
2885: with the first three all being quite expensive and the last two
2886: being quite cheap. Note also that
2887: .B unput()
2888: is implemented as a routine call that potentially does quite a bit of
2889: work, while
2890: .B yyless()
is a quite-cheap macro; so if you are just putting back some excess text you
2892: scanned, use
2893: .B yyless().
2894: .PP
2895: .B REJECT
2896: should be avoided at all costs when performance is important.
2897: It is a particularly expensive option.
2898: .PP
2899: Getting rid of backing up is messy and often may be an enormous
amount of work for a complicated scanner. In principle, one begins
2901: by using the
1.7 aaron 2902: .B \-b
1.1 deraadt 2903: flag to generate a
2904: .I lex.backup
2905: file. For example, on the input
2906: .nf
2907:
2908: %%
2909: foo return TOK_KEYWORD;
2910: foobar return TOK_KEYWORD;
2911:
2912: .fi
2913: the file looks like:
2914: .nf
2915:
2916: State #6 is non-accepting -
2917: associated rule line numbers:
2918: 2 3
2919: out-transitions: [ o ]
2920: jam-transitions: EOF [ \\001-n p-\\177 ]
2921:
2922: State #8 is non-accepting -
2923: associated rule line numbers:
2924: 3
2925: out-transitions: [ a ]
2926: jam-transitions: EOF [ \\001-` b-\\177 ]
2927:
2928: State #9 is non-accepting -
2929: associated rule line numbers:
2930: 3
2931: out-transitions: [ r ]
2932: jam-transitions: EOF [ \\001-q s-\\177 ]
2933:
2934: Compressed tables always back up.
2935:
2936: .fi
2937: The first few lines tell us that there's a scanner state in
2938: which it can make a transition on an 'o' but not on any other
2939: character, and that in that state the currently scanned text does not match
2940: any rule. The state occurs when trying to match the rules found
2941: at lines 2 and 3 in the input file.
2942: If the scanner is in that state and then reads
2943: something other than an 'o', it will have to back up to find
2944: a rule which is matched. With
2945: a bit of headscratching one can see that this must be the
2946: state it's in when it has seen "fo". When this has happened,
2947: if anything other than another 'o' is seen, the scanner will
2948: have to back up to simply match the 'f' (by the default rule).
2949: .PP
2950: The comment regarding State #8 indicates there's a problem
2951: when "foob" has been scanned. Indeed, on any character other
2952: than an 'a', the scanner will have to back up to accept "foo".
2953: Similarly, the comment for State #9 concerns when "fooba" has
2954: been scanned and an 'r' does not follow.
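.PP
To make backing up concrete, here is a hypothetical C function, a
brute-force simulation rather than the table-driven code
.I flex
generates, of a scanner for just these two rules plus the default rule.
It keeps reading while the text seen so far could still grow into a
match, remembers the longest text that did match a rule, and reports how
many characters it then has to give back.
.nf

#include <string.h>

/*
 * Hypothetical simulation of the rules "foo" and "foobar" plus the
 * default rule (any single character).  Returns the length of the
 * accepted token; *backed_up receives the number of characters that
 * were scanned past the accepted token and must be given back (the
 * character on which the scanner jams is not counted).
 */
size_t
match_keyword(const char *in, size_t *backed_up)
{
    static const char *kw[] = { "foo", "foobar" };
    size_t n, alive_len = 0, last_accept = 0;
    int k, alive;

    for (n = 0; ; n++) {
        /* The default rule matches any one character. */
        alive = (n <= 1);
        if (n == 1)
            last_accept = 1;
        for (k = 0; k < 2; k++) {
            size_t len = strlen(kw[k]);

            /* Are the first n input characters a prefix of kw[k]? */
            if (n <= len && strncmp(in, kw[k], n) == 0) {
                alive = 1;
                if (n == len)
                    last_accept = n;    /* a whole keyword matched */
            }
        }
        if (!alive)
            break;                      /* jammed on in[n - 1] */
        alive_len = n;
        if (in[n] == 0)
            break;                      /* end of input */
    }
    *backed_up = alive_len - last_accept;
    return last_accept;
}

.fi
On the input "foobaz" it accepts "foo" and gives back "ba" (the scanner
jams in State #9 above); on "fox" it gives back the "o" and accepts only
the "f", via the default rule (State #6); on "foobar" nothing is given
back.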
2955: .PP
2956: The final comment reminds us that there's no point going to
2957: all the trouble of removing backing up from the rules unless
2958: we're using
2959: .B \-Cf
2960: or
2961: .B \-CF,
2962: since there's no performance gain doing so with compressed scanners.
2963: .PP
2964: The way to remove the backing up is to add "error" rules:
2965: .nf
2966:
2967: %%
2968: foo return TOK_KEYWORD;
2969: foobar return TOK_KEYWORD;
2970:
2971: fooba |
2972: foob |
2973: fo {
2974: /* false alarm, not really a keyword */
2975: return TOK_ID;
2976: }
2977:
2978: .fi
2979: .PP
2980: Eliminating backing up among a list of keywords can also be
2981: done using a "catch-all" rule:
2982: .nf
2983:
2984: %%
2985: foo return TOK_KEYWORD;
2986: foobar return TOK_KEYWORD;
2987:
2988: [a-z]+ return TOK_ID;
2989:
2990: .fi
2991: This is usually the best solution when appropriate.
2992: .PP
2993: Backing up messages tend to cascade.
2994: With a complicated set of rules it's not uncommon to get hundreds
2995: of messages. If one can decipher them, though, it often
2996: only takes a dozen or so rules to eliminate the backing up (though
2997: it's easy to make a mistake and have an error rule accidentally match
a valid token). A possible future
.I flex
feature will be to automatically add rules to eliminate backing up.
3001: .PP
3002: It's important to keep in mind that you gain the benefits of eliminating
3003: backing up only if you eliminate
3004: .I every
3005: instance of backing up. Leaving just one means you gain nothing.
3006: .PP
3007: .I Variable
trailing context (where neither the leading nor the trailing part has
a fixed length) entails almost the same performance loss as
3010: .B REJECT
3011: (i.e., substantial). So when possible a rule like:
3012: .nf
3013:
3014: %%
3015: mouse|rat/(cat|dog) run();
3016:
3017: .fi
3018: is better written:
3019: .nf
3020:
3021: %%
3022: mouse/cat|dog run();
3023: rat/cat|dog run();
3024:
3025: .fi
3026: or as
3027: .nf
3028:
3029: %%
3030: mouse|rat/cat run();
3031: mouse|rat/dog run();
3032:
3033: .fi
3034: Note that here the special '|' action does
3035: .I not
3036: provide any savings, and can even make things worse (see
3037: Deficiencies / Bugs below).
3038: .LP
3039: Another area where the user can increase a scanner's performance
3040: (and one that's easier to implement) arises from the fact that
3041: the longer the tokens matched, the faster the scanner will run.
3042: This is because with long tokens the processing of most input
3043: characters takes place in the (short) inner scanning loop, and
3044: does not often have to go through the additional work of setting up
3045: the scanning environment (e.g.,
3046: .B yytext)
3047: for the action. Recall the scanner for C comments:
3048: .nf
3049:
3050: %x comment
3051: %%
3052: int line_num = 1;
3053:
3054: "/*" BEGIN(comment);
3055:
3056: <comment>[^*\\n]*
3057: <comment>"*"+[^*/\\n]*
3058: <comment>\\n ++line_num;
3059: <comment>"*"+"/" BEGIN(INITIAL);
3060:
3061: .fi
3062: This could be sped up by writing it as:
3063: .nf
3064:
3065: %x comment
3066: %%
3067: int line_num = 1;
3068:
3069: "/*" BEGIN(comment);
3070:
3071: <comment>[^*\\n]*
3072: <comment>[^*\\n]*\\n ++line_num;
3073: <comment>"*"+[^*/\\n]*
3074: <comment>"*"+[^*/\\n]*\\n ++line_num;
3075: <comment>"*"+"/" BEGIN(INITIAL);
3076:
3077: .fi
3078: Now instead of each newline requiring the processing of another
3079: action, recognizing the newlines is "distributed" over the other rules
3080: to keep the matched text as long as possible. Note that
3081: .I adding
3082: rules does
3083: .I not
3084: slow down the scanner! The speed of the scanner is independent
3085: of the number of rules or (modulo the considerations given at the
3086: beginning of this section) how complicated the rules are with
3087: regard to operators such as '*' and '|'.
3088: .PP
3089: A final example in speeding up a scanner: suppose you want to scan
3090: through a file containing identifiers and keywords, one per line
3091: and with no other extraneous characters, and recognize all the
3092: keywords. A natural first approach is:
3093: .nf
3094:
3095: %%
3096: asm |
3097: auto |
3098: break |
3099: ... etc ...
3100: volatile |
3101: while /* it's a keyword */
3102:
3103: .|\\n /* it's not a keyword */
3104:
3105: .fi
To eliminate the backing up, introduce a catch-all rule:
3107: .nf
3108:
3109: %%
3110: asm |
3111: auto |
3112: break |
3113: ... etc ...
3114: volatile |
3115: while /* it's a keyword */
3116:
3117: [a-z]+ |
3118: .|\\n /* it's not a keyword */
3119:
3120: .fi
3121: Now, if it's guaranteed that there's exactly one word per line,
then we can reduce the total number of matches by half by
3123: merging in the recognition of newlines with that of the other
3124: tokens:
3125: .nf
3126:
3127: %%
3128: asm\\n |
3129: auto\\n |
3130: break\\n |
3131: ... etc ...
3132: volatile\\n |
3133: while\\n /* it's a keyword */
3134:
3135: [a-z]+\\n |
3136: .|\\n /* it's not a keyword */
3137:
3138: .fi
3139: One has to be careful here, as we have now reintroduced backing up
3140: into the scanner. In particular, while
3141: .I we
3142: know that there will never be any characters in the input stream
3143: other than letters or newlines,
3144: .I flex
3145: can't figure this out, and it will plan for possibly needing to back up
3146: when it has scanned a token like "auto" and then the next character
3147: is something other than a newline or a letter. Previously it would
3148: then just match the "auto" rule and be done, but now it has no "auto"
1.10 deraadt 3149: rule, only an "auto\\n" rule. To eliminate the possibility of backing up,
1.1 deraadt 3150: we could either duplicate all rules but without final newlines, or,
3151: since we never expect to encounter such an input and therefore don't
3152: how it's classified, we can introduce one more catch-all rule, this
3153: one which doesn't include a newline:
3154: .nf
3155:
3156: %%
3157: asm\\n |
3158: auto\\n |
3159: break\\n |
3160: ... etc ...
3161: volatile\\n |
3162: while\\n /* it's a keyword */
3163:
3164: [a-z]+\\n |
3165: [a-z]+ |
3166: .|\\n /* it's not a keyword */
3167:
3168: .fi
3169: Compiled with
3170: .B \-Cf,
3171: this is about as fast as one can get a
1.7 aaron 3172: .I flex
1.1 deraadt 3173: scanner to go for this particular problem.
3174: .PP
3175: A final note:
3176: .I flex
is slow when matching NULs, particularly when a token contains
multiple NULs.
It's best to write rules which match
.I short
amounts of text if it's anticipated that the text will often include NULs.
3182: .PP
3183: Another final note regarding performance: as mentioned above in the section
3184: How the Input is Matched, dynamically resizing
3185: .B yytext
3186: to accommodate huge tokens is a slow process because it presently requires that
3187: the (huge) token be rescanned from the beginning. Thus if performance is
3188: vital, you should attempt to match "large" quantities of text but not
3189: "huge" quantities, where the cutoff between the two is at about 8K
3190: characters/token.
3191: .SH GENERATING C++ SCANNERS
3192: .I flex
3193: provides two different ways to generate scanners for use with C++. The
3194: first way is to simply compile a scanner generated by
3195: .I flex
3196: using a C++ compiler instead of a C compiler. You should not encounter
1.10 deraadt 3197: any compilation errors (please report any you find to the email address
1.1 deraadt 3198: given in the Author section below). You can then use C++ code in your
3199: rule actions instead of C code. Note that the default input source for
3200: your scanner remains
3201: .I yyin,
3202: and default echoing is still done to
3203: .I yyout.
3204: Both of these remain
3205: .I FILE *
3206: variables and not C++
3207: .I streams.
3208: .PP
3209: You can also use
3210: .I flex
3211: to generate a C++ scanner class, using the
3212: .B \-+
3213: option (or, equivalently,
3214: .B %option c++),
3215: which is automatically specified if the name of the flex
3216: executable ends in a '+', such as
3217: .I flex++.
3218: When using this option, flex defaults to generating the scanner to the file
3219: .B lex.yy.cc
3220: instead of
3221: .B lex.yy.c.
3222: The generated scanner includes the header file
1.5 deraadt 3223: .I g++/FlexLexer.h,
1.1 deraadt 3224: which defines the interface to two C++ classes.
3225: .PP
3226: The first class,
3227: .B FlexLexer,
3228: provides an abstract base class defining the general scanner class
3229: interface. It provides the following member functions:
3230: .TP
3231: .B const char* YYText()
3232: returns the text of the most recently matched token, the equivalent of
3233: .B yytext.
3234: .TP
3235: .B int YYLeng()
3236: returns the length of the most recently matched token, the equivalent of
3237: .B yyleng.
3238: .TP
3239: .B int lineno() const
3240: returns the current input line number
3241: (see
3242: .B %option yylineno),
3243: or
3244: .B 1
3245: if
3246: .B %option yylineno
3247: was not used.
3248: .TP
3249: .B void set_debug( int flag )
3250: sets the debugging flag for the scanner, equivalent to assigning to
3251: .B yy_flex_debug
3252: (see the Options section above). Note that you must build the scanner
3253: using
3254: .B %option debug
3255: to include debugging information in it.
3256: .TP
3257: .B int debug() const
3258: returns the current setting of the debugging flag.
3259: .PP
3260: Also provided are member functions equivalent to
3261: .B yy_switch_to_buffer(),
3262: .B yy_create_buffer()
3263: (though the first argument is an
3264: .B istream*
3265: object pointer and not a
3266: .B FILE*),
3267: .B yy_flush_buffer(),
3268: .B yy_delete_buffer(),
3269: and
3270: .B yyrestart()
1.10 deraadt 3271: (again, the first argument is an
1.1 deraadt 3272: .B istream*
3273: object pointer).
3274: .PP
3275: The second class defined in
1.5 deraadt 3276: .I g++/FlexLexer.h
1.1 deraadt 3277: is
3278: .B yyFlexLexer,
3279: which is derived from
3280: .B FlexLexer.
3281: It defines the following additional member functions:
3282: .TP
3283: .B
3284: yyFlexLexer( istream* arg_yyin = 0, ostream* arg_yyout = 0 )
3285: constructs a
3286: .B yyFlexLexer
3287: object using the given streams for input and output. If not specified,
3288: the streams default to
3289: .B cin
3290: and
3291: .B cout,
3292: respectively.
3293: .TP
3294: .B virtual int yylex()
1.10 deraadt 3295: performs the same role as
1.1 deraadt 3296: .B yylex()
3297: does for ordinary flex scanners: it scans the input stream, consuming
3298: tokens, until a rule's action returns a value. If you derive a subclass
3299: .B S
3300: from
3301: .B yyFlexLexer
3302: and want to access the member functions and variables of
3303: .B S
3304: inside
3305: .B yylex(),
3306: then you need to use
3307: .B %option yyclass="S"
3308: to inform
3309: .I flex
3310: that you will be using that subclass instead of
3311: .B yyFlexLexer.
3312: In this case, rather than generating
3313: .B yyFlexLexer::yylex(),
3314: .I flex
3315: generates
3316: .B S::yylex()
3317: (and also generates a dummy
3318: .B yyFlexLexer::yylex()
3319: that calls
3320: .B yyFlexLexer::LexerError()
3321: if called).
3322: .TP
3323: .B
3324: virtual void switch_streams(istream* new_in = 0,
3325: .B
3326: ostream* new_out = 0)
3327: reassigns
3328: .B yyin
3329: to
3330: .B new_in
3331: (if non-nil)
3332: and
3333: .B yyout
3334: to
3335: .B new_out
3336: (ditto), deleting the previous input buffer if
3337: .B yyin
3338: is reassigned.
3339: .TP
3340: .B
3341: int yylex( istream* new_in, ostream* new_out = 0 )
3342: first switches the input streams via
3343: .B switch_streams( new_in, new_out )
3344: and then returns the value of
3345: .B yylex().
3346: .PP
3347: In addition,
3348: .B yyFlexLexer
3349: defines the following protected virtual functions which you can redefine
3350: in derived classes to tailor the scanner:
3351: .TP
3352: .B
3353: virtual int LexerInput( char* buf, int max_size )
3354: reads up to
3355: .B max_size
3356: characters into
3357: .B buf
3358: and returns the number of characters read. To indicate end-of-input,
3359: return 0 characters. Note that "interactive" scanners (see the
3360: .B \-B
3361: and
3362: .B \-I
3363: flags) define the macro
3364: .B YY_INTERACTIVE.
3365: If you redefine
3366: .B LexerInput()
3367: and need to take different actions depending on whether or not
3368: the scanner might be scanning an interactive input source, you can
3369: test for the presence of this name via
3370: .B #ifdef.
3371: .TP
3372: .B
3373: virtual void LexerOutput( const char* buf, int size )
3374: writes out
3375: .B size
3376: characters from the buffer
3377: .B buf,
3378: which, while NUL-terminated, may also contain "internal" NUL's if
3379: the scanner's rules can match text with NUL's in them.
3380: .TP
3381: .B
3382: virtual void LexerError( const char* msg )
3383: reports a fatal error message. The default version of this function
3384: writes the message to the stream
3385: .B cerr
3386: and exits.
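.PP
As a rough sketch of the LexerInput() and LexerOutput() contracts above, the C++ class below implements both signatures over a std::string. The class name StringLexerIO is hypothetical. It deliberately does not derive from yyFlexLexer, so it compiles without <g++/FlexLexer.h>. In a real scanner these functions would be overrides in a class derived from yyFlexLexer.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical stand-in for a yyFlexLexer subclass. It uses the same
// LexerInput()/LexerOutput() signatures but does not derive from
// yyFlexLexer, so it compiles without <g++/FlexLexer.h>.
class StringLexerIO {
public:
    explicit StringLexerIO(std::string input) : input_(std::move(input)) {}

    // Copy at most max_size characters (assumed positive) into buf and
    // return how many were copied; returning 0 signals end-of-input.
    int LexerInput(char* buf, int max_size) {
        std::size_t left = input_.size() - pos_;
        std::size_t n = std::min(left, static_cast<std::size_t>(max_size));
        std::memcpy(buf, input_.data() + pos_, n);
        pos_ += n;
        return static_cast<int>(n);
    }

    // Write exactly size characters. buf may contain internal NULs,
    // so its length must come from size, never from strlen().
    void LexerOutput(const char* buf, int size) {
        output_.append(buf, static_cast<std::size_t>(size));
    }

    const std::string& output() const { return output_; }

private:
    std::string input_;
    std::size_t pos_ = 0;
    std::string output_;
};
```

Reading "hello" three characters at a time yields 3, then 2, then 0 (end-of-input). LexerOutput() keeps an embedded NUL because it relies on size rather than on the terminating NUL.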
3387: .PP
3388: Note that a
3389: .B yyFlexLexer
3390: object contains its
3391: .I entire
3392: scanning state. Thus you can use such objects to create reentrant
3393: scanners. You can instantiate multiple instances of the same
3394: .B yyFlexLexer
3395: class, and you can also combine multiple C++ scanner classes together
3396: in the same program using the
3397: .B \-P
3398: option discussed above.
3399: .PP
3400: Finally, note that the
3401: .B %array
3402: feature is not available to C++ scanner classes; you must use
3403: .B %pointer
3404: (the default).
3405: .PP
3406: Here is an example of a simple C++ scanner:
3407: .nf
3408:
3409: // An example of using the flex C++ scanner class.
3410:
3411: %{
3412: int mylineno = 0;
3413: %}
3414:
3415: string \\"[^\\n"]+\\"
3416:
3417: ws [ \\t]+
3418:
3419: alpha [A-Za-z]
3420: dig [0-9]
3421: name ({alpha}|{dig}|\\$)({alpha}|{dig}|[_.\\-/$])*
3422: num1 [-+]?{dig}+\\.?([eE][-+]?{dig}+)?
3423: num2 [-+]?{dig}*\\.{dig}+([eE][-+]?{dig}+)?
3424: number {num1}|{num2}
3425:
3426: %%
3427:
3428: {ws} /* skip blanks and tabs */
3429:
3430: "/*" {
3431: int c;
3432:
3433: while((c = yyinput()) != EOF)
3434: {
3435: if(c == '\\n')
3436: ++mylineno;
3437:
3438: else if(c == '*')
3439: {
3440: if((c = yyinput()) == '/' || c == EOF)
3441: break;
3442: else
3443: unput(c);
3444: }
3445: }
3446: }
3447:
3448: {number} cout << "number " << YYText() << '\\n';
3449:
3450: \\n mylineno++;
3451:
3452: {name} cout << "name " << YYText() << '\\n';
3453:
3454: {string} cout << "string " << YYText() << '\\n';
3455:
3456: %%
3457:
3458: int main( int /* argc */, char** /* argv */ )
3459: {
3460: FlexLexer* lexer = new yyFlexLexer;
3461: while(lexer->yylex() != 0)
3462: ;
3463: return 0;
3464: }
3465: .fi
3466: If you want to create multiple (different) lexer classes, you use the
3467: .B \-P
3468: flag (or the
3469: .B prefix=
3470: option) to rename each
3471: .B yyFlexLexer
3472: to some other
3473: .B xxFlexLexer.
3474: You then can include
1.5 deraadt 3475: .B <g++/FlexLexer.h>
1.1 deraadt 3476: in your other sources once per lexer class, first renaming
3477: .B yyFlexLexer
3478: as follows:
3479: .nf
3480:
3481: #undef yyFlexLexer
3482: #define yyFlexLexer xxFlexLexer
1.5 deraadt 3483: #include <g++/FlexLexer.h>
1.1 deraadt 3484:
3485: #undef yyFlexLexer
3486: #define yyFlexLexer zzFlexLexer
1.5 deraadt 3487: #include <g++/FlexLexer.h>
1.1 deraadt 3488:
3489: .fi
3490: if, for example, you used
3491: .B %option prefix="xx"
3492: for one of your scanners and
3493: .B %option prefix="zz"
3494: for the other.
3495: .PP
3496: IMPORTANT: the present form of the scanning class is
3497: .I experimental
1.7 aaron 3498: and may change considerably between major releases.
1.1 deraadt 3499: .SH INCOMPATIBILITIES WITH LEX AND POSIX
3500: .I flex
3501: is a rewrite of the AT&T Unix
3502: .I lex
3503: tool (the two implementations do not share any code, though),
3504: with some extensions and incompatibilities, both of which
3505: are of concern to those who wish to write scanners acceptable
3506: to either implementation. Flex is fully compliant with the POSIX
3507: .I lex
3508: specification, except that when using
3509: .B %pointer
3510: (the default), a call to
3511: .B unput()
3512: destroys the contents of
3513: .B yytext,
3514: which is counter to the POSIX specification.
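.PP
Under %pointer, yytext points into the scanner's input buffer rather than at a private copy. The C++ sketch below is a deliberately simplified model of that arrangement, not flex's actual code. PointerScannerModel and its members are hypothetical. The model assumes that unput() stores the pushed-back character just before the scan position, which overwrites the last character of the current token.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Simplified model (not flex's real code) of a %pointer scanner.
// PointerScannerModel and its members are hypothetical names.
struct PointerScannerModel {
    char buf[64];
    char* yytext = nullptr;   // %pointer: yytext aliases the input buffer
    char* cursor = nullptr;   // one past the end of the current token
    char hold = 0;            // byte overwritten by yytext's terminating NUL
    std::string array_text;   // a private copy, as %array would keep

    explicit PointerScannerModel(const char* input) {
        std::strncpy(buf, input, sizeof buf - 1);
        buf[sizeof buf - 1] = 0;
    }

    // Treat the first n characters of the buffer as the matched token.
    void match(std::size_t n) {
        yytext = buf;
        cursor = buf + n;
        hold = *cursor;
        *cursor = 0;                 // NUL-terminate the token in place
        array_text.assign(buf, n);   // %array-style copy of the token
    }

    // Push back one character (requires cursor > buf). First restore
    // the held byte, then store c just before the cursor. That position
    // is the last character of the token yytext still points at.
    void unput(char c) {
        *cursor = hold;
        *--cursor = c;
    }
};
```

Given the input "foo bar", matching "foo" and then calling unput('x') leaves yytext reading "fox bar". The %array-style copy still holds "foo".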
3515: .PP
3516: In this section we discuss all of the known areas of incompatibility
3517: between flex, AT&T lex, and the POSIX specification.
3518: .PP
3519: .I flex's
3520: .B \-l
3521: option turns on maximum compatibility with the original AT&T
3522: .I lex
3523: implementation, at the cost of a major loss in the generated scanner's
3524: performance. We note below which incompatibilities can be overcome
3525: using the
3526: .B \-l
3527: option.
3528: .PP
3529: .I flex
3530: is fully compatible with
3531: .I lex
3532: with the following exceptions:
3533: .IP -
3534: The undocumented
3535: .I lex
3536: scanner internal variable
3537: .B yylineno
3538: is not supported unless
3539: .B \-l
3540: or
3541: .B %option yylineno
3542: is used.
3543: .IP
3544: .B yylineno
3545: should be maintained on a per-buffer basis, rather than a per-scanner
3546: (single global variable) basis.
3547: .IP
3548: .B yylineno
3549: is not part of the POSIX specification.
3550: .IP -
3551: The
3552: .B input()
3553: routine is not redefinable, though it may be called to read characters
3554: following whatever has been matched by a rule. If
3555: .B input()
3556: encounters an end-of-file, the normal
3557: .B yywrap()
3558: processing is done. A ``real'' end-of-file is returned by
3559: .B input()
3560: as
3561: .I EOF.
3562: .IP
3563: Input is instead controlled by defining the
3564: .B YY_INPUT
3565: macro.
3566: .IP
3567: The
3568: .I flex
3569: restriction that
3570: .B input()
3571: cannot be redefined is in accordance with the POSIX specification,
3572: which simply does not specify any way of controlling the
3573: scanner's input other than by making an initial assignment to
3574: .I yyin.
3575: .IP -
3576: The
3577: .B unput()
3578: routine is not redefinable. This restriction is in accordance with POSIX.
3579: .IP -
3580: .I flex
3581: scanners are not as reentrant as
3582: .I lex
3583: scanners. In particular, if you have an interactive scanner and
3584: an interrupt handler which long-jumps out of the scanner, and
3585: the scanner is subsequently called again, you may get the following
3586: message:
3587: .nf
3588:
3589: fatal flex scanner internal error--end of buffer missed
3590:
3591: .fi
3592: To reenter the scanner, first use
3593: .nf
3594:
3595: yyrestart( yyin );
3596:
3597: .fi
3598: Note that this call will throw away any buffered input; usually this
3599: isn't a problem with an interactive scanner.
3600: .IP
3601: Also note that flex C++ scanner classes
3602: .I are
3603: reentrant, so if using C++ is an option for you, you should use
3604: them instead. See "Generating C++ Scanners" above for details.
3605: .IP -
3606: .B output()
3607: is not supported.
3608: Output from the
3609: .B ECHO
3610: macro is done to the file-pointer
3611: .I yyout
3612: (default
3613: .I stdout).
3614: .IP
3615: .B output()
3616: is not part of the POSIX specification.
3617: .IP -
3618: .I lex
3619: does not support exclusive start conditions (%x), though they
3620: are in the POSIX specification.
3621: .IP -
3622: When definitions are expanded,
3623: .I flex
3624: encloses them in parentheses.
3625: With lex, the following:
3626: .nf
3627:
3628: NAME [A-Z][A-Z0-9]*
3629: %%
3630: foo{NAME}? printf( "Found it\\n" );
3631: %%
3632:
3633: .fi
3634: will not match the string "foo" because when the macro
3635: is expanded the rule is equivalent to "foo[A-Z][A-Z0-9]*?"
3636: and the precedence is such that the '?' is associated with
3637: "[A-Z0-9]*". With
3638: .I flex,
3639: the rule will be expanded to
3640: "foo([A-Z][A-Z0-9]*)?" and so the string "foo" will match.
3641: .IP
3642: Note that if the definition begins with
3643: .B ^
3644: or ends with
3645: .B $
3646: then it is
3647: .I not
3648: expanded with parentheses, to allow these operators to appear in
3649: definitions without losing their special meanings. But the
3650: .B <s>, /,
3651: and
3652: .B <<EOF>>
3653: operators cannot be used in a
3654: .I flex
3655: definition.
3656: .IP
3657: Using
3658: .B \-l
3659: results in the
3660: .I lex
3661: behavior of no parentheses around the definition.
3662: .IP
3663: The POSIX specification is that the definition be enclosed in parentheses.
3664: .IP -
3665: Some implementations of
3666: .I lex
3667: allow a rule's action to begin on a separate line, if the rule's pattern
3668: has trailing whitespace:
3669: .nf
3670:
3671: %%
3672: foo|bar<space here>
3673: { foobar_action(); }
3674:
3675: .fi
3676: .I flex
3677: does not support this feature.
3678: .IP -
3679: The
3680: .I lex
3681: .B %r
3682: (generate a Ratfor scanner) option is not supported. It is not part
3683: of the POSIX specification.
3684: .IP -
3685: After a call to
3686: .B unput(),
3687: .I yytext
3688: is undefined until the next token is matched, unless the scanner
3689: was built using
3690: .B %array.
3691: This is not the case with
3692: .I lex
3693: or the POSIX specification. The
3694: .B \-l
3695: option does away with this incompatibility.
3696: .IP -
3697: The precedence of the
3698: .B {}
3699: (numeric range) operator is different.
3700: .I lex
3701: interprets "abc{1,3}" as "match one, two, or
3702: three occurrences of 'abc'", whereas
3703: .I flex
3704: interprets it as "match 'ab'
3705: followed by one, two, or three occurrences of 'c'". The latter is
3706: in agreement with the POSIX specification.
3707: .IP -
3708: The precedence of the
3709: .B ^
3710: operator is different.
3711: .I lex
3712: interprets "^foo|bar" as "match either 'foo' at the beginning of a line,
3713: or 'bar' anywhere", whereas
3714: .I flex
3715: interprets it as "match either 'foo' or 'bar' if they come at the beginning
3716: of a line". The latter is in agreement with the POSIX specification.
3717: .IP -
3718: The special table-size declarations such as
3719: .B %a
3720: supported by
3721: .I lex
3722: are not required by
3723: .I flex
3724: scanners;
3725: .I flex
3726: ignores them.
3727: .IP -
3728: The name
3729: .B
3730: FLEX_SCANNER
3731: is #define'd so scanners may be written for use with either
3732: .I flex
3733: or
3734: .I lex.
3735: Scanners also include
3736: .B YY_FLEX_MAJOR_VERSION
3737: and
3738: .B YY_FLEX_MINOR_VERSION
3739: indicating which version of
3740: .I flex
3741: generated the scanner
3742: (for example, for the 2.5 release, these defines would be 2 and 5
3743: respectively).
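.PP
The three precedence differences above (definition expansion, {} and ^) can be checked with a small C++ program. This sketch uses std::regex with its default ECMAScript grammar only as a stand-in for lex and flex. Each reading is written with explicit parentheses, so the results do not depend on std::regex's own precedence rules. In std::regex, ^ anchors at the start of the whole string, so the checks use single-line inputs.

```cpp
#include <regex>
#include <string>

// std::regex (default ECMAScript grammar) is only a stand-in for lex and
// flex here. Each pattern spells out one reading with explicit
// parentheses so the outcome does not hinge on std::regex's precedence.

// foo{NAME}? with NAME defined as [A-Z][A-Z0-9]*
// lex pastes the text in unparenthesized, so '?' applies to [A-Z0-9]*:
const std::regex lex_definition("foo[A-Z]([A-Z0-9]*)?");
// flex wraps the expansion in parentheses before applying '?':
const std::regex flex_definition("foo([A-Z][A-Z0-9]*)?");

// abc{1,3}: lex repeats "abc"; flex (and POSIX) repeat only 'c'.
const std::regex lex_repeat("(abc){1,3}");
const std::regex flex_repeat("ab(c{1,3})");

// ^foo|bar: lex anchors only "foo"; flex (and POSIX) anchor both.
const std::regex lex_anchor("(^foo)|(bar)");
const std::regex flex_anchor("^(foo|bar)");

// Does the whole string match?
bool full(const std::regex& re, const std::string& s) {
    return std::regex_match(s, re);
}

// Does the pattern match anywhere in the string?
bool found(const std::regex& re, const std::string& s) {
    return std::regex_search(s, re);
}
```

For example, full(flex_definition, "foo") is true and full(lex_definition, "foo") is false. Likewise, found(lex_anchor, "xbar") is true and found(flex_anchor, "xbar") is false.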
3744: .PP
3745: The following
3746: .I flex
3747: features are not included in
3748: .I lex
3749: or the POSIX specification:
3750: .nf
3751:
3752: C++ scanners
3753: %option
3754: start condition scopes
3755: start condition stacks
3756: interactive/non-interactive scanners
3757: yy_scan_string() and friends
3758: yyterminate()
3759: yy_set_interactive()
3760: yy_set_bol()
3761: YY_AT_BOL()
3762: <<EOF>>
3763: <*>
3764: YY_DECL
3765: YY_START
3766: YY_USER_ACTION
3767: YY_USER_INIT
3768: #line directives
3769: %{}'s around actions
3770: multiple actions on a line
3771:
3772: .fi
3773: plus almost all of the flex flags.
3774: The last feature in the list refers to the fact that with
3775: .I flex
3776: you can put multiple actions on the same line, separated with
3777: semicolons, while with
3778: .I lex,
3779: the following
3780: .nf
3781:
3782: foo handle_foo(); ++num_foos_seen;
3783:
3784: .fi
3785: is (rather surprisingly) truncated to
3786: .nf
3787:
3788: foo handle_foo();
3789:
3790: .fi
3791: .I flex
3792: does not truncate the action. Actions that are not enclosed in
3793: braces are simply terminated at the end of the line.
3794: .SH DIAGNOSTICS
3795: .PP
3796: .I warning, rule cannot be matched
3797: indicates that the given rule
3798: cannot be matched because it follows other rules that will
3799: always match the same text as it. For
3800: example, in the following "foo" cannot be matched because it comes after
3801: an identifier "catch-all" rule:
3802: .nf
3803:
3804: [a-z]+ got_identifier();
3805: foo got_foo();
3806:
3807: .fi
3808: Using
3809: .B REJECT
3810: in a scanner suppresses this warning.
3811: .PP
3812: .I warning,
3813: .B \-s
3814: .I
3815: option given but default rule can be matched
3816: means that it is possible (perhaps only in a particular start condition)
3817: that the default rule (match any single character) is the only one
3818: that will match a particular input. Since
3819: .B \-s
3820: was given, presumably this is not intended.
3821: .PP
3822: .I reject_used_but_not_detected undefined
3823: or
3824: .I yymore_used_but_not_detected undefined -
3825: These errors can occur at compile time. They indicate that the
3826: scanner uses
3827: .B REJECT
3828: or
3829: .B yymore()
3830: but that
3831: .I flex
3832: failed to notice the fact, meaning that
3833: .I flex
3834: scanned the first two sections looking for occurrences of these actions
1.10 deraadt 3835: and failed to find any, but somehow you snuck some in (via an #include
1.1 deraadt 3836: file, for example). Use
3837: .B %option reject
3838: or
3839: .B %option yymore
3840: to indicate to flex that you really do use these features.
3841: .PP
3842: .I flex scanner jammed -
3843: a scanner compiled with
3844: .B \-s
3845: has encountered an input string which wasn't matched by
3846: any of its rules. This error can also occur due to internal problems.
3847: .PP
3848: .I token too large, exceeds YYLMAX -
3849: your scanner uses
3850: .B %array
3851: and one of its rules matched a string longer than the
3852: .B YYLMAX
3853: constant (8K bytes by default). You can increase the value by
3854: #define'ing
3855: .B YYLMAX
3856: in the definitions section of your
3857: .I flex
3858: input.
3859: .PP
3860: .I scanner requires \-8 flag to
3861: .I use the character 'x' -
3862: Your scanner specification includes recognizing the 8-bit character
3863: .I 'x'
3864: and you did not specify the \-8 flag, and your scanner defaulted to 7-bit
3865: because you used the
3866: .B \-Cf
3867: or
3868: .B \-CF
3869: table compression options. See the discussion of the
3870: .B \-7
3871: flag for details.
3872: .PP
3873: .I flex scanner push-back overflow -
3874: you used
3875: .B unput()
3876: to push back so much text that the scanner's buffer could not hold
3877: both the pushed-back text and the current token in
3878: .B yytext.
3879: Ideally the scanner should dynamically resize the buffer in this case, but at
3880: present it does not.
3881: .PP
3882: .I
3883: input buffer overflow, can't enlarge buffer because scanner uses REJECT -
3884: the scanner was working on matching an extremely large token and needed
3885: to expand the input buffer. This doesn't work with scanners that use
3886: .B
3887: REJECT.
3888: .PP
3889: .I
3890: fatal flex scanner internal error--end of buffer missed -
3891: This can occur in a scanner which is reentered after a long-jump
3892: has jumped out (or over) the scanner's activation frame. Before
3893: reentering the scanner, use:
3894: .nf
3895:
3896: yyrestart( yyin );
3897:
3898: .fi
3899: or, as noted above, switch to using the C++ scanner class.
3900: .PP
3901: .I too many start conditions in <> construct! -
3902: you listed more start conditions in a <> construct than exist (so
3903: you must have listed at least one of them twice).
3904: .SH FILES
3905: .TP
3906: .B \-lfl
3907: library with which scanners must be linked.
3908: .TP
3909: .I lex.yy.c
3910: generated scanner (called
3911: .I lexyy.c
3912: on some systems).
3913: .TP
3914: .I lex.yy.cc
3915: generated C++ scanner class, when using
3916: .B \-+.
3917: .TP
1.5 deraadt 3918: .I <g++/FlexLexer.h>
1.1 deraadt 3919: header file defining the C++ scanner base class,
3920: .B FlexLexer,
3921: and its derived class,
3922: .B yyFlexLexer.
3923: .TP
3924: .I flex.skl
3925: skeleton scanner. This file is only used when building flex, not when
3926: flex executes.
3927: .TP
3928: .I lex.backup
3929: backing-up information for
3930: .B \-b
3931: flag (called
3932: .I lex.bck
3933: on some systems).
3934: .SH DEFICIENCIES / BUGS
3935: .PP
3936: Some trailing context
3937: patterns cannot be properly matched and generate
3938: warning messages ("dangerous trailing context"). These are
3939: patterns where the ending of the
3940: first part of the rule matches the beginning of the second
3941: part, such as "zx*/xy*", where the 'x*' matches the 'x' at
3942: the beginning of the trailing context. (Note that the POSIX draft
3943: states that the text matched by such patterns is undefined.)
3944: .PP
3945: For some trailing context rules, parts which are actually fixed-length are
1.3 deraadt 3946: not recognized as such, leading to the above mentioned performance loss.
1.1 deraadt 3947: In particular, parts using '|' or {n} (such as "foo{3}") are always
3948: considered variable-length.
3949: .PP
3950: Combining trailing context with the special '|' action can result in
3951: .I fixed
3952: trailing context being turned into the more expensive
3953: .I variable
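3954: trailing context. For example, in the following, the trailing context "def" of the second rule is fixed-length, but because the first rule shares its action via '|', it is treated as variable-length: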
3955: .nf
3956:
3957: %%
3958: abc |
3959: xyz/def
3960:
3961: .fi
3962: .PP
3963: Use of
3964: .B unput()
3965: invalidates yytext and yyleng, unless the
3966: .B %array
3967: directive
3968: or the
3969: .B \-l
3970: option has been used.
3971: .PP
3972: Pattern-matching of NUL's is substantially slower than matching other
3973: characters.
3974: .PP
3975: Dynamic resizing of the input buffer is slow, as it entails rescanning
3976: all the text matched so far by the current (generally huge) token.
3977: .PP
3978: Due to both buffering of input and read-ahead, you cannot intermix
3979: calls to <stdio.h> routines, such as, for example,
3980: .B getchar(),
3981: with
3982: .I flex
3983: rules and expect it to work. Call
3984: .B input()
3985: instead.
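.PP
The read-ahead problem can be shown without flex. In the C++ analogy below, the hypothetical ChunkReader plays the role of the scanner. It reads its input in fixed-size chunks into a private buffer, so any characters it has already fetched are no longer available to code that reads the underlying stream directly. This is an illustration of the general effect, not a model of flex's exact buffering.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Analogy only (not flex code): ChunkReader is a hypothetical reader
// that, like a scanner, fills a private buffer with fixed-size reads
// from its stream and hands characters out from that buffer.
class ChunkReader {
public:
    // chunk must be greater than zero.
    ChunkReader(std::istream& in, std::size_t chunk) : in_(in), chunk_(chunk) {}

    // Next character, or -1 once the stream is exhausted.
    int next() {
        if (pos_ == buf_.size()) {
            buf_.resize(chunk_);
            in_.read(&buf_[0], static_cast<std::streamsize>(chunk_));
            buf_.resize(static_cast<std::size_t>(in_.gcount()));
            pos_ = 0;
            if (buf_.empty()) return -1;
        }
        return static_cast<unsigned char>(buf_[pos_++]);
    }

private:
    std::istream& in_;
    std::size_t chunk_;
    std::string buf_;
    std::size_t pos_ = 0;
};
```

Suppose a std::istringstream holding "abcdefgh" is read in 4-character chunks. The first next() returns 'a', but a direct get() on the stream then returns 'e', because 'b' through 'd' are already sitting in the reader's buffer. Inside a flex scanner, input() reads from the scanner's own buffer and so avoids this problem.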
3986: .PP
3987: The total number of table entries listed by the
3988: .B \-v
3989: flag excludes the number of table entries needed to determine
3990: what rule has been matched. The number of entries is equal
3991: to the number of DFA states if the scanner does not use
3992: .B REJECT,
3993: and somewhat greater than the number of states if it does.
3994: .PP
3995: .B REJECT
3996: cannot be used with the
3997: .B \-f
3998: or
3999: .B \-F
4000: options.
4001: .PP
4002: The
4003: .I flex
4004: internal algorithms need documentation.
4005: .SH SEE ALSO
4006: .PP
4007: lex(1), yacc(1), sed(1), awk(1).
4008: .PP
4009: John Levine, Tony Mason, and Doug Brown,
4010: .I Lex & Yacc,
4011: O'Reilly and Associates. Be sure to get the 2nd edition.
4012: .PP
4013: M. E. Lesk and E. Schmidt,
4014: .I LEX \- Lexical Analyzer Generator
4015: .PP
4016: Alfred Aho, Ravi Sethi and Jeffrey Ullman,
4017: .I Compilers: Principles, Techniques and Tools,
4018: Addison-Wesley (1986). Describes the pattern-matching techniques used by
4019: .I flex
4020: (deterministic finite automata).
4021: .SH AUTHOR
4022: Vern Paxson, with the help of many ideas and much inspiration from
4023: Van Jacobson. Original version by Jef Poskanzer. The fast table
4024: representation is a partial implementation of a design done by Van
4025: Jacobson. The implementation was done by Kevin Gong and Vern Paxson.
4026: .PP
4027: Thanks to the many
4028: .I flex
4029: beta-testers, feedbackers, and contributors, especially Francois Pinard,
4030: Casey Leedom,
4031: Robert Abramovitz,
4032: Stan Adermann, Terry Allen, David Barker-Plummer, John Basrai,
4033: Neal Becker, Nelson H.F. Beebe, benson@odi.com,
4034: Karl Berry, Peter A. Bigot, Simon Blanchard,
4035: Keith Bostic, Frederic Brehm, Ian Brockbank, Kin Cho, Nick Christopher,
4036: Brian Clapper, J.T. Conklin,
4037: Jason Coughlin, Bill Cox, Nick Cropper, Dave Curtis, Scott David
1.11 deraadt 4038: Daniels, Chris G. Demetriou, Theo de Raadt,
1.1 deraadt 4039: Mike Donahue, Chuck Doucette, Tom Epperly, Leo Eskin,
4040: Chris Faylor, Chris Flatters, Jon Forrest, Jeffrey Friedl,
4041: Joe Gayda, Kaveh R. Ghazi, Wolfgang Glunz,
4042: Eric Goldman, Christopher M. Gould, Ulrich Grepel, Peer Griebel,
4043: Jan Hajic, Charles Hemphill, NORO Hideo,
4044: Jarkko Hietaniemi, Scott Hofmann,
4045: Jeff Honig, Dana Hudes, Eric Hughes, John Interrante,
4046: Ceriel Jacobs, Michal Jaegermann, Sakari Jalovaara, Jeffrey R. Jones,
4047: Henry Juengst, Klaus Kaempf, Jonathan I. Kamens, Terrence O Kane,
4048: Amir Katz, ken@ken.hilco.com, Kevin B. Kenny,
4049: Steve Kirsch, Winfried Koenig, Marq Kole, Ronald Lamprecht,
4050: Greg Lee, Rohan Lenard, Craig Leres, John Levine, Steve Liddle,
4051: David Loffredo, Mike Long,
4052: Mohamed el Lozy, Brian Madsen, Malte, Joe Marshall,
4053: Bengt Martensson, Chris Metcalf,
4054: Luke Mewburn, Jim Meyering, R. Alexander Milowski, Erik Naggum,
4055: G.T. Nicol, Landon Noll, James Nordby, Marc Nozell,
4056: Richard Ohnemus, Karsten Pahnke,
4057: Sven Panne, Roland Pesch, Walter Pelissero, Gaumond
4058: Pierre, Esmond Pitt, Jef Poskanzer, Joe Rahmeh, Jarmo Raiha,
4059: Frederic Raimbault, Pat Rankin, Rick Richardson,
4060: Kevin Rodgers, Kai Uwe Rommel, Jim Roskind, Alberto Santini,
4061: Andreas Scherer, Darrell Schiebel, Raf Schietekat,
4062: Doug Schmidt, Philippe Schnoebelen, Andreas Schwab,
4063: Larry Schwimmer, Alex Siegel, Eckehard Stolz, Jan-Erik Str\(:omquist,
4064: Mike Stump, Paul Stuart, Dave Tallman, Ian Lance Taylor,
4065: Chris Thewalt, Richard M. Timoney, Jodi Tsai,
4066: Paul Tuinenga, Gary Weik, Frank Whaley, Gerhard Wilhelms, Kent Williams, Ken
4067: Yap, Ron Zellar, Nathan Zelle, David Zuhn,
4068: and those whose names have slipped my marginal
4069: mail-archiving skills but whose contributions are appreciated all the
4070: same.
4071: .PP
4072: Thanks to Keith Bostic, Jon Forrest, Noah Friedman,
4073: John Gilmore, Craig Leres, John Levine, Bob Mulcahy, G.T.
4074: Nicol, Francois Pinard, Rich Salz, and Richard Stallman for help with various
4075: distribution headaches.
4076: .PP
4077: Thanks to Esmond Pitt and Earle Horton for 8-bit character support; to
4078: Benson Margulies and Fred Burke for C++ support; to Kent Williams and Tom
4079: Epperly for C++ class support; to Ove Ewerlid for support of NUL's; and to
4080: Eric Hughes for support of multiple buffers.
4081: .PP
4082: This work was primarily done when I was with the Real Time Systems Group
4083: at the Lawrence Berkeley Laboratory in Berkeley, CA. Many thanks to all there
4084: for the support I received.
4085: .PP
4086: Send comments to vern@ee.lbl.gov.