Annotation of src/usr.bin/lex/flex.1, Revision 1.13
1.13 ! millert 1: .\" $OpenBSD: flex.1,v 1.12 2003/02/18 07:43:36 jmc Exp $
1.12 jmc 2: .\"
3: .\" Copyright (c) 1990 The Regents of the University of California.
4: .\" All rights reserved.
1.2 deraadt 5: .\"
1.12 jmc 6: .\" This code is derived from software contributed to Berkeley by
7: .\" Vern Paxson.
8: .\"
9: .\" The United States Government has rights in this work pursuant
10: .\" to contract no. DE-AC03-76SF00098 between the United States
11: .\" Department of Energy and the University of California.
12: .\"
13: .\" Redistribution and use in source and binary forms, with or without
1.13 ! millert 14: .\" modification, are permitted provided that the following conditions
! 15: .\" are met:
! 16: .\"
! 17: .\" 1. Redistributions of source code must retain the above copyright
! 18: .\" notice, this list of conditions and the following disclaimer.
! 19: .\" 2. Redistributions in binary form must reproduce the above copyright
! 20: .\" notice, this list of conditions and the following disclaimer in the
! 21: .\" documentation and/or other materials provided with the distribution.
! 22: .\"
! 23: .\" Neither the name of the University nor the names of its contributors
! 24: .\" may be used to endorse or promote products derived from this software
! 25: .\" without specific prior written permission.
! 26: .\"
! 27: .\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR
! 28: .\" IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
! 29: .\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
! 30: .\" PURPOSE.
1.12 jmc 31: .\"
1.1 deraadt 32: .TH FLEX 1 "April 1995" "Version 2.5"
33: .SH NAME
34: flex \- fast lexical analyzer generator
35: .SH SYNOPSIS
36: .B flex
37: .B [\-bcdfhilnpstvwBFILTV78+? \-C[aefFmr] \-ooutput \-Pprefix \-Sskeleton]
38: .B [\-\-help \-\-version]
39: .I [filename ...]
40: .SH OVERVIEW
41: This manual describes
42: .I flex,
43: a tool for generating programs that perform pattern-matching on text. The
44: manual includes both tutorial and reference sections:
45: .nf
46:
47: Description
48: a brief overview of the tool
49:
50: Some Simple Examples
51:
52: Format Of The Input File
53:
54: Patterns
55: the extended regular expressions used by flex
56:
57: How The Input Is Matched
58: the rules for determining what has been matched
59:
60: Actions
61: how to specify what to do when a pattern is matched
62:
63: The Generated Scanner
64: details regarding the scanner that flex produces;
65: how to control the input source
66:
67: Start Conditions
68: introducing context into your scanners, and
69: managing "mini-scanners"
70:
71: Multiple Input Buffers
72: how to manipulate multiple input sources; how to
73: scan from strings instead of files
74:
75: End-of-file Rules
76: special rules for matching the end of the input
77:
78: Miscellaneous Macros
79: a summary of macros available to the actions
80:
81: Values Available To The User
82: a summary of values available to the actions
83:
84: Interfacing With Yacc
85: connecting flex scanners together with yacc parsers
86:
87: Options
88: flex command-line options, and the "%option"
89: directive
90:
91: Performance Considerations
92: how to make your scanner go as fast as possible
93:
94: Generating C++ Scanners
95: the (experimental) facility for generating C++
96: scanner classes
97:
98: Incompatibilities With Lex And POSIX
99: how flex differs from AT&T lex and the POSIX lex
100: standard
101:
102: Diagnostics
103: those error messages produced by flex (or scanners
104: it generates) whose meanings might not be apparent
105:
106: Files
107: files used by flex
108:
109: Deficiencies / Bugs
110: known problems with flex
111:
112: See Also
113: other documentation, related tools
114:
115: Author
116: includes contact information
117:
118: .fi
119: .SH DESCRIPTION
120: .I flex
121: is a tool for generating
122: .I scanners:
1.9 millert 123: programs which recognize lexical patterns in text.
1.1 deraadt 124: .I flex
125: reads
126: the given input files, or its standard input if no file names are given,
127: for a description of a scanner to generate. The description is in
128: the form of pairs
129: of regular expressions and C code, called
130: .I rules. flex
131: generates as output a C source file,
132: .B lex.yy.c,
133: which defines a routine
134: .B yylex().
135: This file is compiled and linked with the
136: .B \-lfl
137: library to produce an executable. When the executable is run,
138: it analyzes its input for occurrences
139: of the regular expressions. Whenever it finds one, it executes
140: the corresponding C code.
141: .SH SOME SIMPLE EXAMPLES
142: .PP
143: First, some simple examples to give the flavor of how one uses
144: .I flex.
145: The following
146: .I flex
147: input specifies a scanner which, whenever it encounters the string
148: "username", replaces it with the user's login name:
149: .nf
150:
151: %%
152: username printf( "%s", getlogin() );
153:
154: .fi
155: By default, any text not matched by a
156: .I flex
157: scanner
158: is copied to the output, so the net effect of this scanner is
159: to copy its input file to its output with each occurrence
160: of "username" expanded.
161: In this input, there is just one rule. "username" is the
162: .I pattern
163: and the "printf" is the
164: .I action.
165: The "%%" marks the beginning of the rules.
166: .PP
167: Here's another simple example:
168: .nf
169:
170: int num_lines = 0, num_chars = 0;
171:
172: %%
173: \\n ++num_lines; ++num_chars;
174: . ++num_chars;
175:
176: %%
177: int main()
178: {
179: yylex();
180: printf( "# of lines = %d, # of chars = %d\\n",
181: num_lines, num_chars );
182: }
183:
184: .fi
185: This scanner counts the number of characters and the number
186: of lines in its input (it produces no output other than the
187: final report on the counts). The first line
188: declares two globals, "num_lines" and "num_chars", which are accessible
189: both inside
190: .B yylex()
191: and in the
192: .B main()
193: routine declared after the second "%%". There are two rules, one
194: which matches a newline ("\\n") and increments both the line count and
195: the character count, and one which matches any character other than
196: a newline (indicated by the "." regular expression).
197: .PP
198: A somewhat more complicated example:
199: .nf
200:
201: /* scanner for a toy Pascal-like language */
202:
203: %{
204: /* need this for the calls to atoi() and atof() below */
205: #include <stdlib.h>
206: %}
207:
208: DIGIT [0-9]
209: ID [a-z][a-z0-9]*
210:
211: %%
212:
213: {DIGIT}+ {
214: printf( "An integer: %s (%d)\\n", yytext,
215: atoi( yytext ) );
216: }
217:
218: {DIGIT}+"."{DIGIT}* {
219: printf( "A float: %s (%g)\\n", yytext,
220: atof( yytext ) );
221: }
222:
223: if|then|begin|end|procedure|function {
224: printf( "A keyword: %s\\n", yytext );
225: }
226:
227: {ID} printf( "An identifier: %s\\n", yytext );
228:
229: "+"|"-"|"*"|"/" printf( "An operator: %s\\n", yytext );
230:
231: "{"[^}\\n]*"}" /* eat up one-line comments */
232:
233: [ \\t\\n]+ /* eat up whitespace */
234:
235: . printf( "Unrecognized character: %s\\n", yytext );
236:
237: %%
238:
239: int main( argc, argv )
240: int argc;
241: char **argv;
242: {
243: ++argv, --argc; /* skip over program name */
244: if ( argc > 0 )
245: yyin = fopen( argv[0], "r" );
246: else
247: yyin = stdin;
1.7 aaron 248:
1.1 deraadt 249: yylex();
250: }
251:
252: .fi
253: This is the beginning of a simple scanner for a language like
254: Pascal. It identifies different types of
255: .I tokens
256: and reports on what it has seen.
257: .PP
258: The details of this example will be explained in the following
259: sections.
260: .SH FORMAT OF THE INPUT FILE
261: The
262: .I flex
263: input file consists of three sections, separated by a line with just
264: .B %%
265: in it:
266: .nf
267:
268: definitions
269: %%
270: rules
271: %%
272: user code
273:
274: .fi
275: The
276: .I definitions
277: section contains declarations of simple
278: .I name
279: definitions to simplify the scanner specification, and declarations of
280: .I start conditions,
281: which are explained in a later section.
282: .PP
283: Name definitions have the form:
284: .nf
285:
286: name definition
287:
288: .fi
289: The "name" is a word beginning with a letter or an underscore ('_')
290: followed by zero or more letters, digits, '_', or '-' (dash).
1.8 aaron 291: The definition is taken to begin at the first non-whitespace character
1.1 deraadt 292: following the name and continuing to the end of the line.
293: The definition can subsequently be referred to using "{name}", which
294: will expand to "(definition)". For example,
295: .nf
296:
297: DIGIT [0-9]
298: ID [a-z][a-z0-9]*
299:
300: .fi
301: defines "DIGIT" to be a regular expression which matches a
302: single digit, and
303: "ID" to be a regular expression which matches a letter
304: followed by zero-or-more letters-or-digits.
305: A subsequent reference to
306: .nf
307:
308: {DIGIT}+"."{DIGIT}*
309:
310: .fi
311: is identical to
312: .nf
313:
314: ([0-9])+"."([0-9])*
315:
316: .fi
317: and matches one-or-more digits followed by a '.' followed
318: by zero-or-more digits.
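The expansion rule can be sketched in C. This is only an illustration, not how flex is implemented: expand_name() is a hypothetical helper that handles a single name, ignores quoting and escapes, and writes into a fixed-size static buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Expand every "{name}" in pattern to "(definition)", the way the text
 * above describes.  Hypothetical helper, not part of flex: it ignores
 * quoting and escapes.  Returns a pointer to a static buffer, or NULL
 * if the expansion does not fit.
 */
const char *expand_name(const char *pattern, const char *name,
                        const char *definition)
{
    static char out[256];
    size_t namelen, used = 0;

    assert(pattern != NULL && name != NULL && definition != NULL);
    namelen = strlen(name);
    out[0] = '\0';

    while (*pattern != '\0') {
        size_t room = sizeof out - used;
        int n;

        if (pattern[0] == '{' &&
            strncmp(pattern + 1, name, namelen) == 0 &&
            pattern[1 + namelen] == '}') {
            /* A reference: emit the definition wrapped in parentheses. */
            n = snprintf(out + used, room, "(%s)", definition);
            pattern += namelen + 2;
        } else {
            /* Anything else is copied through unchanged. */
            n = snprintf(out + used, room, "%c", *pattern);
            pattern += 1;
        }
        if (n < 0 || (size_t)n >= room)
            return NULL;
        used += (size_t)n;
    }
    return out;
}
```

The added parentheses matter when a definition contains alternation. With a definition of "a|b" for the name A, the reference "{A}c" becomes "(a|b)c" rather than "a|bc". A repeat count such as "r{2,5}" is left untouched because it does not name a definition.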
319: .PP
320: The
321: .I rules
322: section of the
323: .I flex
324: input contains a series of rules of the form:
325: .nf
326:
327: pattern action
328:
329: .fi
330: where the pattern must be unindented and the action must begin
331: on the same line.
332: .PP
333: See below for a further description of patterns and actions.
334: .PP
335: Finally, the user code section is simply copied to
336: .B lex.yy.c
337: verbatim.
338: It is used for companion routines which call or are called
339: by the scanner. The presence of this section is optional;
340: if it is missing, the second
341: .B %%
342: in the input file may be skipped, too.
343: .PP
344: In the definitions and rules sections, any
345: .I indented
346: text or text enclosed in
347: .B %{
348: and
349: .B %}
350: is copied verbatim to the output (with the %{}'s removed).
351: The %{}'s must appear unindented on lines by themselves.
352: .PP
353: In the rules section,
354: any indented or %{} text appearing before the
355: first rule may be used to declare variables
356: which are local to the scanning routine and (after the declarations)
357: code which is to be executed whenever the scanning routine is entered.
358: Other indented or %{} text in the rule section is still copied to the output,
359: but its meaning is not well-defined and it may well cause compile-time
360: errors (this feature is present for
361: .I POSIX
362: compliance; see below for other such features).
363: .PP
364: In the definitions section (but not in the rules section),
365: an unindented comment (i.e., a line
366: beginning with "/*") is also copied verbatim to the output up
367: to the next "*/".
368: .SH PATTERNS
369: The patterns in the input are written using an extended set of regular
370: expressions. These are:
371: .nf
372:
373: x match the character 'x'
374: . any character (byte) except newline
375: [xyz] a "character class"; in this case, the pattern
376: matches either an 'x', a 'y', or a 'z'
377: [abj-oZ] a "character class" with a range in it; matches
378: an 'a', a 'b', any letter from 'j' through 'o',
379: or a 'Z'
380: [^A-Z] a "negated character class", i.e., any character
381: but those in the class. In this case, any
382: character EXCEPT an uppercase letter.
383: [^A-Z\\n] any character EXCEPT an uppercase letter or
384: a newline
385: r* zero or more r's, where r is any regular expression
386: r+ one or more r's
387: r? zero or one r's (that is, "an optional r")
388: r{2,5} anywhere from two to five r's
389: r{2,} two or more r's
390: r{4} exactly 4 r's
391: {name} the expansion of the "name" definition
392: (see above)
393: "[xyz]\\"foo"
394: the literal string: [xyz]"foo
395: \\X if X is an 'a', 'b', 'f', 'n', 'r', 't', or 'v',
396: then the ANSI-C interpretation of \\x.
397: Otherwise, a literal 'X' (used to escape
398: operators such as '*')
399: \\0 a NUL character (ASCII code 0)
400: \\123 the character with octal value 123
401: \\x2a the character with hexadecimal value 2a
402: (r) match an r; parentheses are used to override
403: precedence (see below)
404:
405:
406: rs the regular expression r followed by the
407: regular expression s; called "concatenation"
408:
409:
410: r|s either an r or an s
411:
412:
413: r/s an r but only if it is followed by an s. The
414: text matched by s is included when determining
415: whether this rule is the "longest match",
416: but is then returned to the input before
417: the action is executed. So the action only
418: sees the text matched by r. This type
419: of pattern is called "trailing context".
420: (There are some combinations of r/s that flex
421: cannot match correctly; see notes in the
422: Deficiencies / Bugs section below regarding
423: "dangerous trailing context".)
424: ^r an r, but only at the beginning of a line (i.e.,
1.10 deraadt 425: just starting to scan, or right after a
1.1 deraadt 426: newline has been scanned).
427: r$ an r, but only at the end of a line (i.e., just
428: before a newline). Equivalent to "r/\\n".
429:
430: Note that flex's notion of "newline" is exactly
431: whatever the C compiler used to compile flex
432: interprets '\\n' as; in particular, on some DOS
433: systems you must either filter out \\r's in the
434: input yourself, or explicitly use r/\\r\\n for "r$".
435:
436:
437: <s>r an r, but only in start condition s (see
438: below for discussion of start conditions)
439: <s1,s2,s3>r
440: same, but in any of start conditions s1,
441: s2, or s3
442: <*>r an r in any start condition, even an exclusive one.
443:
444:
445: <<EOF>> an end-of-file
446: <s1,s2><<EOF>>
447: an end-of-file when in start condition s1 or s2
448:
449: .fi
450: Note that inside of a character class, all regular expression operators
451: lose their special meaning except escape ('\\') and the character class
452: operators, '-', ']', and, at the beginning of the class, '^'.
453: .PP
454: The regular expressions listed above are grouped according to
455: precedence, from highest precedence at the top to lowest at the bottom.
456: Those grouped together have equal precedence. For example,
457: .nf
458:
459: foo|bar*
460:
461: .fi
462: is the same as
463: .nf
464:
465: (foo)|(ba(r*))
466:
467: .fi
468: since the '*' operator has higher precedence than concatenation,
469: and concatenation higher than alternation ('|'). This pattern
470: therefore matches
471: .I either
472: the string "foo"
473: .I or
474: the string "ba" followed by zero-or-more r's.
475: To match "foo" or zero-or-more "bar"'s, use:
476: .nf
477:
478: foo|(bar)*
479:
480: .fi
481: and to match zero-or-more "foo"'s-or-"bar"'s:
482: .nf
483:
484: (foo|bar)*
485:
486: .fi
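The three precedence examples above can be checked with a short C sketch. It assumes that POSIX extended regular expressions (regcomp() with REG_EXTENDED) are an acceptable stand-in here, since they give repetition, concatenation, and alternation the same relative precedence. The helper matches_whole() is hypothetical and is not part of flex.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/*
 * Return 1 if text, taken as a whole, matches the POSIX extended
 * regular expression pattern, else 0.  POSIX regcomp() stands in for
 * flex's own matcher here.
 */
int matches_whole(const char *pattern, const char *text)
{
    char anchored[128];
    regex_t re;
    int n, rc;

    assert(pattern != NULL && text != NULL);

    /* Wrap as ^(pattern)$ so that the whole string has to match. */
    n = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (n < 0 || (size_t)n >= sizeof anchored)
        return 0;
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

Under these assumptions, matches_whole("foo|bar*", "barrr") is true and matches_whole("foo|bar*", "barbar") is false, while "foo|(bar)*" accepts "barbar" and "(foo|bar)*" accepts "foobarfoo".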
487: .PP
488: In addition to characters and ranges of characters, character classes
489: can also contain character class
490: .I expressions.
491: These are expressions enclosed inside
492: .B [:
493: and
494: .B :]
495: delimiters (which themselves must appear between the '[' and ']' of the
496: character class; other elements may occur inside the character class, too).
497: The valid expressions are:
498: .nf
499:
500: [:alnum:] [:alpha:] [:blank:]
501: [:cntrl:] [:digit:] [:graph:]
502: [:lower:] [:print:] [:punct:]
503: [:space:] [:upper:] [:xdigit:]
504:
505: .fi
506: These expressions all designate a set of characters equivalent to
507: the corresponding standard C
508: .B isXXX
509: function. For example,
510: .B [:alnum:]
511: designates those characters for which
512: .B isalnum()
513: returns true, i.e., any alphabetic or numeric character.
514: Some systems don't provide
515: .B isblank(),
516: so flex defines
517: .B [:blank:]
518: as a blank or a tab.
519: .PP
520: For example, the following character classes are all equivalent:
521: .nf
522:
523: [[:alnum:]]
1.4 deraadt 524: [[:alpha:][:digit:]]
1.1 deraadt 525: [[:alpha:]0-9]
526: [a-zA-Z0-9]
527:
528: .fi
529: If your scanner is case-insensitive (the
530: .B \-i
531: flag), then
532: .B [:upper:]
533: and
534: .B [:lower:]
535: are equivalent to
536: .B [:alpha:].
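The equivalence between the character class expressions and the isXXX functions can be checked directly in C. The sketch below assumes the default "C" locale and an ASCII character set, and looks only at 7-bit characters. All function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>

/* [[:alpha:][:digit:]] written with the C classification functions. */
int in_alpha_or_digit(int c)
{
    assert(c >= 0 && c <= 255);
    return isalpha(c) || isdigit(c);
}

/* [:blank:] as flex defines it when isblank() is unavailable. */
int in_blank(int c)
{
    return c == ' ' || c == '\t';
}

/*
 * Return 1 if [[:alnum:]], [[:alpha:][:digit:]] and [a-zA-Z0-9] select
 * the same 7-bit characters, else 0.  Assumes the "C" locale and ASCII.
 */
int classes_agree(void)
{
    int c;

    for (c = 0; c < 128; c++) {
        int by_alnum = isalnum(c) != 0;
        int by_parts = in_alpha_or_digit(c) != 0;
        int by_range = (c >= 'a' && c <= 'z') ||
                       (c >= 'A' && c <= 'Z') ||
                       (c >= '0' && c <= '9');

        if (by_alnum != by_parts || by_alnum != by_range)
            return 0;
    }
    return 1;
}
```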
537: .PP
538: Some notes on patterns:
539: .IP -
540: A negated character class such as the example "[^A-Z]"
541: above
542: .I will match a newline
543: unless "\\n" (or an equivalent escape sequence) is one of the
544: characters explicitly present in the negated character class
545: (e.g., "[^A-Z\\n]"). This is unlike how many other regular
546: expression tools treat negated character classes, but unfortunately
547: the inconsistency is historically entrenched.
548: Matching newlines means that a pattern like [^"]* can match the entire
549: input unless there's another quote in the input.
550: .IP -
551: A rule can have at most one instance of trailing context (the '/' operator
552: or the '$' operator). The start condition, '^', and "<<EOF>>" patterns
553: can only occur at the beginning of a pattern and, like '/' and '$',
554: cannot be grouped inside parentheses. A '^' which does not occur at
555: the beginning of a rule or a '$' which does not occur at the end of
556: a rule loses its special properties and is treated as a normal character.
557: .IP
558: The following are illegal:
559: .nf
560:
561: foo/bar$
562: <sc1>foo<sc2>bar
563:
564: .fi
565: Note that the first of these can be written "foo/bar\\n".
566: .IP
567: The following will result in '$' or '^' being treated as a normal character:
568: .nf
569:
570: foo|(bar$)
571: foo|^bar
572:
573: .fi
574: If what's wanted is a "foo" or a bar-followed-by-a-newline, the following
575: could be used (the special '|' action is explained below):
576: .nf
577:
578: foo |
579: bar$ /* action goes here */
580:
581: .fi
582: A similar trick will work for matching a foo or a
583: bar-at-the-beginning-of-a-line.
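The first note above, that a negated character class matches newlines unless one is listed, can be illustrated with strcspn(), which returns the length of the leading run of characters that are not in a given set. This is a sketch of the matching behavior only, not of flex itself, and both function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Length of the text that [^"]* would match at the start of s. */
size_t span_not_quote(const char *s)
{
    assert(s != NULL);
    return strcspn(s, "\"");
}

/* Length of the text that the same class with a newline added would match. */
size_t span_not_quote_or_newline(const char *s)
{
    assert(s != NULL);
    return strcspn(s, "\"\n");
}
```

On input made of the line abc, a newline, and then def followed by a double quote, the first function runs across the newline and returns 7, while the second stops at the newline and returns 3. With no quote anywhere in the input, the first function spans all of it.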
584: .SH HOW THE INPUT IS MATCHED
585: When the generated scanner is run, it analyzes its input looking
586: for strings which match any of its patterns. If it finds more than
587: one match, it takes the one matching the most text (for trailing
588: context rules, this includes the length of the trailing part, even
589: though it will then be returned to the input). If it finds two
590: or more matches of the same length, the
591: rule listed first in the
592: .I flex
593: input file is chosen.
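The selection rule can be sketched in C for the simple case where every pattern is a literal string. This is an illustration under that assumption, not flex's table-driven matcher, and it does not model trailing context. The names choose_rule, literal_match and demo_rules are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Example rule list, in the order the rules would appear in the input file. */
const char *const demo_rules[] = { "beg", "begin", "begin" };

/*
 * Length matched by a literal pattern at the start of input,
 * or 0 if the literal is not a prefix of input.
 */
static size_t literal_match(const char *literal, const char *input)
{
    size_t n = strlen(literal);

    return strncmp(literal, input, n) == 0 ? n : 0;
}

/*
 * Pick the rule to fire: the longest match wins, and among equally long
 * matches the rule listed first wins.  Returns the rule index, or -1 if
 * nothing matches (where a real scanner would apply the default rule).
 */
int choose_rule(const char *const rules[], size_t nrules, const char *input)
{
    size_t best_len = 0;
    size_t i;
    int best = -1;

    assert(rules != NULL && input != NULL);
    for (i = 0; i < nrules; i++) {
        size_t len = literal_match(rules[i], input);

        /* Strictly greater: a later rule never displaces an equal match. */
        if (len > best_len) {
            best_len = len;
            best = (int)i;
        }
    }
    return best;
}
```

For the input "begin x", rule 1 is chosen: it matches more text than rule 0, and it is listed before rule 2, which matches the same text.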
594: .PP
595: Once the match is determined, the text corresponding to the match
596: (called the
597: .I token)
598: is made available in the global character pointer
599: .B yytext,
600: and its length in the global integer
601: .B yyleng.
602: The
603: .I action
604: corresponding to the matched pattern is then executed (a more
605: detailed description of actions follows), and then the remaining
606: input is scanned for another match.
607: .PP
608: If no match is found, then the
609: .I default rule
610: is executed: the next character in the input is considered matched and
611: copied to the standard output. Thus, the simplest legal
612: .I flex
613: input is:
614: .nf
615:
616: %%
617:
618: .fi
619: which generates a scanner that simply copies its input (one character
620: at a time) to its output.
621: .PP
622: Note that
623: .B yytext
624: can be defined in two different ways: either as a character
625: .I pointer
626: or as a character
627: .I array.
628: You can control which definition
629: .I flex
630: uses by including one of the special directives
631: .B %pointer
632: or
633: .B %array
634: in the first (definitions) section of your flex input. The default is
635: .B %pointer,
636: unless you use the
637: .B \-l
638: lex compatibility option, in which case
639: .B yytext
640: will be an array.
641: The advantage of using
642: .B %pointer
643: is substantially faster scanning and no buffer overflow when matching
644: very large tokens (unless you run out of dynamic memory). The disadvantage
645: is that you are restricted in how your actions can modify
646: .B yytext
647: (see the next section), and calls to the
648: .B unput()
1.10 deraadt 649: function destroy the present contents of
1.1 deraadt 650: .B yytext,
651: which can be a considerable porting headache when moving between different
652: .I lex
653: versions.
654: .PP
655: The advantage of
656: .B %array
657: is that you can then modify
658: .B yytext
659: to your heart's content, and calls to
660: .B unput()
661: do not destroy
662: .B yytext
663: (see below). Furthermore, existing
664: .I lex
665: programs sometimes access
666: .B yytext
667: externally using declarations of the form:
668: .nf
669: extern char yytext[];
670: .fi
671: This definition is erroneous when used with
672: .B %pointer,
673: but correct for
674: .B %array.
675: .PP
676: .B %array
677: defines
678: .B yytext
679: to be an array of
680: .B YYLMAX
681: characters, which defaults to a fairly large value. You can change
682: the size by simply #define'ing
683: .B YYLMAX
684: to a different value in the first section of your
685: .I flex
686: input. As mentioned above, with
687: .B %pointer
688: yytext grows dynamically to accommodate large tokens. While this means your
689: .B %pointer
690: scanner can accommodate very large tokens (such as matching entire blocks
691: of comments), bear in mind that each time the scanner must resize
692: .B yytext
693: it also must rescan the entire token from the beginning, so matching such
694: tokens can prove slow.
695: .B yytext
696: presently does
697: .I not
698: dynamically grow if a call to
699: .B unput()
700: results in too much text being pushed back; instead, a run-time error results.
701: .PP
702: Also note that you cannot use
703: .B %array
704: with C++ scanner classes
705: (the
706: .B c++
707: option; see below).
708: .SH ACTIONS
709: Each pattern in a rule has a corresponding action, which can be any
710: arbitrary C statement. The pattern ends at the first non-escaped
711: whitespace character; the remainder of the line is its action. If the
712: action is empty, then when the pattern is matched the input token
713: is simply discarded. For example, here is the specification for a program
714: which deletes all occurrences of "zap me" from its input:
715: .nf
716:
717: %%
718: "zap me"
719:
720: .fi
721: (It will copy all other characters in the input to the output since
722: they will be matched by the default rule.)
723: .PP
724: Here is a program which compresses multiple blanks and tabs down to
725: a single blank, and throws away whitespace found at the end of a line:
726: .nf
727:
728: %%
729: [ \\t]+ putchar( ' ' );
730: [ \\t]+$ /* ignore this token */
731:
732: .fi
733: .PP
734: If the action contains a '{', then the action spans till the balancing '}'
735: is found, and the action may cross multiple lines.
1.7 aaron 736: .I flex
1.1 deraadt 737: knows about C strings and comments and won't be fooled by braces found
738: within them, but also allows actions to begin with
739: .B %{
740: and will consider the action to be all the text up to the next
741: .B %}
742: (regardless of ordinary braces inside the action).
743: .PP
744: An action consisting solely of a vertical bar ('|') means "same as
745: the action for the next rule." See below for an illustration.
746: .PP
747: Actions can include arbitrary C code, including
748: .B return
749: statements to return a value to whatever routine called
750: .B yylex().
751: Each time
752: .B yylex()
753: is called it continues processing tokens from where it last left
754: off until it either reaches
755: the end of the file or executes a return.
756: .PP
757: Actions are free to modify
758: .B yytext
759: except for lengthening it (adding
760: characters to its end--these will overwrite later characters in the
761: input stream). This however does not apply when using
762: .B %array
763: (see above); in that case,
764: .B yytext
765: may be freely modified in any way.
766: .PP
767: Actions are free to modify
768: .B yyleng
769: except they should not do so if the action also includes use of
770: .B yymore()
771: (see below).
772: .PP
773: There are a number of special directives which can be included within
774: an action:
775: .IP -
776: .B ECHO
777: copies yytext to the scanner's output.
778: .IP -
779: .B BEGIN
780: followed by the name of a start condition places the scanner in the
781: corresponding start condition (see below).
782: .IP -
783: .B REJECT
784: directs the scanner to proceed on to the "second best" rule which matched the
785: input (or a prefix of the input). The rule is chosen as described
786: above in "How the Input is Matched", and
787: .B yytext
788: and
789: .B yyleng
790: are set up appropriately.
791: It may either be one which matched as much text
792: as the originally chosen rule but came later in the
793: .I flex
794: input file, or one which matched less text.
795: For example, the following will both count the
796: words in the input and call the routine special() whenever "frob" is seen:
797: .nf
798:
799: int word_count = 0;
800: %%
801:
802: frob special(); REJECT;
803: [^ \\t\\n]+ ++word_count;
804:
805: .fi
806: Without the
807: .B REJECT,
808: any "frob"'s in the input would not be counted as words, since the
809: scanner normally executes only one action per token.
810: Multiple
811: .B REJECT's
812: are allowed, each one finding the next best choice to the currently
813: active rule. For example, when the following scanner scans the token
814: "abcd", it will write "abcdabcaba" to the output:
815: .nf
816:
817: %%
818: a |
819: ab |
820: abc |
821: abcd ECHO; REJECT;
822: .|\\n /* eat up any unmatched character */
823:
824: .fi
825: (The first three rules share the fourth's action since they use
826: the special '|' action.)
827: .B REJECT
828: is a particularly expensive feature in terms of scanner performance;
829: if it is used in
830: .I any
831: of the scanner's actions it will slow down
832: .I all
833: of the scanner's matching. Furthermore,
834: .B REJECT
835: cannot be used with the
836: .I \-Cf
837: or
838: .I \-CF
839: options (see below).
840: .IP
841: Note also that unlike the other special actions,
842: .B REJECT
843: is a
844: .I branch;
845: code immediately following it in the action will
846: .I not
847: be executed.
848: .IP -
849: .B yymore()
850: tells the scanner that the next time it matches a rule, the corresponding
851: token should be
852: .I appended
853: onto the current value of
854: .B yytext
855: rather than replacing it. For example, given the input "mega-kludge"
856: the following will write "mega-mega-kludge" to the output:
857: .nf
858:
859: %%
860: mega- ECHO; yymore();
861: kludge ECHO;
862:
863: .fi
864: First "mega-" is matched and echoed to the output. Then "kludge"
865: is matched, but the previous "mega-" is still hanging around at the
866: beginning of
867: .B yytext
868: so the
869: .B ECHO
870: for the "kludge" rule will actually write "mega-kludge".
871: .PP
872: Two notes regarding use of
873: .B yymore().
874: First,
875: .B yymore()
876: depends on the value of
877: .I yyleng
878: correctly reflecting the size of the current token, so you must not
879: modify
880: .I yyleng
881: if you are using
882: .B yymore().
883: Second, the presence of
884: .B yymore()
885: in the scanner's action entails a minor performance penalty in the
886: scanner's matching speed.
887: .IP -
888: .B yyless(n)
889: returns all but the first
890: .I n
891: characters of the current token back to the input stream, where they
892: will be rescanned when the scanner looks for the next match.
893: .B yytext
894: and
895: .B yyleng
896: are adjusted appropriately (e.g.,
897: .B yyleng
898: will now be equal to
899: .I n
900: ). For example, on the input "foobar" the following will write out
901: "foobarbar":
902: .nf
903:
904: %%
905: foobar ECHO; yyless(3);
906: [a-z]+ ECHO;
907:
908: .fi
909: An argument of 0 to
910: .B yyless
911: will cause the entire current input string to be scanned again. Unless you've
912: changed how the scanner will subsequently process its input (using
913: .B BEGIN,
914: for example), this will result in an endless loop.
915: .PP
916: Note that
917: .B yyless
918: is a macro and can only be used in the flex input file, not from
919: other source files.
920: .IP -
921: .B unput(c)
922: puts the character
923: .I c
924: back onto the input stream. It will be the next character scanned.
925: The following action will take the current token and cause it
926: to be rescanned enclosed in parentheses.
927: .nf
928:
929: {
930: int i;
931: /* Copy yytext because unput() trashes yytext */
932: char *yycopy = strdup( yytext );
933: unput( ')' );
934: for ( i = yyleng - 1; i >= 0; --i )
935: unput( yycopy[i] );
936: unput( '(' );
937: free( yycopy );
938: }
939:
940: .fi
941: Note that since each
942: .B unput()
943: puts the given character back at the
944: .I beginning
945: of the input stream, pushing back strings must be done back-to-front.
946: .PP
947: An important potential problem when using
948: .B unput()
949: is that if you are using
950: .B %pointer
951: (the default), a call to
952: .B unput()
953: .I destroys
954: the contents of
955: .I yytext,
956: starting with its rightmost character and devouring one character to
957: the left with each call. If you need the value of yytext preserved
958: after a call to
959: .B unput()
960: (as in the above example),
961: you must either first copy it elsewhere, or build your scanner using
962: .B %array
963: instead (see How The Input Is Matched).
964: .PP
965: Finally, note that you cannot put back
966: .B EOF
967: to attempt to mark the input stream with an end-of-file.
968: .IP -
969: .B input()
970: reads the next character from the input stream. For example,
971: the following is one way to eat up C comments:
972: .nf
973:
974: %%
975: "/*" {
976: register int c;
977:
978: for ( ; ; )
979: {
980: while ( (c = input()) != '*' &&
981: c != EOF )
982: ; /* eat up text of comment */
983:
984: if ( c == '*' )
985: {
986: while ( (c = input()) == '*' )
987: ;
988: if ( c == '/' )
989: break; /* found the end */
990: }
991:
992: if ( c == EOF )
993: {
994: error( "EOF in comment" );
995: break;
996: }
997: }
998: }
999:
1000: .fi
1001: (Note that if the scanner is compiled using
1002: .B C++,
1003: then
1004: .B input()
1005: is instead referred to as
1006: .B yyinput(),
1007: in order to avoid a name clash with the
1008: .B C++
1009: stream by the name of
1010: .I input.)
1011: .IP -
1012: .B YY_FLUSH_BUFFER
1013: flushes the scanner's internal buffer
1014: so that the next time the scanner attempts to match a token, it will
1015: first refill the buffer using
1016: .B YY_INPUT
1017: (see The Generated Scanner, below). This action is a special case
1018: of the more general
1019: .B yy_flush_buffer()
1020: function, described below in the section Multiple Input Buffers.
1021: .IP -
1022: .B yyterminate()
1023: can be used in lieu of a return statement in an action. It terminates
1024: the scanner and returns a 0 to the scanner's caller, indicating "all done".
1025: By default,
1026: .B yyterminate()
1027: is also called when an end-of-file is encountered. It is a macro and
1028: may be redefined.
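The back-to-front rule for unput() described above follows from the pushed-back text behaving like a stack. The C sketch below models only that behavior. The names push_back, next_char, push_back_string and reads_back_in_order are hypothetical stand-ins, not flex functions, and the fixed-size array is an assumption made for brevity.

```c
#include <assert.h>
#include <string.h>

#define PUSHBACK_MAX 64

/* A tiny stand-in for the scanner's input: a stack of pushed-back characters. */
static char pushback[PUSHBACK_MAX];
static size_t pushback_len = 0;

/* Like unput(c): c becomes the next character to be read. */
void push_back(int c)
{
    assert(pushback_len < PUSHBACK_MAX);
    pushback[pushback_len++] = (char)c;
}

/* Like input(): read the most recently pushed-back character, or -1 if none. */
int next_char(void)
{
    return pushback_len > 0 ? (unsigned char)pushback[--pushback_len] : -1;
}

/* Push back a whole string, last character first, so that it reads in order. */
void push_back_string(const char *s)
{
    size_t i = strlen(s);

    while (i > 0)
        push_back(s[--i]);
}

/* Return 1 if s, pushed back as a whole, is read again in its original order. */
int reads_back_in_order(const char *s)
{
    size_t i, n = strlen(s);

    push_back_string(s);
    for (i = 0; i < n; i++) {
        if (next_char() != (unsigned char)s[i])
            return 0;
    }
    return 1;
}
```

Pushing 'a' and then 'b' one character at a time makes 'b' the next character read, which is why a string has to be pushed back starting from its last character.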
1029: .SH THE GENERATED SCANNER
1030: The output of
1031: .I flex
1032: is the file
1033: .B lex.yy.c,
1034: which contains the scanning routine
1035: .B yylex(),
1036: a number of tables used by it for matching tokens, and a number
1037: of auxiliary routines and macros. By default,
1038: .B yylex()
1039: is declared as follows:
1040: .nf
1041:
1042: int yylex()
1043: {
1044: ... various definitions and the actions in here ...
1045: }
1046:
1047: .fi
1048: (If your environment supports function prototypes, then it will
1049: be "int yylex( void )".) This definition may be changed by defining
1050: the "YY_DECL" macro. For example, you could use:
1051: .nf
1052:
1053: #define YY_DECL float lexscan( a, b ) float a, b;
1054:
1055: .fi
1056: to give the scanning routine the name
1057: .I lexscan,
1058: returning a float, and taking two floats as arguments. Note that
1059: if you give arguments to the scanning routine using a
1060: K&R-style/non-prototyped function declaration, you must terminate
1061: the definition with a semi-colon (;).
1062: .PP
1063: Whenever
1064: .B yylex()
1065: is called, it scans tokens from the global input file
1066: .I yyin
1067: (which defaults to stdin). It continues until it either reaches
1068: an end-of-file (at which point it returns the value 0) or
1069: one of its actions executes a
1070: .I return
1071: statement.
1072: .PP
1073: If the scanner reaches an end-of-file, subsequent calls are undefined
1074: unless either
1075: .I yyin
1076: is pointed at a new input file (in which case scanning continues from
1077: that file), or
1078: .B yyrestart()
1079: is called.
1080: .B yyrestart()
1081: takes one argument, a
1082: .B FILE *
1083: pointer (which can be nil, if you've set up
1084: .B YY_INPUT
1085: to scan from a source other than
1086: .I yyin),
1087: and initializes
1088: .I yyin
1089: for scanning from that file. Essentially there is no difference between
1090: just assigning
1091: .I yyin
1092: to a new input file and using
1093: .B yyrestart()
1094: to do so; the latter is available for compatibility with previous versions
1095: of
1096: .I flex,
1097: and because it can be used to switch input files in the middle of scanning.
1098: It can also be used to throw away the current input buffer, by calling
1099: it with an argument of
1100: .I yyin;
1101: but it is better to use
1102: .B YY_FLUSH_BUFFER
1103: (see above).
1104: Note that
1105: .B yyrestart()
1106: does
1107: .I not
1108: reset the start condition to
1109: .B INITIAL
1110: (see Start Conditions, below).
1111: .PP
1112: If
1113: .B yylex()
1114: stops scanning due to executing a
1115: .I return
1116: statement in one of the actions, the scanner may then be called again and it
1117: will resume scanning where it left off.
1118: .PP
1119: By default (and for purposes of efficiency), the scanner uses
1120: block-reads rather than simple
1121: .I getc()
1122: calls to read characters from
1123: .I yyin.
1124: The nature of how it gets its input can be controlled by defining the
1125: .B YY_INPUT
1126: macro.
1127: YY_INPUT's calling sequence is "YY_INPUT(buf,result,max_size)". Its
1128: action is to place up to
1129: .I max_size
1130: characters in the character array
1131: .I buf
1132: and return in the integer variable
1133: .I result
1134: either the
1135: number of characters read or the constant YY_NULL (0 on Unix systems)
1136: to indicate EOF. The default YY_INPUT reads from the
1137: global file-pointer "yyin".
1138: .PP
1139: A sample definition of YY_INPUT (in the definitions
1140: section of the input file):
1141: .nf
1142:
1143: %{
1144: #define YY_INPUT(buf,result,max_size) \\
1145: { \\
1146: int c = getchar(); \\
1147: result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \\
1148: }
1149: %}
1150:
1151: .fi
1152: This definition will change the input processing to occur
1153: one character at a time.
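.PP
As a rough illustration of the
.B YY_INPUT
contract described above, the following C sketch feeds a scanner from a
block of memory instead of a file.
The names
.I mem_open
and
.I mem_read
are hypothetical helpers, not part of
.I flex;
only the macro name and its three arguments are taken from the text above.
.nf

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory source; these names are not part of flex. */
static const char *mem_src = "";   /* text still to be delivered */
static size_t mem_left = 0;        /* bytes remaining in mem_src */

/* Point the reader at a new block of text. */
static void mem_open(const char *text, size_t len)
{
    assert(text != NULL);
    mem_src = text;
    mem_left = len;
}

/*
 * Copy up to max_size bytes into buf and return the count.
 * A return of 0 plays the role of YY_NULL (end of input).
 */
static size_t mem_read(char *buf, size_t max_size)
{
    size_t n = mem_left < max_size ? mem_left : max_size;

    assert(buf != NULL);
    memcpy(buf, mem_src, n);
    mem_src += n;
    mem_left -= n;
    return n;
}

/* In the definitions section of a scanner this replaces the default. */
#define YY_INPUT(buf, result, max_size) ((result) = mem_read((buf), (max_size)))
```

.fi
Because
.I mem_read
returns 0 once the text is used up, the scanner treats that as an
end-of-file indication.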
1154: .PP
1155: When the scanner receives an end-of-file indication from YY_INPUT,
1156: it then checks the
1157: .B yywrap()
1158: function. If
1159: .B yywrap()
1160: returns false (zero), then it is assumed that the
1161: function has gone ahead and set up
1162: .I yyin
1163: to point to another input file, and scanning continues. If it returns
1164: true (non-zero), then the scanner terminates, returning 0 to its
1165: caller. Note that in either case, the start condition remains unchanged;
1166: it does
1167: .I not
1168: revert to
1169: .B INITIAL.
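.PP
One way to picture the
.B yywrap()
contract is a helper that steps through a list of file names.
The sketch below is only an assumption about how such a helper might look:
.I wrap_to_next
and the variable
.I filelist
mentioned in its closing comment are hypothetical names, and the helper
is written as plain C so that it can be compiled apart from a scanner.
.nf

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper that a user-supplied yywrap() could call.
 * "files" points into a NULL-terminated list of names, at the first
 * name not yet tried.  Names that cannot be opened are skipped.
 * Returns 0 with *in set to the next stream, or 1 when the list is
 * exhausted (the value yywrap() returns to stop the scanner).
 */
static int wrap_to_next(FILE **in, char ***files)
{
    assert(in != NULL && files != NULL && *files != NULL);

    while (**files != NULL) {
        FILE *fp = fopen(**files, "r");

        ++*files;
        if (fp != NULL) {
            if (*in != NULL && *in != stdin)
                fclose(*in);   /* finished with the previous file */
            *in = fp;
            return 0;
        }
    }
    return 1;
}

/*
 * In a scanner (not compiled here):
 *     int yywrap(void) { return wrap_to_next(&yyin, &filelist); }
 */
```

.fi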
1170: .PP
1171: If you do not supply your own version of
1172: .B yywrap(),
1173: then you must either use
1174: .B %option noyywrap
1175: (in which case the scanner behaves as though
1176: .B yywrap()
1177: returned 1), or you must link with
1178: .B \-lfl
1179: to obtain the default version of the routine, which always returns 1.
1180: .PP
1181: Three routines are available for scanning from in-memory buffers rather
1182: than files:
1183: .B yy_scan_string(), yy_scan_bytes(),
1184: and
1185: .B yy_scan_buffer().
1186: See the discussion of them below in the section Multiple Input Buffers.
1187: .PP
1188: The scanner writes its
1189: .B ECHO
1190: output to the
1191: .I yyout
1192: global (default, stdout), which may be redefined by the user simply
1193: by assigning it to some other
1194: .B FILE
1195: pointer.
1196: .SH START CONDITIONS
1197: .I flex
1198: provides a mechanism for conditionally activating rules. Any rule
1199: whose pattern is prefixed with "<sc>" will only be active when
1200: the scanner is in the start condition named "sc". For example,
1201: .nf
1202:
1203: <STRING>[^"]* { /* eat up the string body ... */
1204: ...
1205: }
1206:
1207: .fi
1208: will be active only when the scanner is in the "STRING" start
1209: condition, and
1210: .nf
1211:
1212: <INITIAL,STRING,QUOTE>\\. { /* handle an escape ... */
1213: ...
1214: }
1215:
1216: .fi
1217: will be active only when the current start condition is
1218: either "INITIAL", "STRING", or "QUOTE".
1219: .PP
1220: Start conditions
1221: are declared in the definitions (first) section of the input
1222: using unindented lines beginning with either
1223: .B %s
1224: or
1225: .B %x
1226: followed by a list of names.
1227: The former declares
1228: .I inclusive
1229: start conditions, the latter
1230: .I exclusive
1231: start conditions. A start condition is activated using the
1232: .B BEGIN
1233: action. Until the next
1234: .B BEGIN
1235: action is executed, rules with the given start
1236: condition will be active and
1237: rules with other start conditions will be inactive.
1238: If the start condition is
1239: .I inclusive,
1240: then rules with no start conditions at all will also be active.
1241: If it is
1242: .I exclusive,
1243: then
1244: .I only
1245: rules qualified with the start condition will be active.
1246: A set of rules contingent on the same exclusive start condition
1247: describes a scanner which is independent of any of the other rules in the
1248: .I flex
1249: input. Because of this,
1250: exclusive start conditions make it easy to specify "mini-scanners"
1251: which scan portions of the input that are syntactically different
1252: from the rest (e.g., comments).
1253: .PP
1254: If the distinction between inclusive and exclusive start conditions
1255: is still a little vague, here's a simple example illustrating the
1256: connection between the two. The set of rules:
1257: .nf
1258:
1259: %s example
1260: %%
1261:
1262: <example>foo do_something();
1263:
1264: bar something_else();
1265:
1266: .fi
1267: is equivalent to
1268: .nf
1269:
1270: %x example
1271: %%
1272:
1273: <example>foo do_something();
1274:
1275: <INITIAL,example>bar something_else();
1276:
1277: .fi
1278: Without the
1279: .B <INITIAL,example>
1280: qualifier, the
1281: .I bar
1282: pattern in the second example wouldn't be active (i.e., couldn't match)
1283: when in start condition
1284: .B example.
1285: If we just used
1286: .B <example>
1287: to qualify
1288: .I bar,
1289: though, then it would only be active in
1290: .B example
1291: and not in
1292: .B INITIAL,
1293: while in the first example it's active in both, because in the first
1294: example the
1295: .B example
1.10 deraadt 1296: start condition is an
1.1 deraadt 1297: .I inclusive
1298: .B (%s)
1299: start condition.
1300: .PP
1301: Also note that the special start-condition specifier
1302: .B <*>
1303: matches every start condition. Thus, the above example could also
1304: have been written:
1305: .nf
1306:
1307: %x example
1308: %%
1309:
1310: <example>foo do_something();
1311:
1312: <*>bar something_else();
1313:
1314: .fi
1315: .PP
1316: The default rule (to
1317: .B ECHO
1318: any unmatched character) remains active in start conditions. It
1319: is equivalent to:
1320: .nf
1321:
1322: <*>.|\\n ECHO;
1323:
1324: .fi
1325: .PP
1326: .B BEGIN(0)
1327: returns to the original state where only the rules with
1328: no start conditions are active. This state can also be
1329: referred to as the start-condition "INITIAL", so
1330: .B BEGIN(INITIAL)
1331: is equivalent to
1332: .B BEGIN(0).
1333: (The parentheses around the start condition name are not required but
1334: are considered good style.)
1335: .PP
1336: .B BEGIN
1337: actions can also be given as indented code at the beginning
1338: of the rules section. For example, the following will cause
1339: the scanner to enter the "SPECIAL" start condition whenever
1340: .B yylex()
1341: is called and the global variable
1342: .I enter_special
1343: is true:
1344: .nf
1345:
1346: int enter_special;
1347:
1348: %x SPECIAL
1349: %%
1350: if ( enter_special )
1351: BEGIN(SPECIAL);
1352:
1353: <SPECIAL>blahblahblah
1354: ...more rules follow...
1355:
1356: .fi
1357: .PP
1358: To illustrate the uses of start conditions,
1359: here is a scanner which provides two different interpretations
1360: of a string like "123.456". By default it will treat it as
1361: three tokens, the integer "123", a dot ('.'), and the integer "456".
1362: But if the string is preceded earlier in the line by the string
1363: "expect-floats"
1364: it will treat it as a single token, the floating-point number
1365: 123.456:
1366: .nf
1367:
1368: %{
1369: #include <stdlib.h>
1370: %}
1371: %s expect
1372:
1373: %%
1374: expect-floats BEGIN(expect);
1375:
1376: <expect>[0-9]+"."[0-9]+ {
1377: printf( "found a float, = %f\\n",
1378: atof( yytext ) );
1379: }
1380: <expect>\\n {
1381: /* that's the end of the line, so
1382: * we need another "expect-floats"
1383: * before we'll recognize any more
1384: * numbers
1385: */
1386: BEGIN(INITIAL);
1387: }
1388:
1389: [0-9]+ {
1390: printf( "found an integer, = %d\\n",
1391: atoi( yytext ) );
1392: }
1393:
1394: "." printf( "found a dot\\n" );
1395:
1396: .fi
1397: Here is a scanner which recognizes (and discards) C comments while
1398: maintaining a count of the current input line.
1399: .nf
1400:
1401: %x comment
1402: %%
1403: int line_num = 1;
1404:
1405: "/*" BEGIN(comment);
1406:
1407: <comment>[^*\\n]* /* eat anything that's not a '*' */
1408: <comment>"*"+[^*/\\n]* /* eat up '*'s not followed by '/'s */
1409: <comment>\\n ++line_num;
1410: <comment>"*"+"/" BEGIN(INITIAL);
1411:
1412: .fi
1413: This scanner goes to a bit of trouble to match as much
1414: text as possible with each rule. In general, when attempting to write
1.10 deraadt 1415: a high-speed scanner try to match as much as possible in each rule, as
1.1 deraadt 1416: it's a big win.
1417: .PP
1.10 deraadt 1418: Note that start-condition names are really integer values and
1.1 deraadt 1419: can be stored as such. Thus, the above could be extended in the
1420: following fashion:
1421: .nf
1422:
1423: %x comment foo
1424: %%
1425: int line_num = 1;
1426: int comment_caller;
1427:
1428: "/*" {
1429: comment_caller = INITIAL;
1430: BEGIN(comment);
1431: }
1432:
1433: ...
1434:
1435: <foo>"/*" {
1436: comment_caller = foo;
1437: BEGIN(comment);
1438: }
1439:
1440: <comment>[^*\\n]* /* eat anything that's not a '*' */
1441: <comment>"*"+[^*/\\n]* /* eat up '*'s not followed by '/'s */
1442: <comment>\\n ++line_num;
1443: <comment>"*"+"/" BEGIN(comment_caller);
1444:
1445: .fi
1446: Furthermore, you can access the current start condition using
1447: the integer-valued
1448: .B YY_START
1449: macro. For example, the above assignments to
1450: .I comment_caller
1451: could instead be written
1452: .nf
1453:
1454: comment_caller = YY_START;
1455:
1456: .fi
1457: Flex provides
1458: .B YYSTATE
1459: as an alias for
1460: .B YY_START
1461: (since that is what's used by AT&T
1462: .I lex).
1463: .PP
1464: Note that start conditions do not have their own name-space; %s's and %x's
1465: declare names in the same fashion as #define's.
1466: .PP
1467: Finally, here's an example of how to match C-style quoted strings using
1468: exclusive start conditions, including expanded escape sequences (but
1469: not including checking for a string that's too long):
1470: .nf
1471:
1472: %x str
1473:
1474: %%
1475: char string_buf[MAX_STR_CONST];
1476: char *string_buf_ptr;
1477:
1478:
1479: \\" string_buf_ptr = string_buf; BEGIN(str);
1480:
1481: <str>\\" { /* saw closing quote - all done */
1482: BEGIN(INITIAL);
1483: *string_buf_ptr = '\\0';
1484: /* return string constant token type and
1485: * value to parser
1486: */
1487: }
1488:
1489: <str>\\n {
1490: /* error - unterminated string constant */
1491: /* generate error message */
1492: }
1493:
1494: <str>\\\\[0-7]{1,3} {
1495: /* octal escape sequence */
1496: unsigned int result;
1497:
1498: (void) sscanf( yytext + 1, "%o", &result );
1499:
1500: if ( result > 0xff )
1501: ; /* error, constant is out-of-bounds */
1502: else
1503: *string_buf_ptr++ = result;
1504: }
1505:
1506: <str>\\\\[0-9]+ {
1507: /* generate error - bad escape sequence; something
1508: * like '\\48' or '\\0777777'
1509: */
1510: }
1511:
1512: <str>\\\\n *string_buf_ptr++ = '\\n';
1513: <str>\\\\t *string_buf_ptr++ = '\\t';
1514: <str>\\\\r *string_buf_ptr++ = '\\r';
1515: <str>\\\\b *string_buf_ptr++ = '\\b';
1516: <str>\\\\f *string_buf_ptr++ = '\\f';
1517:
1518: <str>\\\\(.|\\n) *string_buf_ptr++ = yytext[1];
1519:
1520: <str>[^\\\\\\n\\"]+ {
1521: char *yptr = yytext;
1522:
1523: while ( *yptr )
1524: *string_buf_ptr++ = *yptr++;
1525: }
1526:
1527: .fi
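.PP
The octal-escape action above can also be tried on its own.
The following is a minimal sketch, assuming a hypothetical helper named
.I octal_escape_value
that receives the text just past the backslash; it uses the same
.I sscanf
conversion and the same 0xff bound as the rule.
.nf

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper mirroring the octal-escape action: digits points
 * just past the backslash (yytext + 1 in the rule).  Returns the byte
 * value, or -1 if the text is not octal or the value exceeds 0xff.
 */
static int octal_escape_value(const char *digits)
{
    unsigned int result;   /* the "%o" conversion stores an unsigned int */

    assert(digits != NULL);
    if (sscanf(digits, "%o", &result) != 1)
        return -1;
    if (result > 0xff)
        return -1;
    return (int) result;
}
```

.fi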
1528: .PP
1529: Often, such as in some of the examples above, you wind up writing a
1530: whole bunch of rules all preceded by the same start condition(s). Flex
1531: makes this a little easier and cleaner by introducing a notion of
1532: start condition
1533: .I scope.
1534: A start condition scope is begun with:
1535: .nf
1536:
1537: <SCs>{
1538:
1539: .fi
1540: where
1541: .I SCs
1542: is a list of one or more start conditions. Inside the start condition
1543: scope, every rule automatically has the prefix
1544: .I <SCs>
1545: applied to it, until a
1546: .I '}'
1547: which matches the initial
1548: .I '{'.
1549: So, for example,
1550: .nf
1551:
1552: <ESC>{
1553: "\\\\n" return '\\n';
1554: "\\\\r" return '\\r';
1555: "\\\\f" return '\\f';
1556: "\\\\0" return '\\0';
1557: }
1558:
1559: .fi
1560: is equivalent to:
1561: .nf
1562:
1563: <ESC>"\\\\n" return '\\n';
1564: <ESC>"\\\\r" return '\\r';
1565: <ESC>"\\\\f" return '\\f';
1566: <ESC>"\\\\0" return '\\0';
1567:
1568: .fi
1569: Start condition scopes may be nested.
1570: .PP
1571: Three routines are available for manipulating stacks of start conditions:
1572: .TP
1573: .B void yy_push_state(int new_state)
1574: pushes the current start condition onto the top of the start condition
1575: stack and switches to
1576: .I new_state
1577: as though you had used
1578: .B BEGIN new_state
1579: (recall that start condition names are also integers).
1580: .TP
1581: .B void yy_pop_state()
1582: pops the top of the stack and switches to it via
1583: .B BEGIN.
1584: .TP
1585: .B int yy_top_state()
1586: returns the top of the stack without altering the stack's contents.
1587: .PP
1588: The start condition stack grows dynamically and so has no built-in
1589: size limitation. If memory is exhausted, program execution aborts.
1590: .PP
1591: To use start condition stacks, your scanner must include a
1592: .B %option stack
1593: directive (see Options below).
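.PP
The behaviour of the three stack routines can be modelled in a few lines
of plain C.
This is only a model of what the text above describes, not the code that
.I flex
generates; the
.I sc_
names are hypothetical, and
.I sc_current
stands in for the current start condition.
.nf

```c
#include <assert.h>
#include <stdlib.h>

/* A plain C model; the sc_* names are hypothetical. */
static int sc_current = 0;     /* current start condition; 0 is INITIAL */
static int *sc_stack = NULL;   /* saved start conditions */
static size_t sc_depth = 0;    /* entries in use */
static size_t sc_cap = 0;      /* entries allocated */

/* Like yy_push_state(): save the current condition, then switch. */
static void sc_push(int new_state)
{
    if (sc_depth == sc_cap) {
        size_t cap = sc_cap ? sc_cap * 2 : 8;
        int *p = realloc(sc_stack, cap * sizeof *p);

        if (p == NULL)
            abort();           /* memory exhausted: execution aborts */
        sc_stack = p;
        sc_cap = cap;
    }
    sc_stack[sc_depth++] = sc_current;
    sc_current = new_state;
}

/* Like yy_pop_state(): switch back to the most recently saved condition. */
static void sc_pop(void)
{
    assert(sc_depth > 0);      /* popping an empty stack is an error */
    sc_current = sc_stack[--sc_depth];
}

/* Like yy_top_state(): peek at the saved condition without popping. */
static int sc_top(void)
{
    assert(sc_depth > 0);
    return sc_stack[sc_depth - 1];
}
```

.fi
The stack grows by doubling, so a deeply nested input costs only a
handful of reallocations.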
1594: .SH MULTIPLE INPUT BUFFERS
1595: Some scanners (such as those which support "include" files)
1596: require reading from several input streams. As
1597: .I flex
1598: scanners do a large amount of buffering, one cannot control
1599: where the next input will be read from by simply writing a
1600: .B YY_INPUT
1601: which is sensitive to the scanning context.
1602: .B YY_INPUT
1603: is only called when the scanner reaches the end of its buffer, which
1604: may be a long time after scanning a statement such as an "include"
1605: which requires switching the input source.
1606: .PP
1607: To negotiate these sorts of problems,
1608: .I flex
1609: provides a mechanism for creating and switching between multiple
1610: input buffers. An input buffer is created by using:
1611: .nf
1612:
1613: YY_BUFFER_STATE yy_create_buffer( FILE *file, int size )
1614:
1615: .fi
1616: which takes a
1617: .I FILE
1618: pointer and a size and creates a buffer associated with the given
1619: file and large enough to hold
1620: .I size
1621: characters (when in doubt, use
1622: .B YY_BUF_SIZE
1623: for the size). It returns a
1624: .B YY_BUFFER_STATE
1625: handle, which may then be passed to other routines (see below). The
1626: .B YY_BUFFER_STATE
1627: type is a pointer to an opaque
1628: .B struct yy_buffer_state
1629: structure, so you may safely initialize YY_BUFFER_STATE variables to
1630: .B ((YY_BUFFER_STATE) 0)
1631: if you wish, and also refer to the opaque structure in order to
1632: correctly declare input buffers in source files other than that
1633: of your scanner. Note that the
1634: .I FILE
1635: pointer in the call to
1636: .B yy_create_buffer
1637: is only used as the value of
1638: .I yyin
1639: seen by
1640: .B YY_INPUT;
1641: if you redefine
1642: .B YY_INPUT
1643: so it no longer uses
1644: .I yyin,
1645: then you can safely pass a nil
1646: .I FILE
1647: pointer to
1648: .B yy_create_buffer.
1649: You select a particular buffer to scan from using:
1650: .nf
1651:
1652: void yy_switch_to_buffer( YY_BUFFER_STATE new_buffer )
1653:
1654: .fi
1655: This switches the scanner's input buffer so subsequent tokens will
1656: come from
1657: .I new_buffer.
1658: Note that
1659: .B yy_switch_to_buffer()
1660: may be used by yywrap() to set things up for continued scanning, instead
1661: of opening a new file and pointing
1662: .I yyin
1663: at it. Note also that switching input sources via either
1664: .B yy_switch_to_buffer()
1665: or
1666: .B yywrap()
1667: does
1668: .I not
1669: change the start condition.
1670: .nf
1671:
1672: void yy_delete_buffer( YY_BUFFER_STATE buffer )
1673:
1674: .fi
1675: is used to reclaim the storage associated with a buffer.
1676: .RB ( buffer
1677: can be nil, in which case the routine does nothing.)
1678: You can also clear the current contents of a buffer using:
1679: .nf
1680:
1681: void yy_flush_buffer( YY_BUFFER_STATE buffer )
1682:
1683: .fi
1684: This function discards the buffer's contents,
1685: so the next time the scanner attempts to match a token from the
1686: buffer, it will first fill the buffer anew using
1687: .B YY_INPUT.
1688: .PP
1689: .B yy_new_buffer()
1690: is an alias for
1691: .B yy_create_buffer(),
1692: provided for compatibility with the C++ use of
1693: .I new
1694: and
1695: .I delete
1696: for creating and destroying dynamic objects.
1697: .PP
1698: Finally, the
1699: .B YY_CURRENT_BUFFER
1700: macro returns a
1701: .B YY_BUFFER_STATE
1702: handle to the current buffer.
1703: .PP
1704: Here is an example of using these features for writing a scanner
1705: which expands include files (the
1706: .B <<EOF>>
1707: feature is discussed below):
1708: .nf
1709:
1710: /* the "incl" state is used for picking up the name
1711: * of an include file
1712: */
1713: %x incl
1714:
1715: %{
1716: #define MAX_INCLUDE_DEPTH 10
1717: YY_BUFFER_STATE include_stack[MAX_INCLUDE_DEPTH];
1718: int include_stack_ptr = 0;
1719: %}
1720:
1721: %%
1722: include BEGIN(incl);
1723:
1724: [a-z]+ ECHO;
1725: [^a-z\\n]*\\n? ECHO;
1726:
1727: <incl>[ \\t]* /* eat the whitespace */
1728: <incl>[^ \\t\\n]+ { /* got the include file name */
1729: if ( include_stack_ptr >= MAX_INCLUDE_DEPTH )
1730: {
1731: fprintf( stderr, "Includes nested too deeply\\n" );
1732: exit( 1 );
1733: }
1734:
1735: include_stack[include_stack_ptr++] =
1736: YY_CURRENT_BUFFER;
1737:
1738: yyin = fopen( yytext, "r" );
1739:
1740: if ( ! yyin )
1741: error( ... );
1742:
1743: yy_switch_to_buffer(
1744: yy_create_buffer( yyin, YY_BUF_SIZE ) );
1745:
1746: BEGIN(INITIAL);
1747: }
1748:
1749: <<EOF>> {
1750: if ( --include_stack_ptr < 0 )
1751: {
1752: yyterminate();
1753: }
1754:
1755: else
1756: {
1757: yy_delete_buffer( YY_CURRENT_BUFFER );
1758: yy_switch_to_buffer(
1759: include_stack[include_stack_ptr] );
1760: }
1761: }
1762:
1763: .fi
1764: Three routines are available for setting up input buffers for
1765: scanning in-memory strings instead of files. All of them create
1766: a new input buffer for scanning the string, and return a corresponding
1767: .B YY_BUFFER_STATE
1768: handle (which you should delete with
1769: .B yy_delete_buffer()
1770: when done with it). They also switch to the new buffer using
1771: .B yy_switch_to_buffer(),
1772: so the next call to
1773: .B yylex()
1774: will start scanning the string.
1775: .TP
1776: .B yy_scan_string(const char *str)
1777: scans a NUL-terminated string.
1778: .TP
1779: .B yy_scan_bytes(const char *bytes, int len)
1780: scans
1781: .I len
1782: bytes (including possibly NUL's)
1783: starting at location
1784: .I bytes.
1785: .PP
1786: Note that both of these functions create and scan a
1787: .I copy
1788: of the string or bytes. (This may be desirable, since
1789: .B yylex()
1790: modifies the contents of the buffer it is scanning.) You can avoid the
1791: copy by using:
1792: .TP
1793: .B yy_scan_buffer(char *base, yy_size_t size)
1794: which scans in place the buffer starting at
1795: .I base,
1796: consisting of
1797: .I size
1798: bytes, the last two bytes of which
1799: .I must
1800: be
1801: .B YY_END_OF_BUFFER_CHAR
1802: (ASCII NUL).
1803: These last two bytes are not scanned; thus, scanning
1804: consists of
1805: .B base[0]
1806: through
1807: .B base[size-3],
1808: inclusive.
1809: .IP
1810: If you fail to set up
1811: .I base
1812: in this manner (i.e., forget the final two
1813: .B YY_END_OF_BUFFER_CHAR
1814: bytes), then
1815: .B yy_scan_buffer()
1816: returns a nil pointer instead of creating a new input buffer.
1817: .IP
1818: The type
1819: .B yy_size_t
1820: is an integral type to which you can cast an integer expression
1821: reflecting the size of the buffer.
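.PP
Preparing a buffer for
.B yy_scan_buffer()
mostly comes down to reserving the two trailing NUL bytes.
The sketch below assumes a hypothetical helper named
.I make_scan_buffer;
the scanner calls appear only in its closing comment, so the helper
compiles as plain C.
.nf

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: copy len bytes of text into a freshly allocated
 * block that ends with the two NUL bytes yy_scan_buffer() requires.
 * On success *size_out is the total size to pass as the second argument.
 * Returns NULL if memory cannot be allocated.
 */
static char *make_scan_buffer(const char *text, size_t len, size_t *size_out)
{
    char *base;

    assert(text != NULL && size_out != NULL);
    base = malloc(len + 2);
    if (base == NULL)
        return NULL;
    memcpy(base, text, len);
    base[len] = 0;        /* first YY_END_OF_BUFFER_CHAR */
    base[len + 1] = 0;    /* second YY_END_OF_BUFFER_CHAR */
    *size_out = len + 2;
    return base;
}

/*
 * In a scanner (not compiled here):
 *     size_t size;
 *     char *base = make_scan_buffer(line, strlen(line), &size);
 *     YY_BUFFER_STATE b = yy_scan_buffer(base, (yy_size_t) size);
 *     ... call yylex() ...
 *     yy_delete_buffer(b);
 *     free(base);
 */
```

.fi
Since the scanner works in place and modifies the block, the caller
keeps ownership of it and frees it after deleting the buffer handle.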
1822: .SH END-OF-FILE RULES
1823: The special rule "<<EOF>>" indicates
1824: actions which are to be taken when an end-of-file is
1825: encountered and yywrap() returns non-zero (i.e., indicates
1826: no further files to process). The action must finish
1827: by doing one of four things:
1828: .IP -
1829: assigning
1830: .I yyin
1831: to a new input file (in previous versions of flex, after doing the
1832: assignment you had to call the special action
1833: .B YY_NEW_FILE;
1834: this is no longer necessary);
1835: .IP -
1836: executing a
1837: .I return
1838: statement;
1839: .IP -
1840: executing the special
1841: .B yyterminate()
1842: action;
1843: .IP -
1844: or, switching to a new buffer using
1845: .B yy_switch_to_buffer()
1846: as shown in the example above.
1847: .PP
1848: <<EOF>> rules may not be used with other
1849: patterns; they may only be qualified with a list of start
1850: conditions. If an unqualified <<EOF>> rule is given, it
1851: applies to
1852: .I all
1853: start conditions which do not already have <<EOF>> actions. To
1854: specify an <<EOF>> rule for only the initial start condition, use
1855: .nf
1856:
1857: <INITIAL><<EOF>>
1858:
1859: .fi
1860: .PP
1861: These rules are useful for catching things like unclosed comments.
1862: An example:
1863: .nf
1864:
1865: %x quote
1866: %%
1867:
1868: ...other rules for dealing with quotes...
1869:
1870: <quote><<EOF>> {
1871: error( "unterminated quote" );
1872: yyterminate();
1873: }
1874: <<EOF>> {
1875: if ( *++filelist )
1876: yyin = fopen( *filelist, "r" );
1877: else
1878: yyterminate();
1879: }
1880:
1881: .fi
1882: .SH MISCELLANEOUS MACROS
1883: The macro
1884: .B YY_USER_ACTION
1885: can be defined to provide an action
1886: which is always executed prior to the matched rule's action. For example,
1887: it could be #define'd to call a routine to convert yytext to lower-case.
1888: When
1889: .B YY_USER_ACTION
1890: is invoked, the variable
1891: .I yy_act
1892: gives the number of the matched rule (rules are numbered starting with 1).
1893: Suppose you want to profile how often each of your rules is matched. The
1894: following would do the trick:
1895: .nf
1896:
1897: #define YY_USER_ACTION ++ctr[yy_act]
1898:
1899: .fi
1900: where
1901: .I ctr
1902: is an array to hold the counts for the different rules. Note that
1903: the macro
1904: .B YY_NUM_RULES
1905: gives the total number of rules (including the default rule, even if
1906: you use
1907: .B \-s),
1908: so a correct declaration for
1909: .I ctr
1910: is:
1911: .nf
1912:
1913: int ctr[YY_NUM_RULES];
1914:
1915: .fi
1916: .PP
1917: The macro
1918: .B YY_USER_INIT
1919: may be defined to provide an action which is always executed before
1920: the first scan (and before the scanner's internal initializations are done).
1921: For example, it could be used to call a routine to read
1922: in a data table or open a logging file.
1923: .PP
1924: The macro
1925: .B yy_set_interactive(is_interactive)
1926: can be used to control whether the current buffer is considered
1927: .I interactive.
1928: An interactive buffer is processed more slowly,
1929: but must be used when the scanner's input source is indeed
1930: interactive to avoid problems due to waiting to fill buffers
1931: (see the discussion of the
1932: .B \-I
1933: flag below). A non-zero value
1.7 aaron 1934: in the macro invocation marks the buffer as interactive, a zero
1.1 deraadt 1935: value as non-interactive. Note that use of this macro overrides
1936: .B %option always-interactive
1937: or
1938: .B %option never-interactive
1939: (see Options below).
1940: .B yy_set_interactive()
1941: must be invoked prior to beginning to scan the buffer that is
1942: (or is not) to be considered interactive.
1943: .PP
1944: The macro
1945: .B yy_set_bol(at_bol)
1946: can be used to control whether the current buffer's scanning
1947: context for the next token match is done as though at the
1948: beginning of a line. A non-zero macro argument makes rules anchored with
1.10 deraadt 1949: \'^' active, while a zero argument makes '^' rules inactive.
1.1 deraadt 1950: .PP
1951: The macro
1952: .B YY_AT_BOL()
1953: returns true if the next token scanned from the current buffer
1954: will have '^' rules active, false otherwise.
1955: .PP
1956: In the generated scanner, the actions are all gathered in one large
1957: switch statement and separated using
1958: .B YY_BREAK,
1959: which may be redefined. By default, it is simply a "break", to separate
1.10 deraadt 1960: each rule's action from the following rules.
1.1 deraadt 1961: Redefining
1962: .B YY_BREAK
1963: allows, for example, C++ users to
1964: #define YY_BREAK to do nothing (while being very careful that every
1965: rule ends with a "break" or a "return"!) to avoid suffering from
1966: unreachable statement warnings where, because a rule's action ends with
1967: "return", the
1968: .B YY_BREAK
1969: is inaccessible.
1970: .SH VALUES AVAILABLE TO THE USER
1971: This section summarizes the various values available to the user
1972: in the rule actions.
1973: .IP -
1974: .B char *yytext
1975: holds the text of the current token. It may be modified but not lengthened
1976: (you cannot append characters to the end).
1977: .IP
1978: If the special directive
1979: .B %array
1980: appears in the first section of the scanner description, then
1981: .B yytext
1982: is instead declared
1983: .B char yytext[YYLMAX],
1984: where
1985: .B YYLMAX
1986: is a macro definition that you can redefine in the first section
1987: if you don't like the default value (generally 8KB). Using
1988: .B %array
1989: results in somewhat slower scanners, but the value of
1990: .B yytext
1991: becomes immune to calls to
1992: .I input()
1993: and
1994: .I unput(),
1995: which potentially destroy its value when
1996: .B yytext
1997: is a character pointer. The opposite of
1998: .B %array
1999: is
2000: .B %pointer,
2001: which is the default.
2002: .IP
2003: You cannot use
2004: .B %array
2005: when generating C++ scanner classes
2006: (the
2007: .B \-+
2008: flag).
2009: .IP -
2010: .B int yyleng
2011: holds the length of the current token.
2012: .IP -
2013: .B FILE *yyin
2014: is the file which by default
2015: .I flex
2016: reads from. It may be redefined but doing so only makes sense before
2017: scanning begins or after an EOF has been encountered. Changing it in
2018: the midst of scanning will have unexpected results since
2019: .I flex
2020: buffers its input; use
2021: .B yyrestart()
2022: instead.
2023: Once scanning terminates because an end-of-file
2024: has been seen, you can point
2025: .I yyin
2026: at the new input file and then call the scanner again to continue scanning.
2027: .IP -
2028: .B void yyrestart( FILE *new_file )
2029: may be called to point
2030: .I yyin
2031: at the new input file. The switch-over to the new file is immediate
2032: (any previously buffered-up input is lost). Note that calling
2033: .B yyrestart()
2034: with
2035: .I yyin
2036: as an argument thus throws away the current input buffer and continues
2037: scanning the same input file.
2038: .IP -
2039: .B FILE *yyout
2040: is the file to which
2041: .B ECHO
2042: actions are done. It can be reassigned by the user.
2043: .IP -
2044: .B YY_CURRENT_BUFFER
2045: returns a
2046: .B YY_BUFFER_STATE
2047: handle to the current buffer.
2048: .IP -
2049: .B YY_START
2050: returns an integer value corresponding to the current start
2051: condition. You can subsequently use this value with
2052: .B BEGIN
2053: to return to that start condition.
2054: .SH INTERFACING WITH YACC
2055: One of the main uses of
2056: .I flex
2057: is as a companion to the
2058: .I yacc
2059: parser-generator.
2060: .I yacc
2061: parsers expect to call a routine named
2062: .B yylex()
2063: to find the next input token. The routine is supposed to
2064: return the type of the next token as well as putting any associated
2065: value in the global
2066: .B yylval.
2067: To use
2068: .I flex
2069: with
2070: .I yacc,
2071: one specifies the
2072: .B \-d
2073: option to
2074: .I yacc
2075: to instruct it to generate the file
2076: .B y.tab.h
2077: containing definitions of all the
2078: .B %tokens
2079: appearing in the
2080: .I yacc
2081: input. This file is then included in the
2082: .I flex
2083: scanner. For example, if one of the tokens is "TOK_NUMBER",
2084: part of the scanner might look like:
2085: .nf
2086:
2087: %{
2088: #include "y.tab.h"
2089: %}
2090:
2091: %%
2092:
2093: [0-9]+ yylval = atoi( yytext ); return TOK_NUMBER;
2094:
2095: .fi
2096: .SH OPTIONS
2097: .I flex
2098: has the following options:
2099: .TP
2100: .B \-b
2101: Generate backing-up information to
2102: .I lex.backup.
2103: This is a list of scanner states which require backing up
2104: and the input characters on which they do so. By adding rules one
2105: can remove backing-up states. If
2106: .I all
2107: backing-up states are eliminated and
2108: .B \-Cf
2109: or
2110: .B \-CF
2111: is used, the generated scanner will run faster (see the
2112: .B \-p
2113: flag). Only users who wish to squeeze every last cycle out of their
2114: scanners need worry about this option. (See the section on Performance
2115: Considerations below.)
2116: .TP
2117: .B \-c
2118: is a do-nothing, deprecated option included for POSIX compliance.
2119: .TP
2120: .B \-d
2121: makes the generated scanner run in
2122: .I debug
2123: mode. Whenever a pattern is recognized and the global
2124: .B yy_flex_debug
2125: is non-zero (which is the default),
2126: the scanner will write to
2127: .I stderr
2128: a line of the form:
2129: .nf
2130:
2131: --accepting rule at line 53 ("the matched text")
2132:
2133: .fi
2134: The line number refers to the location of the rule in the file
2135: defining the scanner (i.e., the file that was fed to flex). Messages
2136: are also generated when the scanner backs up, accepts the
2137: default rule, reaches the end of its input buffer (or encounters
2138: a NUL; at this point, the two look the same as far as the scanner's concerned),
2139: or reaches an end-of-file.
2140: .TP
2141: .B \-f
2142: specifies
2143: .I fast scanner.
2144: No table compression is done and stdio is bypassed.
2145: The result is large but fast. This option is equivalent to
2146: .B \-Cfr
2147: (see below).
2148: .TP
2149: .B \-h
2150: generates a "help" summary of
2151: .I flex's
2152: options to
1.7 aaron 2153: .I stdout
1.1 deraadt 2154: and then exits.
2155: .B \-?
2156: and
2157: .B \-\-help
2158: are synonyms for
2159: .B \-h.
2160: .TP
2161: .B \-i
2162: instructs
2163: .I flex
2164: to generate a
2165: .I case-insensitive
2166: scanner. The case of letters given in the
2167: .I flex
2168: input patterns will
2169: be ignored, and tokens in the input will be matched regardless of case. The
2170: matched text given in
2171: .I yytext
2172: will have the preserved case (i.e., it will not be folded).
2173: .TP
2174: .B \-l
2175: turns on maximum compatibility with the original AT&T
2176: .I lex
2177: implementation. Note that this does not mean
2178: .I full
2179: compatibility. Use of this option costs a considerable amount of
2180: performance, and it cannot be used with the
2181: .B \-+, -f, -F, -Cf,
2182: or
2183: .B -CF
2184: options. For details on the compatibilities it provides, see the section
2185: "Incompatibilities With Lex And POSIX" below. This option also results
2186: in the name
2187: .B YY_FLEX_LEX_COMPAT
2188: being #define'd in the generated scanner.
2189: .TP
2190: .B \-n
2191: is another do-nothing, deprecated option included only for
2192: POSIX compliance.
2193: .TP
2194: .B \-p
2195: generates a performance report to stderr. The report
2196: consists of comments regarding features of the
2197: .I flex
2198: input file which will cause a serious loss of performance in the resulting
2199: scanner. If you give the flag twice, you will also get comments regarding
2200: features that lead to minor performance losses.
2201: .IP
2202: Note that the use of
2203: .B REJECT,
2204: .B %option yylineno,
2205: and variable trailing context (see the Deficiencies / Bugs section below)
2206: entails a substantial performance penalty; use of
2207: .I yymore(),
2208: the
2209: .B ^
2210: operator,
2211: and the
2212: .B \-I
2213: flag entail minor performance penalties.
2214: .TP
2215: .B \-s
2216: causes the
2217: .I default rule
2218: (that unmatched scanner input is echoed to
2219: .I stdout)
2220: to be suppressed. If the scanner encounters input that does not
2221: match any of its rules, it aborts with an error. This option is
2222: useful for finding holes in a scanner's rule set.
2223: .TP
2224: .B \-t
2225: instructs
2226: .I flex
2227: to write the scanner it generates to standard output instead
2228: of
2229: .B lex.yy.c.
2230: .TP
2231: .B \-v
2232: specifies that
2233: .I flex
2234: should write to
2235: .I stderr
2236: a summary of statistics regarding the scanner it generates.
2237: Most of the statistics are meaningless to the casual
2238: .I flex
2239: user, but the first line identifies the version of
2240: .I flex
2241: (same as reported by
2242: .B \-V),
2243: and the next line the flags used when generating the scanner, including
2244: those that are on by default.
2245: .TP
2246: .B \-w
2247: suppresses warning messages.
2248: .TP
2249: .B \-B
2250: instructs
2251: .I flex
2252: to generate a
2253: .I batch
2254: scanner, the opposite of
2255: .I interactive
2256: scanners generated by
2257: .B \-I
2258: (see below). In general, you use
2259: .B \-B
2260: when you are
2261: .I certain
2262: that your scanner will never be used interactively, and you want to
2263: squeeze a
2264: .I little
2265: more performance out of it. If your goal is instead to squeeze out a
2266: .I lot
2267: more performance, you should be using the
2268: .B \-Cf
2269: or
2270: .B \-CF
2271: options (discussed below), which turn on
2272: .B \-B
2273: automatically anyway.
2274: .TP
2275: .B \-F
2276: specifies that the
2277: .ul
2278: fast
2279: scanner table representation should be used (and stdio
2280: bypassed). This representation is
2281: about as fast as the full table representation
2282: .B (-f),
2283: and for some sets of patterns will be considerably smaller (and for
2284: others, larger). In general, if the pattern set contains both "keywords"
2285: and a catch-all, "identifier" rule, such as in the set:
2286: .nf
2287:
2288: "case" return TOK_CASE;
2289: "switch" return TOK_SWITCH;
2290: ...
2291: "default" return TOK_DEFAULT;
2292: [a-z]+ return TOK_ID;
2293:
2294: .fi
2295: then you're better off using the full table representation. If only
2296: the "identifier" rule is present and you then use a hash table or some such
2297: to detect the keywords, you're better off using
2298: .B -F.
2299: .IP
2300: This option is equivalent to
2301: .B \-CFr
2302: (see below). It cannot be used with
2303: .B \-+.
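.IP
As a rough illustration of the second approach, the following C sketch
classifies text matched by a single
.B [a-z]+
rule by searching a sorted keyword table with
.B bsearch()
rather than a hash table. The function name
.B classify_identifier()
and the token values are hypothetical; a real scanner would take its token
codes from
.B y.tab.h.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical token codes; a real project would take these from y.tab.h. */
enum { TOK_ID = 258, TOK_CASE, TOK_DEFAULT, TOK_SWITCH };

struct keyword {
    const char *name;
    int token;
};

/* Must stay sorted by name, because bsearch() requires sorted input. */
static const struct keyword keywords[] = {
    { "case",    TOK_CASE },
    { "default", TOK_DEFAULT },
    { "switch",  TOK_SWITCH },
};

/* Compare the search key (a string) against one table entry. */
static int compare_keyword(const void *key, const void *elem)
{
    return strcmp((const char *)key, ((const struct keyword *)elem)->name);
}

/* Classify text that was matched by the single identifier rule. */
int classify_identifier(const char *text)
{
    const struct keyword *kw = bsearch(text, keywords,
        sizeof(keywords) / sizeof(keywords[0]),
        sizeof(keywords[0]), compare_keyword);

    return kw != NULL ? kw->token : TOK_ID;
}
```

.IP
Under these assumptions, the identifier rule's action could simply be
"return classify_identifier( yytext );".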
2304: .TP
2305: .B \-I
2306: instructs
2307: .I flex
2308: to generate an
2309: .I interactive
2310: scanner. An interactive scanner is one that only looks ahead to decide
2311: what token has been matched if it absolutely must. It turns out that
2312: always looking one extra character ahead, even if the scanner has already
2313: seen enough text to disambiguate the current token, is a bit faster than
2314: only looking ahead when necessary. But scanners that always look ahead
2315: give dreadful interactive performance; for example, when a user types
2316: a newline, it is not recognized as a newline token until they enter
2317: .I another
2318: token, which often means typing in another whole line.
2319: .IP
2320: .I Flex
2321: scanners default to
2322: .I interactive
2323: unless you use the
2324: .B \-Cf
2325: or
2326: .B \-CF
2327: table-compression options (see below). That's because if you're looking
2328: for high-performance you should be using one of these options, so if you
2329: didn't,
2330: .I flex
2331: assumes you'd rather trade off a bit of run-time performance for intuitive
2332: interactive behavior. Note also that you
2333: .I cannot
2334: use
2335: .B \-I
2336: in conjunction with
2337: .B \-Cf
2338: or
2339: .B \-CF.
2340: Thus, this option is not really needed; it is on by default for all those
2341: cases in which it is allowed.
2342: .IP
2343: You can force a scanner to
2344: .I not
2345: be interactive by using
2346: .B \-B
2347: (see above).
2348: .TP
2349: .B \-L
2350: instructs
2351: .I flex
2352: not to generate
2353: .B #line
2354: directives. Without this option,
2355: .I flex
2356: peppers the generated scanner
2357: with #line directives so error messages in the actions will be correctly
2358: located with respect to either the original
2359: .I flex
2360: input file (if the errors are due to code in the input file), or
2361: .B lex.yy.c
2362: (if the errors are
2363: .I flex's
2364: fault -- you should report these sorts of errors to the email address
2365: given below).
2366: .TP
2367: .B \-T
2368: makes
2369: .I flex
2370: run in
2371: .I trace
2372: mode. It will generate a lot of messages to
2373: .I stderr
2374: concerning
2375: the form of the input and the resultant non-deterministic and deterministic
2376: finite automata. This option is mostly for use in maintaining
2377: .I flex.
2378: .TP
2379: .B \-V
2380: prints the version number to
2381: .I stdout
2382: and exits.
2383: .B \-\-version
2384: is a synonym for
2385: .B \-V.
2386: .TP
2387: .B \-7
2388: instructs
2389: .I flex
2390: to generate a 7-bit scanner, i.e., one which can only recognize 7-bit
2391: characters in its input. The advantage of using
2392: .B \-7
2393: is that the scanner's tables can be up to half the size of those generated
2394: using the
2395: .B \-8
2396: option (see below). The disadvantage is that such scanners often hang
2397: or crash if their input contains an 8-bit character.
2398: .IP
2399: Note, however, that unless you generate your scanner using the
2400: .B \-Cf
2401: or
2402: .B \-CF
2403: table compression options, use of
2404: .B \-7
2405: will save only a small amount of table space, and make your scanner
2406: considerably less portable.
2407: .I Flex's
2408: default behavior is to generate an 8-bit scanner unless you use the
2409: .B \-Cf
2410: or
2411: .B \-CF,
2412: in which case
2413: .I flex
2414: defaults to generating 7-bit scanners unless your site was always
2415: configured to generate 8-bit scanners (as will often be the case
2416: with non-USA sites). You can tell whether flex generated a 7-bit
2417: or an 8-bit scanner by inspecting the flag summary in the
2418: .B \-v
2419: output as described above.
2420: .IP
2421: Note that if you use
2422: .B \-Cfe
2423: or
2424: .B \-CFe
2425: (those table compression options, but also using equivalence classes as
2426: discussed below), flex still defaults to generating an 8-bit
2427: scanner, since usually with these compression options full 8-bit tables
2428: are not much more expensive than 7-bit tables.
2429: .TP
2430: .B \-8
2431: instructs
2432: .I flex
2433: to generate an 8-bit scanner, i.e., one which can recognize 8-bit
2434: characters. This flag is only needed for scanners generated using
2435: .B \-Cf
2436: or
2437: .B \-CF,
2438: as otherwise flex defaults to generating an 8-bit scanner anyway.
2439: .IP
2440: See the discussion of
2441: .B \-7
2442: above for flex's default behavior and the tradeoffs between 7-bit
2443: and 8-bit scanners.
2444: .TP
2445: .B \-+
2446: specifies that you want flex to generate a C++
2447: scanner class. See the section on Generating C++ Scanners below for
2448: details.
1.7 aaron 2449: .TP
1.1 deraadt 2450: .B \-C[aefFmr]
2451: controls the degree of table compression and, more generally, trade-offs
2452: between small scanners and fast scanners.
2453: .IP
2454: .B \-Ca
2455: ("align") instructs flex to trade off larger tables in the
2456: generated scanner for faster performance because the elements of
2457: the tables are better aligned for memory access and computation. On some
2458: RISC architectures, fetching and manipulating longwords is more efficient
2459: than with smaller-sized units such as shortwords. This option can
2460: double the size of the tables used by your scanner.
2461: .IP
2462: .B \-Ce
2463: directs
2464: .I flex
2465: to construct
2466: .I equivalence classes,
2467: i.e., sets of characters
2468: which have identical lexical properties (for example, if the only
2469: appearance of digits in the
2470: .I flex
2471: input is in the character class
2472: "[0-9]" then the digits '0', '1', ..., '9' will all be put
2473: in the same equivalence class). Equivalence classes usually give
2474: dramatic reductions in the final table/object file sizes (typically
2475: a factor of 2-5) and are pretty cheap performance-wise (one array
2476: look-up per character scanned).
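.IP
The idea can be sketched in C as follows. This is a simplified
illustration and not the algorithm
.I flex
itself uses: it assumes the patterns have already been reduced to a list
of character sets, and it puts two characters in the same class exactly
when they belong to the same sets. The names
.B build_equivalence_classes()
and
.B NCHARS
are hypothetical.

```c
#define NCHARS 256

/*
 * Assign each character an equivalence class number such that two
 * characters share a class exactly when they belong to the same
 * character sets.  sets[i][c] is non-zero if character c is a member
 * of set i.  Returns the number of classes created.
 */
int build_equivalence_classes(int nsets,
    unsigned char sets[][NCHARS], int ec[NCHARS])
{
    int nclasses = 0;
    int c, d, i;

    for (c = 0; c < NCHARS; c++) {
        ec[c] = -1;
        /* Look for an earlier character with identical membership. */
        for (d = 0; d < c && ec[c] < 0; d++) {
            int same = 1;

            for (i = 0; i < nsets; i++) {
                if ((sets[i][c] != 0) != (sets[i][d] != 0)) {
                    same = 0;
                    break;
                }
            }
            if (same)
                ec[c] = ec[d];
        }
        /* No earlier character behaves the same way: start a new class. */
        if (ec[c] < 0)
            ec[c] = nclasses++;
    }
    return nclasses;
}
```

.IP
With "[0-9]" as the only set, this yields two classes (digits and everything
else). A transition table then needs one column per class instead of one
per character, at the price of the extra
.B ec[]
look-up for each character scanned.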
2477: .IP
2478: .B \-Cf
2479: specifies that the
2480: .I full
2481: scanner tables should be generated -
2482: .I flex
2483: should not compress the
1.10 deraadt 2484: tables by taking advantage of similar transition functions for
1.1 deraadt 2485: different states.
2486: .IP
2487: .B \-CF
2488: specifies that the alternate fast scanner representation (described
2489: above under the
2490: .B \-F
2491: flag)
2492: should be used. This option cannot be used with
2493: .B \-+.
2494: .IP
2495: .B \-Cm
2496: directs
2497: .I flex
2498: to construct
2499: .I meta-equivalence classes,
2500: which are sets of equivalence classes (or characters, if equivalence
2501: classes are not being used) that are commonly used together. Meta-equivalence
2502: classes are often a big win when using compressed tables, but they
2503: have a moderate performance impact (one or two "if" tests and one
2504: array look-up per character scanned).
2505: .IP
2506: .B \-Cr
2507: causes the generated scanner to
2508: .I bypass
2509: use of the standard I/O library (stdio) for input. Instead of calling
2510: .B fread()
2511: or
2512: .B getc(),
2513: the scanner will use the
2514: .B read()
2515: system call, resulting in a performance gain which varies from system
2516: to system, but in general is probably negligible unless you are also using
2517: .B \-Cf
2518: or
2519: .B \-CF.
2520: Using
2521: .B \-Cr
2522: can cause strange behavior if, for example, you read from
2523: .I yyin
2524: using stdio prior to calling the scanner (because the scanner will miss
2525: whatever text your previous reads left in the stdio input buffer).
2526: .IP
2527: .B \-Cr
2528: has no effect if you define
2529: .B YY_INPUT
2530: (see The Generated Scanner above).
2531: .IP
2532: A lone
2533: .B \-C
2534: specifies that the scanner tables should be compressed but neither
2535: equivalence classes nor meta-equivalence classes should be used.
2536: .IP
2537: The options
2538: .B \-Cf
2539: or
2540: .B \-CF
2541: and
2542: .B \-Cm
2543: do not make sense together - there is no opportunity for meta-equivalence
2544: classes if the table is not being compressed. Otherwise the options
2545: may be freely mixed, and are cumulative.
2546: .IP
2547: The default setting is
2548: .B \-Cem,
2549: which specifies that
2550: .I flex
2551: should generate equivalence classes
2552: and meta-equivalence classes. This setting provides the highest
2553: degree of table compression. You can obtain
2554: faster-executing scanners at the cost of larger tables, with
2555: the following generally being true:
2556: .nf
2557:
2558: slowest & smallest
2559: -Cem
2560: -Cm
2561: -Ce
2562: -C
2563: -C{f,F}e
2564: -C{f,F}
2565: -C{f,F}a
2566: fastest & largest
2567:
2568: .fi
2569: Note that scanners with the smallest tables are usually generated and
2570: compiled the quickest, so
2571: during development you will usually want to use the default, maximal
2572: compression.
2573: .IP
2574: .B \-Cfe
2575: is often a good compromise between speed and size for production
2576: scanners.
2577: .TP
2578: .B \-ooutput
2579: directs flex to write the scanner to the file
2580: .B output
2581: instead of
2582: .B lex.yy.c.
2583: If you combine
2584: .B \-o
2585: with the
2586: .B \-t
2587: option, then the scanner is written to
2588: .I stdout
2589: but its
2590: .B #line
2591: directives (see the
2592: .B \-L
2593: option above) refer to the file
2594: .B output.
2595: .TP
2596: .B \-Pprefix
2597: changes the default
2598: .I "yy"
2599: prefix used by
2600: .I flex
1.6 aaron 2601: for all globally visible variable and function names to instead be
1.1 deraadt 2602: .I prefix.
2603: For example,
2604: .B \-Pfoo
2605: changes the name of
2606: .B yytext
2607: to
2608: .B footext.
2609: It also changes the name of the default output file from
2610: .B lex.yy.c
2611: to
2612: .B lex.foo.c.
2613: Here are all of the names affected:
2614: .nf
2615:
2616: yy_create_buffer
2617: yy_delete_buffer
2618: yy_flex_debug
2619: yy_init_buffer
2620: yy_flush_buffer
2621: yy_load_buffer_state
2622: yy_switch_to_buffer
2623: yyin
2624: yyleng
2625: yylex
2626: yylineno
2627: yyout
2628: yyrestart
2629: yytext
2630: yywrap
2631:
2632: .fi
2633: (If you are using a C++ scanner, then only
2634: .B yywrap
2635: and
2636: .B yyFlexLexer
2637: are affected.)
2638: Within your scanner itself, you can still refer to the global variables
2639: and functions using either version of their name; but externally, they
2640: have the modified name.
2641: .IP
2642: This option lets you easily link together multiple
2643: .I flex
2644: programs into the same executable. Note, though, that using this
2645: option also renames
2646: .B yywrap(),
2647: so you now
2648: .I must
2649: either
1.6 aaron 2650: provide your own (appropriately named) version of the routine for your
1.1 deraadt 2651: scanner, or use
2652: .B %option noyywrap,
2653: as linking with
2654: .B \-lfl
2655: no longer provides one for you by default.
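.IP
For a scanner generated with
.B \-Pfoo,
the simplest such replacement might look like the sketch below. It assumes
the scanner reads a single input and so has nothing further to switch to;
a version that handles several files would instead set up the next input
and return 0.

```c
/*
 * Replacement for yywrap() in a scanner generated with -Pfoo.
 * Returning non-zero tells the scanner that there is no further input.
 */
int foowrap(void)
{
    return 1;
}
```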
2656: .TP
2657: .B \-Sskeleton_file
2658: overrides the default skeleton file from which
2659: .I flex
2660: constructs its scanners. You'll never need this option unless you are doing
2661: .I flex
2662: maintenance or development.
2663: .PP
2664: .I flex
2665: also provides a mechanism for controlling options within the
2666: scanner specification itself, rather than from the flex command-line.
2667: This is done by including
2668: .B %option
2669: directives in the first section of the scanner specification.
2670: You can specify multiple options with a single
2671: .B %option
2672: directive, and multiple directives in the first section of your flex input
2673: file.
2674: .PP
2675: Most options are given simply as names, optionally preceded by the
2676: word "no" (with no intervening whitespace) to negate their meaning.
2677: A number are equivalent to flex flags or their negation:
2678: .nf
2679:
2680: 7bit -7 option
2681: 8bit -8 option
2682: align -Ca option
2683: backup -b option
2684: batch -B option
2685: c++ -+ option
2686:
2687: caseful or
2688: case-sensitive opposite of -i (default)
2689:
2690: case-insensitive or
2691: caseless -i option
2692:
2693: debug -d option
2694: default opposite of -s option
2695: ecs -Ce option
2696: fast -F option
2697: full -f option
2698: interactive -I option
2699: lex-compat -l option
2700: meta-ecs -Cm option
2701: perf-report -p option
2702: read -Cr option
2703: stdout -t option
2704: verbose -v option
2705: warn opposite of -w option
2706: (use "%option nowarn" for -w)
2707:
2708: array equivalent to "%array"
2709: pointer equivalent to "%pointer" (default)
2710:
2711: .fi
2712: Some
2713: .B %option's
2714: provide features otherwise not available:
2715: .TP
2716: .B always-interactive
2717: instructs flex to generate a scanner which always considers its input
2718: "interactive". Normally, on each new input file the scanner calls
2719: .B isatty()
2720: in an attempt to determine whether
2721: the scanner's input source is interactive and thus should be read a
2722: character at a time. When this option is used, however, then no
2723: such call is made.
2724: .TP
2725: .B main
2726: directs flex to provide a default
2727: .B main()
2728: program for the scanner, which simply calls
2729: .B yylex().
2730: This option implies
2731: .B noyywrap
2732: (see below).
2733: .TP
2734: .B never-interactive
2735: instructs flex to generate a scanner which never considers its input
2736: "interactive" (again, no call made to
2737: .B isatty()).
2738: This is the opposite of
2739: .B always-interactive.
2740: .TP
2741: .B stack
2742: enables the use of start condition stacks (see Start Conditions above).
2743: .TP
2744: .B stdinit
2745: if set (i.e.,
2746: .B %option stdinit)
2747: initializes
2748: .I yyin
2749: and
2750: .I yyout
2751: to
2752: .I stdin
2753: and
2754: .I stdout,
2755: instead of the default of
2756: .I nil.
2757: Some existing
2758: .I lex
2759: programs depend on this behavior, even though it is not compliant with
2760: ANSI C, which does not require
2761: .I stdin
2762: and
2763: .I stdout
2764: to be compile-time constant.
2765: .TP
2766: .B yylineno
2767: directs
2768: .I flex
2769: to generate a scanner that maintains the number of the current line
2770: read from its input in the global variable
2771: .B yylineno.
2772: This option is implied by
2773: .B %option lex-compat.
2774: .TP
2775: .B yywrap
2776: if unset (i.e.,
2777: .B %option noyywrap),
2778: makes the scanner not call
2779: .B yywrap()
2780: upon an end-of-file, but simply assume that there are no more
2781: files to scan (until the user points
2782: .I yyin
2783: at a new file and calls
2784: .B yylex()
2785: again).
2786: .PP
2787: .I flex
2788: scans your rule actions to determine whether you use the
2789: .B REJECT
2790: or
2791: .B yymore()
2792: features. The
2793: .B reject
2794: and
2795: .B yymore
2796: options are available to override its decision as to whether you use the
2797: options, either by setting them (e.g.,
2798: .B %option reject)
2799: to indicate the feature is indeed used, or
2800: unsetting them to indicate it actually is not used
2801: (e.g.,
2802: .B %option noyymore).
2803: .PP
2804: Three options take string-delimited values, offset with '=':
2805: .nf
2806:
2807: %option outfile="ABC"
2808:
2809: .fi
2810: is equivalent to
2811: .B -oABC,
2812: and
2813: .nf
2814:
2815: %option prefix="XYZ"
2816:
2817: .fi
2818: is equivalent to
2819: .B -PXYZ.
2820: Finally,
2821: .nf
2822:
2823: %option yyclass="foo"
2824:
2825: .fi
2826: only applies when generating a C++ scanner (
2827: .B \-+
2828: option). It informs
2829: .I flex
2830: that you have derived
2831: .B foo
2832: as a subclass of
2833: .B yyFlexLexer,
2834: so
2835: .I flex
2836: will place your actions in the member function
2837: .B foo::yylex()
2838: instead of
2839: .B yyFlexLexer::yylex().
2840: It also generates a
2841: .B yyFlexLexer::yylex()
2842: member function that emits a run-time error (by invoking
2843: .B yyFlexLexer::LexerError())
2844: if called.
2845: See Generating C++ Scanners, below, for additional information.
2846: .PP
2847: A number of options are available for lint purists who want to suppress
2848: the appearance of unneeded routines in the generated scanner. Each of the
2849: following, if unset
2850: (e.g.,
2851: .B %option nounput
2852: ), results in the corresponding routine not appearing in
2853: the generated scanner:
2854: .nf
2855:
2856: input, unput
2857: yy_push_state, yy_pop_state, yy_top_state
2858: yy_scan_buffer, yy_scan_bytes, yy_scan_string
2859:
2860: .fi
2861: (though
2862: .B yy_push_state()
2863: and friends won't appear anyway unless you use
2864: .B %option stack).
2865: .SH PERFORMANCE CONSIDERATIONS
2866: The main design goal of
2867: .I flex
2868: is that it generate high-performance scanners. It has been optimized
2869: for dealing well with large sets of rules. Aside from the effects on
2870: scanner speed of the table compression
2871: .B \-C
2872: options outlined above,
2873: there are a number of options/actions which degrade performance. These
2874: are, from most expensive to least:
2875: .nf
2876:
2877: REJECT
2878: %option yylineno
2879: arbitrary trailing context
2880:
2881: pattern sets that require backing up
2882: %array
2883: %option interactive
2884: %option always-interactive
2885:
2886: '^' beginning-of-line operator
2887: yymore()
2888:
2889: .fi
2890: with the first three all being quite expensive and the last two
2891: being quite cheap. Note also that
2892: .B unput()
2893: is implemented as a routine call that potentially does quite a bit of
2894: work, while
2895: .B yyless()
2896: is a quite-cheap macro; so if just putting back some excess text you
2897: scanned, use
2898: .B yyless().
2899: .PP
2900: .B REJECT
2901: should be avoided at all costs when performance is important.
2902: It is a particularly expensive option.
2903: .PP
2904: Getting rid of backing up is messy and often may be an enormous
2905: amount of work for a complicated scanner. In principle, one begins
2906: by using the
1.7 aaron 2907: .B \-b
1.1 deraadt 2908: flag to generate a
2909: .I lex.backup
2910: file. For example, on the input
2911: .nf
2912:
2913: %%
2914: foo return TOK_KEYWORD;
2915: foobar return TOK_KEYWORD;
2916:
2917: .fi
2918: the file looks like:
2919: .nf
2920:
2921: State #6 is non-accepting -
2922: associated rule line numbers:
2923: 2 3
2924: out-transitions: [ o ]
2925: jam-transitions: EOF [ \\001-n p-\\177 ]
2926:
2927: State #8 is non-accepting -
2928: associated rule line numbers:
2929: 3
2930: out-transitions: [ a ]
2931: jam-transitions: EOF [ \\001-` b-\\177 ]
2932:
2933: State #9 is non-accepting -
2934: associated rule line numbers:
2935: 3
2936: out-transitions: [ r ]
2937: jam-transitions: EOF [ \\001-q s-\\177 ]
2938:
2939: Compressed tables always back up.
2940:
2941: .fi
2942: The first few lines tell us that there's a scanner state in
2943: which it can make a transition on an 'o' but not on any other
2944: character, and that in that state the currently scanned text does not match
2945: any rule. The state occurs when trying to match the rules found
2946: at lines 2 and 3 in the input file.
2947: If the scanner is in that state and then reads
2948: something other than an 'o', it will have to back up to find
2949: a rule which is matched. With
2950: a bit of headscratching one can see that this must be the
2951: state it's in when it has seen "fo". When this has happened,
2952: if anything other than another 'o' is seen, the scanner will
2953: have to back up to simply match the 'f' (by the default rule).
2954: .PP
2955: The comment regarding State #8 indicates there's a problem
2956: when "foob" has been scanned. Indeed, on any character other
2957: than an 'a', the scanner will have to back up to accept "foo".
2958: Similarly, the comment for State #9 concerns when "fooba" has
2959: been scanned and an 'r' does not follow.
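.PP
The behavior described above can be imitated with a small hand-written
matcher. The C sketch below is only an illustration of what backing up
means for these two rules plus the default rule; it is not code that
.I flex
generates, and the name
.B match_token()
is hypothetical. It counts the characters consumed beyond the last
accepting state, not the lookahead character that caused the jam.

```c
#include <stddef.h>

/*
 * Match one token at the start of s using the rules "foo" and "foobar"
 * plus the default single-character rule.  Returns the length of the
 * token and stores in *backed_up how many characters were consumed
 * past the end of the token and had to be given back.
 */
size_t match_token(const char *s, size_t *backed_up)
{
    static const char longest[] = "foobar";
    size_t pos = 0;           /* characters consumed so far */
    size_t last_accept = 0;   /* length at the last accepting state */

    *backed_up = 0;
    if (*s == 0)
        return 0;             /* empty input: nothing to match */

    /* Follow transitions along "foobar" for as long as the input allows. */
    while (pos < sizeof(longest) - 1 && s[pos] == longest[pos]) {
        pos++;
        /* Accepting states: "f" (default rule), "foo" and "foobar". */
        if (pos == 1 || pos == 3 || pos == 6)
            last_accept = pos;
    }

    /* The first character is not 'f': the default rule matches it. */
    if (last_accept == 0)
        return 1;

    /* Jammed in a non-accepting state: return to the last accepting one. */
    *backed_up = pos - last_accept;
    return last_accept;
}
```

.PP
On the input "foobaz" the matcher reaches "fooba", jams on the 'z', and
gives back two characters in order to accept "foo"; on "fox" it gives back
one character and accepts 'f' by the default rule.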
2960: .PP
2961: The final comment reminds us that there's no point going to
2962: all the trouble of removing backing up from the rules unless
2963: we're using
2964: .B \-Cf
2965: or
2966: .B \-CF,
2967: since there's no performance gain doing so with compressed scanners.
2968: .PP
2969: The way to remove the backing up is to add "error" rules:
2970: .nf
2971:
2972: %%
2973: foo return TOK_KEYWORD;
2974: foobar return TOK_KEYWORD;
2975:
2976: fooba |
2977: foob |
2978: fo {
2979: /* false alarm, not really a keyword */
2980: return TOK_ID;
2981: }
2982:
2983: .fi
2984: .PP
2985: Eliminating backing up among a list of keywords can also be
2986: done using a "catch-all" rule:
2987: .nf
2988:
2989: %%
2990: foo return TOK_KEYWORD;
2991: foobar return TOK_KEYWORD;
2992:
2993: [a-z]+ return TOK_ID;
2994:
2995: .fi
2996: This is usually the best solution when appropriate.
2997: .PP
2998: Backing up messages tend to cascade.
2999: With a complicated set of rules it's not uncommon to get hundreds
3000: of messages. If one can decipher them, though, it often
3001: only takes a dozen or so rules to eliminate the backing up (though
3002: it's easy to make a mistake and have an error rule accidentally match
3003: a valid token). A possible future
3004: .I flex
3005: feature will be to automatically add rules to eliminate backing up.
3006: .PP
3007: It's important to keep in mind that you gain the benefits of eliminating
3008: backing up only if you eliminate
3009: .I every
3010: instance of backing up. Leaving just one means you gain nothing.
3011: .PP
3012: .I Variable
3013: trailing context (where both the leading and trailing parts do not have
3014: a fixed length) entails almost the same performance loss as
3015: .B REJECT
3016: (i.e., substantial). So when possible a rule like:
3017: .nf
3018:
3019: %%
3020: mouse|rat/(cat|dog) run();
3021:
3022: .fi
3023: is better written:
3024: .nf
3025:
3026: %%
3027: mouse/cat|dog run();
3028: rat/cat|dog run();
3029:
3030: .fi
3031: or as
3032: .nf
3033:
3034: %%
3035: mouse|rat/cat run();
3036: mouse|rat/dog run();
3037:
3038: .fi
3039: Note that here the special '|' action does
3040: .I not
3041: provide any savings, and can even make things worse (see
3042: Deficiencies / Bugs below).
3043: .LP
3044: Another area where the user can increase a scanner's performance
3045: (and one that's easier to implement) arises from the fact that
3046: the longer the tokens matched, the faster the scanner will run.
3047: This is because with long tokens the processing of most input
3048: characters takes place in the (short) inner scanning loop, and
3049: does not often have to go through the additional work of setting up
3050: the scanning environment (e.g.,
3051: .B yytext)
3052: for the action. Recall the scanner for C comments:
3053: .nf
3054:
3055: %x comment
3056: %%
3057: int line_num = 1;
3058:
3059: "/*" BEGIN(comment);
3060:
3061: <comment>[^*\\n]*
3062: <comment>"*"+[^*/\\n]*
3063: <comment>\\n ++line_num;
3064: <comment>"*"+"/" BEGIN(INITIAL);
3065:
3066: .fi
3067: This could be sped up by writing it as:
3068: .nf
3069:
3070: %x comment
3071: %%
3072: int line_num = 1;
3073:
3074: "/*" BEGIN(comment);
3075:
3076: <comment>[^*\\n]*
3077: <comment>[^*\\n]*\\n ++line_num;
3078: <comment>"*"+[^*/\\n]*
3079: <comment>"*"+[^*/\\n]*\\n ++line_num;
3080: <comment>"*"+"/" BEGIN(INITIAL);
3081:
3082: .fi
3083: Now instead of each newline requiring the processing of another
3084: action, recognizing the newlines is "distributed" over the other rules
3085: to keep the matched text as long as possible. Note that
3086: .I adding
3087: rules does
3088: .I not
3089: slow down the scanner! The speed of the scanner is independent
3090: of the number of rules or (modulo the considerations given at the
3091: beginning of this section) how complicated the rules are with
3092: regard to operators such as '*' and '|'.
3093: .PP
3094: A final example in speeding up a scanner: suppose you want to scan
3095: through a file containing identifiers and keywords, one per line
3096: and with no other extraneous characters, and recognize all the
3097: keywords. A natural first approach is:
3098: .nf
3099:
3100: %%
3101: asm |
3102: auto |
3103: break |
3104: ... etc ...
3105: volatile |
3106: while /* it's a keyword */
3107:
3108: .|\\n /* it's not a keyword */
3109:
3110: .fi
3111: To eliminate the backing up, introduce a catch-all rule:
3112: .nf
3113:
3114: %%
3115: asm |
3116: auto |
3117: break |
3118: ... etc ...
3119: volatile |
3120: while /* it's a keyword */
3121:
3122: [a-z]+ |
3123: .|\\n /* it's not a keyword */
3124:
3125: .fi
3126: Now, if it's guaranteed that there's exactly one word per line,
3127: then we can reduce the total number of matches by a half by
3128: merging in the recognition of newlines with that of the other
3129: tokens:
3130: .nf
3131:
3132: %%
3133: asm\\n |
3134: auto\\n |
3135: break\\n |
3136: ... etc ...
3137: volatile\\n |
3138: while\\n /* it's a keyword */
3139:
3140: [a-z]+\\n |
3141: .|\\n /* it's not a keyword */
3142:
3143: .fi
3144: One has to be careful here, as we have now reintroduced backing up
3145: into the scanner. In particular, while
3146: .I we
3147: know that there will never be any characters in the input stream
3148: other than letters or newlines,
3149: .I flex
3150: can't figure this out, and it will plan for possibly needing to back up
3151: when it has scanned a token like "auto" and then the next character
3152: is something other than a newline or a letter. Previously it would
3153: then just match the "auto" rule and be done, but now it has no "auto"
1.10 deraadt 3154: rule, only an "auto\\n" rule. To eliminate the possibility of backing up,
1.1 deraadt 3155: we could either duplicate all rules but without final newlines, or,
3156: since we never expect to encounter such an input and therefore don't
3157: care how it's classified, we can introduce one more catch-all rule, this
3158: one which doesn't include a newline:
3159: .nf
3160:
3161: %%
3162: asm\\n |
3163: auto\\n |
3164: break\\n |
3165: ... etc ...
3166: volatile\\n |
3167: while\\n /* it's a keyword */
3168:
3169: [a-z]+\\n |
3170: [a-z]+ |
3171: .|\\n /* it's not a keyword */
3172:
3173: .fi
3174: Compiled with
3175: .B \-Cf,
3176: this is about as fast as one can get a
1.7 aaron 3177: .I flex
1.1 deraadt 3178: scanner to go for this particular problem.
3179: .PP
3180: A final note:
3181: .I flex
3182: is slow when matching NUL's, particularly when a token contains
3183: multiple NUL's.
3184: It's best to write rules which match
3185: .I short
3186: amounts of text if it's anticipated that the text will often include NUL's.
3187: .PP
3188: Another final note regarding performance: as mentioned above in the section
3189: How the Input is Matched, dynamically resizing
3190: .B yytext
3191: to accommodate huge tokens is a slow process because it presently requires that
3192: the (huge) token be rescanned from the beginning. Thus if performance is
3193: vital, you should attempt to match "large" quantities of text but not
3194: "huge" quantities, where the cutoff between the two is at about 8K
3195: characters/token.
3196: .SH GENERATING C++ SCANNERS
3197: .I flex
3198: provides two different ways to generate scanners for use with C++. The
3199: first way is to simply compile a scanner generated by
3200: .I flex
3201: using a C++ compiler instead of a C compiler. You should not encounter
1.10 deraadt 3202: any compilation errors (please report any you find to the email address
1.1 deraadt 3203: given in the Author section below). You can then use C++ code in your
3204: rule actions instead of C code. Note that the default input source for
3205: your scanner remains
3206: .I yyin,
3207: and default echoing is still done to
3208: .I yyout.
3209: Both of these remain
3210: .I FILE *
3211: variables and not C++
3212: .I streams.
3213: .PP
3214: You can also use
3215: .I flex
3216: to generate a C++ scanner class, using the
3217: .B \-+
3218: option (or, equivalently,
3219: .B %option c++),
3220: which is automatically specified if the name of the flex
3221: executable ends in a '+', such as
3222: .I flex++.
3223: When using this option, flex defaults to generating the scanner to the file
3224: .B lex.yy.cc
3225: instead of
3226: .B lex.yy.c.
3227: The generated scanner includes the header file
1.5 deraadt 3228: .I g++/FlexLexer.h,
1.1 deraadt 3229: which defines the interface to two C++ classes.
3230: .PP
3231: The first class,
3232: .B FlexLexer,
3233: provides an abstract base class defining the general scanner class
3234: interface. It provides the following member functions:
3235: .TP
3236: .B const char* YYText()
3237: returns the text of the most recently matched token, the equivalent of
3238: .B yytext.
3239: .TP
3240: .B int YYLeng()
3241: returns the length of the most recently matched token, the equivalent of
3242: .B yyleng.
3243: .TP
3244: .B int lineno() const
3245: returns the current input line number
3246: (see
3247: .B %option yylineno),
3248: or
3249: .B 1
3250: if
3251: .B %option yylineno
3252: was not used.
3253: .TP
3254: .B void set_debug( int flag )
3255: sets the debugging flag for the scanner, equivalent to assigning to
3256: .B yy_flex_debug
3257: (see the Options section above). Note that you must build the scanner
3258: using
3259: .B %option debug
3260: to include debugging information in it.
3261: .TP
3262: .B int debug() const
3263: returns the current setting of the debugging flag.
3264: .PP
3265: Also provided are member functions equivalent to
3266: .B yy_switch_to_buffer(),
3267: .B yy_create_buffer()
3268: (though the first argument is an
3269: .B istream*
3270: object pointer and not a
3271: .B FILE*),
3272: .B yy_flush_buffer(),
3273: .B yy_delete_buffer(),
3274: and
3275: .B yyrestart()
1.10 deraadt 3276: (again, the first argument is an
1.1 deraadt 3277: .B istream*
3278: object pointer).
3279: .PP
3280: The second class defined in
1.5 deraadt 3281: .I g++/FlexLexer.h
1.1 deraadt 3282: is
3283: .B yyFlexLexer,
3284: which is derived from
3285: .B FlexLexer.
3286: It defines the following additional member functions:
3287: .TP
3288: .B
3289: yyFlexLexer( istream* arg_yyin = 0, ostream* arg_yyout = 0 )
3290: constructs a
3291: .B yyFlexLexer
3292: object using the given streams for input and output. If not specified,
3293: the streams default to
3294: .B cin
3295: and
3296: .B cout,
3297: respectively.
3298: .TP
3299: .B virtual int yylex()
1.10 deraadt 3300: performs the same role as
1.1 deraadt 3301: .B yylex()
3302: does for ordinary flex scanners: it scans the input stream, consuming
3303: tokens, until a rule's action returns a value. If you derive a subclass
3304: .B S
3305: from
3306: .B yyFlexLexer
3307: and want to access the member functions and variables of
3308: .B S
3309: inside
3310: .B yylex(),
3311: then you need to use
3312: .B %option yyclass="S"
3313: to inform
3314: .I flex
3315: that you will be using that subclass instead of
3316: .B yyFlexLexer.
3317: In this case, rather than generating
3318: .B yyFlexLexer::yylex(),
3319: .I flex
3320: generates
3321: .B S::yylex()
3322: (and also generates a dummy
3323: .B yyFlexLexer::yylex()
3324: that calls
3325: .B yyFlexLexer::LexerError()
3326: if called).
3327: .TP
3328: .B
3329: virtual void switch_streams(istream* new_in = 0,
3330: .B
3331: ostream* new_out = 0)
3332: reassigns
3333: .B yyin
3334: to
3335: .B new_in
3336: (if non-nil)
3337: and
3338: .B yyout
3339: to
3340: .B new_out
3341: (ditto), deleting the previous input buffer if
3342: .B yyin
3343: is reassigned.
3344: .TP
3345: .B
3346: int yylex( istream* new_in, ostream* new_out = 0 )
3347: first switches the input streams via
3348: .B switch_streams( new_in, new_out )
3349: and then returns the value of
3350: .B yylex().
3351: .PP
3352: In addition,
3353: .B yyFlexLexer
3354: defines the following protected virtual functions which you can redefine
3355: in derived classes to tailor the scanner:
3356: .TP
3357: .B
3358: virtual int LexerInput( char* buf, int max_size )
3359: reads up to
3360: .B max_size
3361: characters into
3362: .B buf
3363: and returns the number of characters read. To indicate end-of-input,
3364: return 0 characters. Note that "interactive" scanners (see the
3365: .B \-B
3366: and
3367: .B \-I
3368: flags) define the macro
3369: .B YY_INTERACTIVE.
3370: If you redefine
3371: .B LexerInput()
3372: and need to take different actions depending on whether or not
3373: the scanner might be scanning an interactive input source, you can
3374: test for the presence of this name via
3375: .B #ifdef.
3376: .TP
3377: .B
3378: virtual void LexerOutput( const char* buf, int size )
3379: writes out
3380: .B size
3381: characters from the buffer
3382: .B buf,
3383: which, while NUL-terminated, may also contain "internal" NUL's if
3384: the scanner's rules can match text with NUL's in them.
3385: .TP
3386: .B
3387: virtual void LexerError( const char* msg )
3388: reports a fatal error message. The default version of this function
3389: writes the message to the stream
3390: .B cerr
3391: and exits.
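.PP
The reason
.B LexerOutput()
takes an explicit size can be sketched outside of a scanner. The following
is only an illustration, assuming a standard C++ compiler; the names
write_token() and check_write_token() are hypothetical and this is not the
code found in
.B yyFlexLexer:
.nf

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a redefined LexerOutput(): writes exactly
// size characters, including any NUL characters inside the buffer.
void write_token(std::ostream& out, const char* buf, int size)
{
    out.write(buf, size);
}

void check_write_token()
{
    // Token text 'a', NUL, 'b', followed by the terminating NUL.
    const char token[] = { 'a', 0, 'b', 0 };

    std::ostringstream by_size;
    write_token(by_size, token, 3);
    assert(by_size.str().size() == 3);   // all three characters written

    std::ostringstream as_c_string;
    as_c_string << token;                // stops at the first NUL
    assert(as_c_string.str() == "a");
}
```

.fi
Treating the buffer as a C string would silently drop everything after
the first internal NUL, which is why the size argument should be used.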
3392: .PP
3393: Note that a
3394: .B yyFlexLexer
3395: object contains its
3396: .I entire
3397: scanning state. Thus you can use such objects to create reentrant
3398: scanners. You can instantiate multiple instances of the same
3399: .B yyFlexLexer
3400: class, and you can also combine multiple C++ scanner classes together
3401: in the same program using the
3402: .B \-P
3403: option discussed above.
3404: .PP
3405: Finally, note that the
3406: .B %array
3407: feature is not available to C++ scanner classes; you must use
3408: .B %pointer
3409: (the default).
3410: .PP
3411: Here is an example of a simple C++ scanner:
3412: .nf
3413:
3414: // An example of using the flex C++ scanner class.
3415:
3416: %{
3417: int mylineno = 0;
3418: %}
3419:
3420: string \\"[^\\n"]+\\"
3421:
3422: ws [ \\t]+
3423:
3424: alpha [A-Za-z]
3425: dig [0-9]
3426: name ({alpha}|{dig}|\\$)({alpha}|{dig}|[_.\\-/$])*
3427: num1 [-+]?{dig}+\\.?([eE][-+]?{dig}+)?
3428: num2 [-+]?{dig}*\\.{dig}+([eE][-+]?{dig}+)?
3429: number {num1}|{num2}
3430:
3431: %%
3432:
3433: {ws} /* skip blanks and tabs */
3434:
3435: "/*" {
3436: int c;
3437:
3438: while((c = yyinput()) != 0)
3439: {
3440: if(c == '\\n')
3441: ++mylineno;
3442:
3443: else if(c == '*')
3444: {
3445: if((c = yyinput()) == '/')
3446: break;
3447: else
3448: unput(c);
3449: }
3450: }
3451: }
3452:
3453: {number} cout << "number " << YYText() << '\\n';
3454:
3455: \\n mylineno++;
3456:
3457: {name} cout << "name " << YYText() << '\\n';
3458:
3459: {string} cout << "string " << YYText() << '\\n';
3460:
3461: %%
3462:
3463: int main( int /* argc */, char** /* argv */ )
3464: {
3465: FlexLexer* lexer = new yyFlexLexer;
3466: while(lexer->yylex() != 0)
3467: ;
3468: return 0;
3469: }
3470: .fi
3471: If you want to create multiple (different) lexer classes, you use the
3472: .B \-P
3473: flag (or the
3474: .B prefix=
3475: option) to rename each
3476: .B yyFlexLexer
3477: to some other
3478: .B xxFlexLexer.
3479: You then can include
1.5 deraadt 3480: .B <g++/FlexLexer.h>
1.1 deraadt 3481: in your other sources once per lexer class, first renaming
3482: .B yyFlexLexer
3483: as follows:
3484: .nf
3485:
3486: #undef yyFlexLexer
3487: #define yyFlexLexer xxFlexLexer
1.5 deraadt 3488: #include <g++/FlexLexer.h>
1.1 deraadt 3489:
3490: #undef yyFlexLexer
3491: #define yyFlexLexer zzFlexLexer
1.5 deraadt 3492: #include <g++/FlexLexer.h>
1.1 deraadt 3493:
3494: .fi
3495: if, for example, you used
3496: .B %option prefix="xx"
3497: for one of your scanners and
3498: .B %option prefix="zz"
3499: for the other.
3500: .PP
3501: IMPORTANT: the present form of the scanning class is
3502: .I experimental
1.7 aaron 3503: and may change considerably between major releases.
1.1 deraadt 3504: .SH INCOMPATIBILITIES WITH LEX AND POSIX
3505: .I flex
3506: is a rewrite of the AT&T Unix
3507: .I lex
3508: tool (the two implementations do not share any code, though),
3509: with some extensions and incompatibilities, both of which
3510: are of concern to those who wish to write scanners acceptable
3511: to either implementation. Flex is fully compliant with the POSIX
3512: .I lex
3513: specification, except that when using
3514: .B %pointer
3515: (the default), a call to
3516: .B unput()
3517: destroys the contents of
3518: .B yytext,
3519: which is counter to the POSIX specification.
3520: .PP
3521: In this section we discuss all of the known areas of incompatibility
3522: between flex, AT&T lex, and the POSIX specification.
3523: .PP
3524: .I flex's
3525: .B \-l
3526: option turns on maximum compatibility with the original AT&T
3527: .I lex
3528: implementation, at the cost of a major loss in the generated scanner's
3529: performance. We note below which incompatibilities can be overcome
3530: using the
3531: .B \-l
3532: option.
3533: .PP
3534: .I flex
3535: is fully compatible with
3536: .I lex
3537: with the following exceptions:
3538: .IP -
3539: The undocumented
3540: .I lex
3541: scanner internal variable
3542: .B yylineno
3543: is not supported unless
3544: .B \-l
3545: or
3546: .B %option yylineno
3547: is used.
3548: .IP
3549: .B yylineno
3550: should be maintained on a per-buffer basis, rather than a per-scanner
3551: (single global variable) basis.
3552: .IP
3553: .B yylineno
3554: is not part of the POSIX specification.
3555: .IP -
3556: The
3557: .B input()
3558: routine is not redefinable, though it may be called to read characters
3559: following whatever has been matched by a rule. If
3560: .B input()
3561: encounters an end-of-file the normal
3562: .B yywrap()
3563: processing is done. A ``real'' end-of-file is returned by
3564: .B input()
3565: as
3566: .I EOF.
3567: .IP
3568: Input is instead controlled by defining the
3569: .B YY_INPUT
3570: macro.
3571: .IP
3572: The
3573: .I flex
3574: restriction that
3575: .B input()
3576: cannot be redefined is in accordance with the POSIX specification,
3577: which simply does not specify any way of controlling the
3578: scanner's input other than by making an initial assignment to
3579: .I yyin.
3580: .IP -
3581: The
3582: .B unput()
3583: routine is not redefinable. This restriction is in accordance with POSIX.
3584: .IP -
3585: .I flex
3586: scanners are not as reentrant as
3587: .I lex
3588: scanners. In particular, if you have an interactive scanner and
3589: an interrupt handler which long-jumps out of the scanner, and
3590: the scanner is subsequently called again, you may get the following
3591: message:
3592: .nf
3593:
3594: fatal flex scanner internal error--end of buffer missed
3595:
3596: .fi
3597: To reenter the scanner, first use
3598: .nf
3599:
3600: yyrestart( yyin );
3601:
3602: .fi
3603: Note that this call will throw away any buffered input; usually this
3604: isn't a problem with an interactive scanner.
3605: .IP
3606: Also note that flex C++ scanner classes
3607: .I are
3608: reentrant, so if using C++ is an option for you, you should use
3609: them instead. See "Generating C++ Scanners" above for details.
3610: .IP -
3611: .B output()
3612: is not supported.
3613: Output from the
3614: .B ECHO
3615: macro is done to the file-pointer
3616: .I yyout
3617: (default
3618: .I stdout).
3619: .IP
3620: .B output()
3621: is not part of the POSIX specification.
3622: .IP -
3623: .I lex
3624: does not support exclusive start conditions (%x), though they
3625: are in the POSIX specification.
3626: .IP -
3627: When definitions are expanded,
3628: .I flex
3629: encloses them in parentheses.
3630: With lex, the following:
3631: .nf
3632:
3633: NAME [A-Z][A-Z0-9]*
3634: %%
3635: foo{NAME}? printf( "Found it\\n" );
3636: %%
3637:
3638: .fi
3639: will not match the string "foo" because when the macro
3640: is expanded the rule is equivalent to "foo[A-Z][A-Z0-9]*?"
3641: and the precedence is such that the '?' is associated with
3642: "[A-Z0-9]*". With
3643: .I flex,
3644: the rule will be expanded to
3645: "foo([A-Z][A-Z0-9]*)?" and so the string "foo" will match.
3646: .IP
3647: Note that if the definition begins with
3648: .B ^
3649: or ends with
3650: .B $
3651: then it is
3652: .I not
3653: expanded with parentheses, to allow these operators to appear in
3654: definitions without losing their special meanings. But the
3655: .B <s>, /,
3656: and
3657: .B <<EOF>>
3658: operators cannot be used in a
3659: .I flex
3660: definition.
3661: .IP
3662: Using
3663: .B \-l
3664: results in the
3665: .I lex
3666: behavior of no parentheses around the definition.
3667: .IP
3668: The POSIX specification is that the definition be enclosed in parentheses.
3669: .IP -
3670: Some implementations of
3671: .I lex
3672: allow a rule's action to begin on a separate line, if the rule's pattern
3673: has trailing whitespace:
3674: .nf
3675:
3676: %%
3677: foo|bar<space here>
3678: { foobar_action(); }
3679:
3680: .fi
3681: .I flex
3682: does not support this feature.
3683: .IP -
3684: The
3685: .I lex
3686: .B %r
3687: (generate a Ratfor scanner) option is not supported. It is not part
3688: of the POSIX specification.
3689: .IP -
3690: After a call to
3691: .B unput(),
3692: .I yytext
3693: is undefined until the next token is matched, unless the scanner
3694: was built using
3695: .B %array.
3696: This is not the case with
3697: .I lex
3698: or the POSIX specification. The
3699: .B \-l
3700: option does away with this incompatibility.
3701: .IP -
3702: The precedence of the
3703: .B {}
3704: (numeric range) operator is different.
3705: .I lex
3706: interprets "abc{1,3}" as "match one, two, or
3707: three occurrences of 'abc'", whereas
3708: .I flex
3709: interprets it as "match 'ab'
3710: followed by one, two, or three occurrences of 'c'". The latter is
3711: in agreement with the POSIX specification.
3712: .IP -
3713: The precedence of the
3714: .B ^
3715: operator is different.
3716: .I lex
3717: interprets "^foo|bar" as "match either 'foo' at the beginning of a line,
3718: or 'bar' anywhere", whereas
3719: .I flex
3720: interprets it as "match either 'foo' or 'bar' if they come at the beginning
3721: of a line". The latter is in agreement with the POSIX specification.
3722: .IP -
3723: The special table-size declarations such as
3724: .B %a
3725: supported by
3726: .I lex
3727: are not required by
3728: .I flex
3729: scanners;
3730: .I flex
3731: ignores them.
3732: .IP -
3733: The name
3734: .B
3735: FLEX_SCANNER
3736: is #define'd so scanners may be written for use with either
3737: .I flex
3738: or
3739: .I lex.
3740: Scanners also include
3741: .B YY_FLEX_MAJOR_VERSION
3742: and
3743: .B YY_FLEX_MINOR_VERSION
3744: indicating which version of
3745: .I flex
3746: generated the scanner
3747: (for example, for the 2.5 release, these defines would be 2 and 5
3748: respectively).
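.PP
The parenthesization and precedence differences listed above can be
tried out away from both tools. What follows is only a sketch, assuming a
C++11 compiler and using std::regex (ECMAScript syntax) as a stand-in for
the expanded patterns; it is not how
.I flex
or
.I lex
match, and the helper names full_match(), found_in() and
check_expansions() are hypothetical:
.nf

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true if the whole of text matches pattern.
bool full_match(const std::string& pattern, const std::string& text)
{
    return std::regex_match(text, std::regex(pattern));
}

// Hypothetical helper: true if pattern matches somewhere in text.
bool found_in(const std::string& pattern, const std::string& text)
{
    return std::regex_search(text, std::regex(pattern));
}

void check_expansions()
{
    // Definition pasted in without parentheses (the lex behavior).
    // std::regex reads "*?" as a lazy "*"; either way a letter is
    // still required after "foo", so plain "foo" does not match.
    assert(!full_match("foo[A-Z][A-Z0-9]*?", "foo"));

    // Definition wrapped in parentheses (the flex behavior).
    assert(full_match("foo([A-Z][A-Z0-9]*)?", "foo"));
    assert(full_match("foo([A-Z][A-Z0-9]*)?", "fooBAR1"));

    // "abc{1,3}" read the flex/POSIX way: 'ab', then one to three 'c'.
    assert(full_match("abc{1,3}", "abcc"));
    assert(!full_match("abc{1,3}", "abcabc"));

    // The lex reading corresponds to an explicit group.
    assert(full_match("(abc){1,3}", "abcabc"));

    // std::regex groups "^foo|bar" as (^foo)|(bar), which is the lex
    // reading; the flex/POSIX reading corresponds to "^(foo|bar)".
    assert(found_in("^foo|bar", "xbar"));
    assert(!found_in("^(foo|bar)", "xbar"));
}
```

.fi
Calling check_expansions() from a small main() runs the checks. Writing
the parentheses explicitly, as in the grouped patterns here, is one way to
get the same reading from either tool.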
3749: .PP
3750: The following
3751: .I flex
3752: features are not included in
3753: .I lex
3754: or the POSIX specification:
3755: .nf
3756:
3757: C++ scanners
3758: %option
3759: start condition scopes
3760: start condition stacks
3761: interactive/non-interactive scanners
3762: yy_scan_string() and friends
3763: yyterminate()
3764: yy_set_interactive()
3765: yy_set_bol()
3766: YY_AT_BOL()
3767: <<EOF>>
3768: <*>
3769: YY_DECL
3770: YY_START
3771: YY_USER_ACTION
3772: YY_USER_INIT
3773: #line directives
3774: %{}'s around actions
3775: multiple actions on a line
3776:
3777: .fi
3778: plus almost all of the flex flags.
3779: The last feature in the list refers to the fact that with
3780: .I flex
3781: you can put multiple actions on the same line, separated with
3782: semicolons, while with
3783: .I lex,
3784: the following
3785: .nf
3786:
3787: foo handle_foo(); ++num_foos_seen;
3788:
3789: .fi
3790: is (rather surprisingly) truncated to
3791: .nf
3792:
3793: foo handle_foo();
3794:
3795: .fi
3796: .I flex
3797: does not truncate the action. Actions that are not enclosed in
3798: braces are simply terminated at the end of the line.
3799: .SH DIAGNOSTICS
3800: .PP
3801: .I warning, rule cannot be matched
3802: indicates that the given rule
3803: cannot be matched because it follows other rules that will
3804: always match the same text as it. For
3805: example, in the following "foo" cannot be matched because it comes after
3806: an identifier "catch-all" rule:
3807: .nf
3808:
3809: [a-z]+ got_identifier();
3810: foo got_foo();
3811:
3812: .fi
3813: Using
3814: .B REJECT
3815: in a scanner suppresses this warning.
3816: .PP
3817: .I warning,
3818: .B \-s
3819: .I
3820: option given but default rule can be matched
3821: means that it is possible (perhaps only in a particular start condition)
3822: that the default rule (match any single character) is the only one
3823: that will match a particular input. Since
3824: .B \-s
3825: was given, presumably this is not intended.
3826: .PP
3827: .I reject_used_but_not_detected undefined
3828: or
3829: .I yymore_used_but_not_detected undefined -
3830: These errors can occur at compile time. They indicate that the
3831: scanner uses
3832: .B REJECT
3833: or
3834: .B yymore()
3835: but that
3836: .I flex
3837: failed to notice the fact, meaning that
3838: .I flex
3839: scanned the first two sections looking for occurrences of these actions
1.10 deraadt 3840: and failed to find any, but somehow you snuck some in (via an #include
1.1 deraadt 3841: file, for example). Use
3842: .B %option reject
3843: or
3844: .B %option yymore
3845: to indicate to flex that you really do use these features.
3846: .PP
3847: .I flex scanner jammed -
3848: a scanner compiled with
3849: .B \-s
3850: has encountered an input string which wasn't matched by
3851: any of its rules. This error can also occur due to internal problems.
3852: .PP
3853: .I token too large, exceeds YYLMAX -
3854: your scanner uses
3855: .B %array
3856: and one of its rules matched a string longer than the
3857: .B YYLMAX
3858: constant (8K bytes by default). You can increase the value by
3859: #define'ing
3860: .B YYLMAX
3861: in the definitions section of your
3862: .I flex
3863: input.
3864: .PP
3865: .I scanner requires \-8 flag to
3866: .I use the character 'x' -
3867: Your scanner specification includes recognizing the 8-bit character
3868: .I 'x'
3869: and you did not specify the \-8 flag, and your scanner defaulted to 7-bit
3870: because you used the
3871: .B \-Cf
3872: or
3873: .B \-CF
3874: table compression options. See the discussion of the
3875: .B \-7
3876: flag for details.
3877: .PP
3878: .I flex scanner push-back overflow -
3879: you used
3880: .B unput()
3881: to push back so much text that the scanner's buffer could not hold
3882: both the pushed-back text and the current token in
3883: .B yytext.
3884: Ideally the scanner should dynamically resize the buffer in this case, but at
3885: present it does not.
3886: .PP
3887: .I
3888: input buffer overflow, can't enlarge buffer because scanner uses REJECT -
3889: the scanner was working on matching an extremely large token and needed
3890: to expand the input buffer. This doesn't work with scanners that use
3891: .B
3892: REJECT.
3893: .PP
3894: .I
3895: fatal flex scanner internal error--end of buffer missed -
3896: This can occur in a scanner which is reentered after a long-jump
3897: has jumped out (or over) the scanner's activation frame. Before
3898: reentering the scanner, use:
3899: .nf
3900:
3901: yyrestart( yyin );
3902:
3903: .fi
3904: or, as noted above, switch to using the C++ scanner class.
3905: .PP
3906: .I too many start conditions in <> construct! -
3907: you listed more start conditions in a <> construct than exist (so
3908: you must have listed at least one of them twice).
3909: .SH FILES
3910: .TP
3911: .B \-lfl
3912: library with which scanners must be linked.
3913: .TP
3914: .I lex.yy.c
3915: generated scanner (called
3916: .I lexyy.c
3917: on some systems).
3918: .TP
3919: .I lex.yy.cc
3920: generated C++ scanner class, when using
3921: .B \-+.
3922: .TP
1.5 deraadt 3923: .I <g++/FlexLexer.h>
1.1 deraadt 3924: header file defining the C++ scanner base class,
3925: .B FlexLexer,
3926: and its derived class,
3927: .B yyFlexLexer.
3928: .TP
3929: .I flex.skl
3930: skeleton scanner. This file is only used when building flex, not when
3931: flex executes.
3932: .TP
3933: .I lex.backup
3934: backing-up information for
3935: .B \-b
3936: flag (called
3937: .I lex.bck
3938: on some systems).
3939: .SH DEFICIENCIES / BUGS
3940: .PP
3941: Some trailing context
3942: patterns cannot be properly matched and generate
3943: warning messages ("dangerous trailing context"). These are
3944: patterns where the ending of the
3945: first part of the rule matches the beginning of the second
3946: part, such as "zx*/xy*", where the 'x*' matches the 'x' at
3947: the beginning of the trailing context. (Note that the POSIX draft
3948: states that the text matched by such patterns is undefined.)
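.PP
To see why such a pattern is ambiguous, one can enumerate every place
where the input could be divided between the rule and its trailing
context. The following is only a sketch, assuming a C++11 compiler and
using std::regex in place of the scanner's own matching; the names
split_points() and check_dangerous_split() are hypothetical and this is
not the algorithm
.I flex
uses:
.nf

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns every position at which input can be
// split so that the part before the split matches head in full and the
// part after the split begins with a match for trailing.
std::vector<std::size_t> split_points(const std::string& head,
                                      const std::string& trailing,
                                      const std::string& input)
{
    const std::regex head_re(head);
    const std::regex trailing_re(trailing);
    std::vector<std::size_t> points;

    for (std::size_t i = 0; i <= input.size(); ++i) {
        const std::string before = input.substr(0, i);
        const std::string after = input.substr(i);
        if (std::regex_match(before, head_re) &&
            std::regex_search(after, trailing_re,
                              std::regex_constants::match_continuous))
            points.push_back(i);
    }
    return points;
}

// Self-check for the pattern discussed above, "zx*/xy*", on "zxxy".
void check_dangerous_split()
{
    const std::vector<std::size_t> points =
        split_points("zx*", "xy*", "zxxy");
    // Two different splits are possible: after "z" and after "zx".
    assert(points.size() == 2);
    assert(points[0] == 1 && points[1] == 2);
}
```

.fi
With more than one valid split point, the text that belongs to the token
is not determined by the pattern alone, which matches the warning above.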
3949: .PP
3950: For some trailing context rules, parts which are actually fixed-length are
1.3 deraadt 3951: not recognized as such, leading to the above-mentioned performance loss.
1.1 deraadt 3952: In particular, parts using '|' or {n} (such as "foo{3}") are always
3953: considered variable-length.
3954: .PP
3955: Combining trailing context with the special '|' action can result in
3956: .I fixed
3957: trailing context being turned into the more expensive
3958: .I variable
3959: trailing context. This happens, for example, in the following:
3960: .nf
3961:
3962: %%
3963: abc |
3964: xyz/def
3965:
3966: .fi
3967: .PP
3968: Use of
3969: .B unput()
3970: invalidates yytext and yyleng, unless the
3971: .B %array
3972: directive
3973: or the
3974: .B \-l
3975: option has been used.
3976: .PP
3977: Pattern-matching of NUL's is substantially slower than matching other
3978: characters.
3979: .PP
3980: Dynamic resizing of the input buffer is slow, as it entails rescanning
3981: all the text matched so far by the current (generally huge) token.
3982: .PP
3983: Due to both buffering of input and read-ahead, you cannot intermix
3984: calls to <stdio.h> routines, such as, for example,
3985: .B getchar(),
3986: with
3987: .I flex
3988: rules and expect it to work. Call
3989: .B input()
3990: instead.
3991: .PP
3992: The total number of table entries listed by the
3993: .B \-v
3994: flag excludes the number of table entries needed to determine
3995: what rule has been matched. The number of entries is equal
3996: to the number of DFA states if the scanner does not use
3997: .B REJECT,
3998: and somewhat greater than the number of states if it does.
3999: .PP
4000: .B REJECT
4001: cannot be used with the
4002: .B \-f
4003: or
4004: .B \-F
4005: options.
4006: .PP
4007: The
4008: .I flex
4009: internal algorithms need documentation.
4010: .SH SEE ALSO
4011: .PP
4012: lex(1), yacc(1), sed(1), awk(1).
4013: .PP
4014: John Levine, Tony Mason, and Doug Brown,
4015: .I Lex & Yacc,
4016: O'Reilly and Associates. Be sure to get the 2nd edition.
4017: .PP
4018: M. E. Lesk and E. Schmidt,
4019: .I LEX \- Lexical Analyzer Generator
4020: .PP
4021: Alfred Aho, Ravi Sethi and Jeffrey Ullman,
4022: .I Compilers: Principles, Techniques and Tools,
4023: Addison-Wesley (1986). Describes the pattern-matching techniques used by
4024: .I flex
4025: (deterministic finite automata).
4026: .SH AUTHOR
4027: Vern Paxson, with the help of many ideas and much inspiration from
4028: Van Jacobson. Original version by Jef Poskanzer. The fast table
4029: representation is a partial implementation of a design done by Van
4030: Jacobson. The implementation was done by Kevin Gong and Vern Paxson.
4031: .PP
4032: Thanks to the many
4033: .I flex
4034: beta-testers, feedbackers, and contributors, especially Francois Pinard,
4035: Casey Leedom,
4036: Robert Abramovitz,
4037: Stan Adermann, Terry Allen, David Barker-Plummer, John Basrai,
4038: Neal Becker, Nelson H.F. Beebe, benson@odi.com,
4039: Karl Berry, Peter A. Bigot, Simon Blanchard,
4040: Keith Bostic, Frederic Brehm, Ian Brockbank, Kin Cho, Nick Christopher,
4041: Brian Clapper, J.T. Conklin,
4042: Jason Coughlin, Bill Cox, Nick Cropper, Dave Curtis, Scott David
1.11 deraadt 4043: Daniels, Chris G. Demetriou, Theo de Raadt,
1.1 deraadt 4044: Mike Donahue, Chuck Doucette, Tom Epperly, Leo Eskin,
4045: Chris Faylor, Chris Flatters, Jon Forrest, Jeffrey Friedl,
4046: Joe Gayda, Kaveh R. Ghazi, Wolfgang Glunz,
4047: Eric Goldman, Christopher M. Gould, Ulrich Grepel, Peer Griebel,
4048: Jan Hajic, Charles Hemphill, NORO Hideo,
4049: Jarkko Hietaniemi, Scott Hofmann,
4050: Jeff Honig, Dana Hudes, Eric Hughes, John Interrante,
4051: Ceriel Jacobs, Michal Jaegermann, Sakari Jalovaara, Jeffrey R. Jones,
4052: Henry Juengst, Klaus Kaempf, Jonathan I. Kamens, Terrence O Kane,
4053: Amir Katz, ken@ken.hilco.com, Kevin B. Kenny,
4054: Steve Kirsch, Winfried Koenig, Marq Kole, Ronald Lamprecht,
4055: Greg Lee, Rohan Lenard, Craig Leres, John Levine, Steve Liddle,
4056: David Loffredo, Mike Long,
4057: Mohamed el Lozy, Brian Madsen, Malte, Joe Marshall,
4058: Bengt Martensson, Chris Metcalf,
4059: Luke Mewburn, Jim Meyering, R. Alexander Milowski, Erik Naggum,
4060: G.T. Nicol, Landon Noll, James Nordby, Marc Nozell,
4061: Richard Ohnemus, Karsten Pahnke,
4062: Sven Panne, Roland Pesch, Walter Pelissero, Gaumond
4063: Pierre, Esmond Pitt, Jef Poskanzer, Joe Rahmeh, Jarmo Raiha,
4064: Frederic Raimbault, Pat Rankin, Rick Richardson,
4065: Kevin Rodgers, Kai Uwe Rommel, Jim Roskind, Alberto Santini,
4066: Andreas Scherer, Darrell Schiebel, Raf Schietekat,
4067: Doug Schmidt, Philippe Schnoebelen, Andreas Schwab,
4068: Larry Schwimmer, Alex Siegel, Eckehard Stolz, Jan-Erik Strömquist,
4069: Mike Stump, Paul Stuart, Dave Tallman, Ian Lance Taylor,
4070: Chris Thewalt, Richard M. Timoney, Jodi Tsai,
4071: Paul Tuinenga, Gary Weik, Frank Whaley, Gerhard Wilhelms, Kent Williams, Ken
4072: Yap, Ron Zellar, Nathan Zelle, David Zuhn,
4073: and those whose names have slipped my marginal
4074: mail-archiving skills but whose contributions are appreciated all the
4075: same.
4076: .PP
4077: Thanks to Keith Bostic, Jon Forrest, Noah Friedman,
4078: John Gilmore, Craig Leres, John Levine, Bob Mulcahy, G.T.
4079: Nicol, Francois Pinard, Rich Salz, and Richard Stallman for help with various
4080: distribution headaches.
4081: .PP
4082: Thanks to Esmond Pitt and Earle Horton for 8-bit character support; to
4083: Benson Margulies and Fred Burke for C++ support; to Kent Williams and Tom
4084: Epperly for C++ class support; to Ove Ewerlid for support of NUL's; and to
4085: Eric Hughes for support of multiple buffers.
4086: .PP
4087: This work was primarily done when I was with the Real Time Systems Group
4088: at the Lawrence Berkeley Laboratory in Berkeley, CA. Many thanks to all there
4089: for the support I received.
4090: .PP
4091: Send comments to vern@ee.lbl.gov.