src/usr.bin/lex/flex.1 - annotate

Annotation of src/usr.bin/lex/flex.1, Revision 1.6

1.6     ! aaron       1: .\"    $OpenBSD: flex.1,v 1.5 1998/08/17 03:20:23 deraadt Exp $
1.2       deraadt     2: .\"
1.1       deraadt     3: .TH FLEX 1 "April 1995" "Version 2.5"
                      4: .SH NAME
                      5: flex \- fast lexical analyzer generator
                      6: .SH SYNOPSIS
                      7: .B flex
                      8: .B [\-bcdfhilnpstvwBFILTV78+? \-C[aefFmr] \-ooutput \-Pprefix \-Sskeleton]
                      9: .B [\-\-help \-\-version]
                     10: .I [filename ...]
                     11: .SH OVERVIEW
                     12: This manual describes
                     13: .I flex,
                     14: a tool for generating programs that perform pattern-matching on text.  The
                     15: manual includes both tutorial and reference sections:
                     16: .nf
                     17:
                     18:     Description
                     19:         a brief overview of the tool
                     20:
                     21:     Some Simple Examples
                     22:
                     23:     Format Of The Input File
                     24:
                     25:     Patterns
                     26:         the extended regular expressions used by flex
                     27:
                     28:     How The Input Is Matched
                     29:         the rules for determining what has been matched
                     30:
                     31:     Actions
                     32:         how to specify what to do when a pattern is matched
                     33:
                     34:     The Generated Scanner
                     35:         details regarding the scanner that flex produces;
                     36:         how to control the input source
                     37:
                     38:     Start Conditions
                     39:         introducing context into your scanners, and
                     40:         managing "mini-scanners"
                     41:
                     42:     Multiple Input Buffers
                     43:         how to manipulate multiple input sources; how to
                     44:         scan from strings instead of files
                     45:
                     46:     End-of-file Rules
                     47:         special rules for matching the end of the input
                     48:
                     49:     Miscellaneous Macros
                     50:         a summary of macros available to the actions
                     51:
                     52:     Values Available To The User
                     53:         a summary of values available to the actions
                     54:
                     55:     Interfacing With Yacc
                     56:         connecting flex scanners together with yacc parsers
                     57:
                     58:     Options
                     59:         flex command-line options, and the "%option"
                     60:         directive
                     61:
                     62:     Performance Considerations
                     63:         how to make your scanner go as fast as possible
                     64:
                     65:     Generating C++ Scanners
                     66:         the (experimental) facility for generating C++
                     67:         scanner classes
                     68:
                     69:     Incompatibilities With Lex And POSIX
                     70:         how flex differs from AT&T lex and the POSIX lex
                     71:         standard
                     72:
                     73:     Diagnostics
                     74:         those error messages produced by flex (or scanners
                     75:         it generates) whose meanings might not be apparent
                     76:
                     77:     Files
                     78:         files used by flex
                     79:
                     80:     Deficiencies / Bugs
                     81:         known problems with flex
                     82:
                     83:     See Also
                     84:         other documentation, related tools
                     85:
                     86:     Author
                     87:         includes contact information
                     88:
                     89: .fi
                     90: .SH DESCRIPTION
                     91: .I flex
                     92: is a tool for generating
                     93: .I scanners:
                     94: programs which recognize lexical patterns in text.
                     95: .I flex
                     96: reads
                     97: the given input files, or its standard input if no file names are given,
                     98: for a description of a scanner to generate.  The description is in
                     99: the form of pairs
                    100: of regular expressions and C code, called
                    101: .I rules.  flex
                    102: generates as output a C source file,
                    103: .B lex.yy.c,
                    104: which defines a routine
                    105: .B yylex().
                    106: This file is compiled and linked with the
                    107: .B \-lfl
                    108: library to produce an executable.  When the executable is run,
                    109: it analyzes its input for occurrences
                    110: of the regular expressions.  Whenever it finds one, it executes
                    111: the corresponding C code.
                    112: .SH SOME SIMPLE EXAMPLES
                    113: .PP
                    114: First some simple examples to get the flavor of how one uses
                    115: .I flex.
                    116: The following
                    117: .I flex
                    118: input specifies a scanner which, whenever it encounters the string
                    119: "username", replaces it with the user's login name:
                    120: .nf
                    121:
                    122:     %%
                    123:     username    printf( "%s", getlogin() );
                    124:
                    125: .fi
                    126: By default, any text not matched by a
                    127: .I flex
                    128: scanner
                    129: is copied to the output, so the net effect of this scanner is
                    130: to copy its input file to its output with each occurrence
                    131: of "username" expanded.
                    132: In this input, there is just one rule.  "username" is the
                    133: .I pattern
                    134: and the "printf" is the
                    135: .I action.
                    136: The "%%" marks the beginning of the rules.
                    137: .PP
                    138: Here's another simple example:
                    139: .nf
                    140:
                    141:             int num_lines = 0, num_chars = 0;
                    142:
                    143:     %%
                    144:     \\n      ++num_lines; ++num_chars;
                    145:     .       ++num_chars;
                    146:
                    147:     %%
                    148:     main()
                    149:             {
                    150:             yylex();
                    151:             printf( "# of lines = %d, # of chars = %d\\n",
                    152:                     num_lines, num_chars );
                    153:             }
                    154:
                    155: .fi
                    156: This scanner counts the number of characters and the number
                    157: of lines in its input (it produces no output other than the
                    158: final report on the counts).  The first line
                    159: declares two globals, "num_lines" and "num_chars", which are accessible
                    160: both inside
                    161: .B yylex()
                    162: and in the
                    163: .B main()
                    164: routine declared after the second "%%".  There are two rules, one
                    165: which matches a newline ("\\n") and increments both the line count and
                    166: the character count, and one which matches any character other than
                    167: a newline (indicated by the "." regular expression).
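For comparison, the bookkeeping done by the two rules can be sketched in plain C, without flex.  This is only an illustration: the names "struct counts" and "count_text" are hypothetical and are not part of flex or of any scanner it generates.

```c
#include <stddef.h>

/* Hypothetical container for the two counters used in the example. */
struct counts {
    int num_lines;
    int num_chars;
};

/* Tally lines and characters in text[0..len-1] the way the two rules do. */
struct counts count_text(const char *text, size_t len)
{
    struct counts c = { 0, 0 };
    size_t i;

    for (i = 0; i < len; i++) {
        if (text[i] == '\n')
            ++c.num_lines;   /* the newline rule bumps the line count ... */
        ++c.num_chars;       /* ... and every character bumps the char count */
    }
    return c;
}
```

Each input character is consumed by exactly one of the two rules, which is why the character count is incremented unconditionally in the loop.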
                    168: .PP
                    169: A somewhat more complicated example:
                    170: .nf
                    171:
                    172:     /* scanner for a toy Pascal-like language */
                    173:
                    174:     %{
                    175:     /* need this for the calls to atoi() and atof() below */
                    176:     #include <stdlib.h>
                    177:     %}
                    178:
                    179:     DIGIT    [0-9]
                    180:     ID       [a-z][a-z0-9]*
                    181:
                    182:     %%
                    183:
                    184:     {DIGIT}+    {
                    185:                 printf( "An integer: %s (%d)\\n", yytext,
                    186:                         atoi( yytext ) );
                    187:                 }
                    188:
                    189:     {DIGIT}+"."{DIGIT}*        {
                    190:                 printf( "A float: %s (%g)\\n", yytext,
                    191:                         atof( yytext ) );
                    192:                 }
                    193:
                    194:     if|then|begin|end|procedure|function        {
                    195:                 printf( "A keyword: %s\\n", yytext );
                    196:                 }
                    197:
                    198:     {ID}        printf( "An identifier: %s\\n", yytext );
                    199:
                    200:     "+"|"-"|"*"|"/"   printf( "An operator: %s\\n", yytext );
                    201:
                    202:     "{"[^}\\n]*"}"     /* eat up one-line comments */
                    203:
                    204:     [ \\t\\n]+          /* eat up whitespace */
                    205:
                    206:     .           printf( "Unrecognized character: %s\\n", yytext );
                    207:
                    208:     %%
                    209:
                    210:     main( argc, argv )
                    211:     int argc;
                    212:     char **argv;
                    213:         {
                    214:         ++argv, --argc;  /* skip over program name */
                    215:         if ( argc > 0 )
                    216:                 yyin = fopen( argv[0], "r" );
                    217:         else
                    218:                 yyin = stdin;
                    219:
                    220:         yylex();
                    221:         }
                    222:
                    223: .fi
                    224: These are the beginnings of a simple scanner for a language like
                    225: Pascal.  It identifies different types of
                    226: .I tokens
                    227: and reports on what it has seen.
                    228: .PP
                    229: The details of this example will be explained in the following
                    230: sections.
                    231: .SH FORMAT OF THE INPUT FILE
                    232: The
                    233: .I flex
                    234: input file consists of three sections, separated by a line with just
                    235: .B %%
                    236: in it:
                    237: .nf
                    238:
                    239:     definitions
                    240:     %%
                    241:     rules
                    242:     %%
                    243:     user code
                    244:
                    245: .fi
                    246: The
                    247: .I definitions
                    248: section contains declarations of simple
                    249: .I name
                    250: definitions to simplify the scanner specification, and declarations of
                    251: .I start conditions,
                    252: which are explained in a later section.
                    253: .PP
                    254: Name definitions have the form:
                    255: .nf
                    256:
                    257:     name definition
                    258:
                    259: .fi
                    260: The "name" is a word beginning with a letter or an underscore ('_')
                    261: followed by zero or more letters, digits, '_', or '-' (dash).
                    262: The definition is taken to begin at the first non-white-space character
                    263: following the name and continuing to the end of the line.
                    264: The definition can subsequently be referred to using "{name}", which
                    265: will expand to "(definition)".  For example,
                    266: .nf
                    267:
                    268:     DIGIT    [0-9]
                    269:     ID       [a-z][a-z0-9]*
                    270:
                    271: .fi
                    272: defines "DIGIT" to be a regular expression which matches a
                    273: single digit, and
                    274: "ID" to be a regular expression which matches a letter
                    275: followed by zero-or-more letters-or-digits.
                    276: A subsequent reference to
                    277: .nf
                    278:
                    279:     {DIGIT}+"."{DIGIT}*
                    280:
                    281: .fi
                    282: is identical to
                    283: .nf
                    284:
                    285:     ([0-9])+"."([0-9])*
                    286:
                    287: .fi
                    288: and matches one-or-more digits followed by a '.' followed
                    289: by zero-or-more digits.
                    290: .PP
                    291: The
                    292: .I rules
                    293: section of the
                    294: .I flex
                    295: input contains a series of rules of the form:
                    296: .nf
                    297:
                    298:     pattern   action
                    299:
                    300: .fi
                    301: where the pattern must be unindented and the action must begin
                    302: on the same line.
                    303: .PP
                    304: See below for a further description of patterns and actions.
                    305: .PP
                    306: Finally, the user code section is simply copied to
                    307: .B lex.yy.c
                    308: verbatim.
                    309: It is used for companion routines which call or are called
                    310: by the scanner.  The presence of this section is optional;
                    311: if it is missing, the second
                    312: .B %%
                    313: in the input file may be skipped, too.
                    314: .PP
                    315: In the definitions and rules sections, any
                    316: .I indented
                    317: text or text enclosed in
                    318: .B %{
                    319: and
                    320: .B %}
                    321: is copied verbatim to the output (with the %{}'s removed).
                    322: The %{}'s must appear unindented on lines by themselves.
                    323: .PP
                    324: In the rules section,
                    325: any indented or %{} text appearing before the
                    326: first rule may be used to declare variables
                    327: which are local to the scanning routine and (after the declarations)
                    328: code which is to be executed whenever the scanning routine is entered.
                    329: Other indented or %{} text in the rule section is still copied to the output,
                    330: but its meaning is not well-defined and it may well cause compile-time
                    331: errors (this feature is present for
                    332: .I POSIX
                    333: compliance; see below for other such features).
                    334: .PP
                    335: In the definitions section (but not in the rules section),
                    336: an unindented comment (i.e., a line
                    337: beginning with "/*") is also copied verbatim to the output up
                    338: to the next "*/".
                    339: .SH PATTERNS
                    340: The patterns in the input are written using an extended set of regular
                    341: expressions.  These are:
                    342: .nf
                    343:
                    344:     x          match the character 'x'
                    345:     .          any character (byte) except newline
                    346:     [xyz]      a "character class"; in this case, the pattern
                    347:                  matches either an 'x', a 'y', or a 'z'
                    348:     [abj-oZ]   a "character class" with a range in it; matches
                    349:                  an 'a', a 'b', any letter from 'j' through 'o',
                    350:                  or a 'Z'
                    351:     [^A-Z]     a "negated character class", i.e., any character
                    352:                  but those in the class.  In this case, any
                    353:                  character EXCEPT an uppercase letter.
                    354:     [^A-Z\\n]   any character EXCEPT an uppercase letter or
                    355:                  a newline
                    356:     r*         zero or more r's, where r is any regular expression
                    357:     r+         one or more r's
                    358:     r?         zero or one r's (that is, "an optional r")
                    359:     r{2,5}     anywhere from two to five r's
                    360:     r{2,}      two or more r's
                    361:     r{4}       exactly 4 r's
                    362:     {name}     the expansion of the "name" definition
                    363:                (see above)
                    364:     "[xyz]\\"foo"
                    365:                the literal string: [xyz]"foo
                    366:     \\X         if X is an 'a', 'b', 'f', 'n', 'r', 't', or 'v',
                    367:                  then the ANSI-C interpretation of \\x.
                    368:                  Otherwise, a literal 'X' (used to escape
                    369:                  operators such as '*')
                    370:     \\0         a NUL character (ASCII code 0)
                    371:     \\123       the character with octal value 123
                    372:     \\x2a       the character with hexadecimal value 2a
                    373:     (r)        match an r; parentheses are used to override
                    374:                  precedence (see below)
                    375:
                    376:
                    377:     rs         the regular expression r followed by the
                    378:                  regular expression s; called "concatenation"
                    379:
                    380:
                    381:     r|s        either an r or an s
                    382:
                    383:
                    384:     r/s        an r but only if it is followed by an s.  The
                    385:                  text matched by s is included when determining
                    386:                  whether this rule is the "longest match",
                    387:                  but is then returned to the input before
                    388:                  the action is executed.  So the action only
                    389:                  sees the text matched by r.  This type
                    390:                  of pattern is called "trailing context".
                    391:                  (There are some combinations of r/s that flex
                    392:                  cannot match correctly; see notes in the
                    393:                  Deficiencies / Bugs section below regarding
                    394:                  "dangerous trailing context".)
                    395:     ^r         an r, but only at the beginning of a line (i.e.,
                    396:                  when just starting to scan, or right after a
                    397:                  newline has been scanned).
                    398:     r$         an r, but only at the end of a line (i.e., just
                    399:                  before a newline).  Equivalent to "r/\\n".
                    400:
                    401:                Note that flex's notion of "newline" is exactly
                    402:                whatever the C compiler used to compile flex
                    403:                interprets '\\n' as; in particular, on some DOS
                    404:                systems you must either filter out \\r's in the
                    405:                input yourself, or explicitly use r/\\r\\n for "r$".
                    406:
                    407:
                    408:     <s>r       an r, but only in start condition s (see
                    409:                  below for discussion of start conditions)
                    410:     <s1,s2,s3>r
                    411:                same, but in any of start conditions s1,
                    412:                  s2, or s3
                    413:     <*>r       an r in any start condition, even an exclusive one.
                    414:
                    415:
                    416:     <<EOF>>    an end-of-file
                    417:     <s1,s2><<EOF>>
                    418:                an end-of-file when in start condition s1 or s2
                    419:
                    420: .fi
                    421: Note that inside of a character class, all regular expression operators
                    422: lose their special meaning except escape ('\\') and the character class
                    423: operators, '-', ']', and, at the beginning of the class, '^'.
                    424: .PP
                    425: The regular expressions listed above are grouped according to
                    426: precedence, from highest precedence at the top to lowest at the bottom.
                    427: Those grouped together have equal precedence.  For example,
                    428: .nf
                    429:
                    430:     foo|bar*
                    431:
                    432: .fi
                    433: is the same as
                    434: .nf
                    435:
                    436:     (foo)|(ba(r*))
                    437:
                    438: .fi
                    439: since the '*' operator has higher precedence than concatenation,
                    440: and concatenation higher than alternation ('|').  This pattern
                    441: therefore matches
                    442: .I either
                    443: the string "foo"
                    444: .I or
                    445: the string "ba" followed by zero-or-more r's.
                    446: To match "foo" or zero-or-more "bar"'s, use:
                    447: .nf
                    448:
                    449:     foo|(bar)*
                    450:
                    451: .fi
                    452: and to match zero-or-more "foo"'s-or-"bar"'s:
                    453: .nf
                    454:
                    455:     (foo|bar)*
                    456:
                    457: .fi
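The precedence rules above can be checked outside of flex with a small C sketch.  It assumes the POSIX regcomp() and regexec() functions are available; they are not the engine flex uses, but POSIX extended regular expressions order '*', concatenation and '|' the same way.  The helper name "matches_whole" is hypothetical.

```c
#include <regex.h>
#include <stdio.h>

/*
 * Hypothetical helper: return 1 if the whole of `text` matches the POSIX
 * extended regular expression `pattern`, 0 if it does not, and -1 if the
 * pattern is too long or fails to compile.
 */
int matches_whole(const char *pattern, const char *text)
{
    char anchored[256];
    regex_t re;
    int n, rc;

    /* Wrap the pattern in ^( )$ so it must cover the entire string. */
    n = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (n < 0 || (size_t)n >= sizeof anchored)
        return -1;
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

With this helper, "foo|bar*" accepts "barrr" but rejects "barbar", "foo|(bar)*" accepts "barbar" but rejects "foobar", and "(foo|bar)*" accepts "foobarfoo".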
                    458: .PP
                    459: In addition to characters and ranges of characters, character classes
                    460: can also contain character class
                    461: .I expressions.
                    462: These are expressions enclosed inside
                    463: .B [:
                    464: and
                    465: .B :]
                    466: delimiters (which themselves must appear between the '[' and ']' of the
                    467: character class; other elements may occur inside the character class, too).
                    468: The valid expressions are:
                    469: .nf
                    470:
                    471:     [:alnum:] [:alpha:] [:blank:]
                    472:     [:cntrl:] [:digit:] [:graph:]
                    473:     [:lower:] [:print:] [:punct:]
                    474:     [:space:] [:upper:] [:xdigit:]
                    475:
                    476: .fi
                    477: These expressions all designate a set of characters equivalent to
                    478: the corresponding standard C
                    479: .B isXXX
                    480: function.  For example,
                    481: .B [:alnum:]
                    482: designates those characters for which
                    483: .B isalnum()
                    484: returns true - i.e., any alphabetic or numeric.
                    485: Some systems don't provide
                    486: .B isblank(),
                    487: so flex defines
                    488: .B [:blank:]
                    489: as a blank or a tab.
                    490: .PP
                    491: For example, the following character classes are all equivalent:
                    492: .nf
                    493:
                    494:     [[:alnum:]]
1.4       deraadt   495:     [[:alpha:][:digit:]]
1.1       deraadt   496:     [[:alpha:]0-9]
                    497:     [a-zA-Z0-9]
                    498:
                    499: .fi
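The equivalence of these classes can be sketched in C using the isXXX functions the expressions are modelled on.  The sketch assumes the default "C" locale and an ASCII character set; all function names in it are hypothetical and chosen for illustration.

```c
#include <ctype.h>

/* [:blank:] as described above: a blank (space) or a tab. */
int in_blank(int c)
{
    return c == ' ' || c == '\t';
}

/* [[:alnum:]] */
int in_alnum(int c)
{
    return isalnum(c) != 0;
}

/* [[:alpha:][:digit:]], the union of two class expressions */
int in_alpha_or_digit(int c)
{
    return isalpha(c) || isdigit(c);
}

/* [a-zA-Z0-9], written out as explicit ranges (ASCII assumed) */
int in_ascii_ranges(int c)
{
    return (c >= 'a' && c <= 'z') ||
           (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9');
}

/* Count the 7-bit character codes on which the formulations disagree. */
int count_disagreements(void)
{
    int c, bad = 0;

    for (c = 0; c < 128; c++) {
        int a = in_alnum(c);
        if (a != in_alpha_or_digit(c) || a != in_ascii_ranges(c))
            bad++;
    }
    return bad;
}
```

Under the stated assumptions count_disagreements() finds no character on which the three tests differ.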
                    500: If your scanner is case-insensitive (the
                    501: .B \-i
                    502: flag), then
                    503: .B [:upper:]
                    504: and
                    505: .B [:lower:]
                    506: are equivalent to
                    507: .B [:alpha:].
                    508: .PP
                    509: Some notes on patterns:
                    510: .IP -
                    511: A negated character class such as the example "[^A-Z]"
                    512: above
                    513: .I will match a newline
                    514: unless "\\n" (or an equivalent escape sequence) is one of the
                    515: characters explicitly present in the negated character class
                    516: (e.g., "[^A-Z\\n]").  This is unlike how many other regular
                    517: expression tools treat negated character classes, but unfortunately
                    518: the inconsistency is historically entrenched.
                    519: Matching newlines means that a pattern like [^"]* can match the entire
                    520: input unless there's another quote in the input.
                    521: .IP -
                    522: A rule can have at most one instance of trailing context (the '/' operator
                    523: or the '$' operator).  The start condition, '^', and "<<EOF>>" patterns
                    524: can only occur at the beginning of a pattern, and, as well as with '/' and '$',
                    525: cannot be grouped inside parentheses.  A '^' which does not occur at
                    526: the beginning of a rule or a '$' which does not occur at the end of
                    527: a rule loses its special properties and is treated as a normal character.
                    528: .IP
                    529: The following are illegal:
                    530: .nf
                    531:
                    532:     foo/bar$
                    533:     <sc1>foo<sc2>bar
                    534:
                    535: .fi
                    536: Note that the first of these can be written "foo/bar\\n".
                    537: .IP
                    538: The following will result in '$' or '^' being treated as a normal character:
                    539: .nf
                    540:
                    541:     foo|(bar$)
                    542:     foo|^bar
                    543:
                    544: .fi
                    545: If what's wanted is a "foo" or a bar-followed-by-a-newline, the following
                    546: could be used (the special '|' action is explained below):
                    547: .nf
                    548:
                    549:     foo      |
                    550:     bar$     /* action goes here */
                    551:
                    552: .fi
                    553: A similar trick will work for matching a foo or a
                    554: bar-at-the-beginning-of-a-line.
                    555: .SH HOW THE INPUT IS MATCHED
                    556: When the generated scanner is run, it analyzes its input looking
                    557: for strings which match any of its patterns.  If it finds more than
                    558: one match, it takes the one matching the most text (for trailing
                    559: context rules, this includes the length of the trailing part, even
                    560: though it will then be returned to the input).  If it finds two
                    561: or more matches of the same length, the
                    562: rule listed first in the
                    563: .I flex
                    564: input file is chosen.
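The selection rule just described (longest match wins, and the earliest rule wins a tie) can be sketched in C as follows.  This is not how a flex scanner is implemented internally; it only restates the rule.  The name "choose_rule" is hypothetical, and treating a length of 0 as "no match" is a simplifying assumption of the sketch.

```c
#include <stddef.h>

/*
 * Hypothetical helper: match_len[i] is the number of characters rule i
 * would match at the current position (0 meaning "no match"), with the
 * rules in the order they appear in the input file.  Returns the index
 * of the winning rule, or -1 if no rule matches.
 */
int choose_rule(const size_t *match_len, int nrules)
{
    int i, best = -1;
    size_t best_len = 0;

    for (i = 0; i < nrules; i++) {
        /* Strict '>' keeps the earlier rule when two lengths tie. */
        if (match_len[i] > best_len) {
            best_len = match_len[i];
            best = i;
        }
    }
    return best;
}
```

For the toy Pascal scanner shown earlier, the input "if" is matched with length 2 by both the keyword rule and the {ID} rule; because the keyword rule is listed first, it is the one chosen.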
                    565: .PP
                    566: Once the match is determined, the text corresponding to the match
                    567: (called the
                    568: .I token)
                    569: is made available in the global character pointer
                    570: .B yytext,
                    571: and its length in the global integer
                    572: .B yyleng.
                    573: The
                    574: .I action
                    575: corresponding to the matched pattern is then executed (a more
                    576: detailed description of actions follows), and then the remaining
                    577: input is scanned for another match.
                    578: .PP
                    579: If no match is found, then the
                    580: .I default rule
                    581: is executed: the next character in the input is considered matched and
                    582: copied to the standard output.  Thus, the simplest legal
                    583: .I flex
                    584: input is:
                    585: .nf
                    586:
                    587:     %%
                    588:
                    589: .fi
                    590: which generates a scanner that simply copies its input (one character
                    591: at a time) to its output.
                    592: .PP
                    593: Note that
                    594: .B yytext
                    595: can be defined in two different ways: either as a character
                    596: .I pointer
                    597: or as a character
                    598: .I array.
                    599: You can control which definition
                    600: .I flex
                    601: uses by including one of the special directives
                    602: .B %pointer
                    603: or
                    604: .B %array
                    605: in the first (definitions) section of your flex input.  The default is
                    606: .B %pointer,
                    607: unless you use the
                    608: .B -l
                    609: lex compatibility option, in which case
                    610: .B yytext
                    611: will be an array.
                    612: The advantage of using
                    613: .B %pointer
                    614: is substantially faster scanning and no buffer overflow when matching
                    615: very large tokens (unless you run out of dynamic memory).  The disadvantage
                    616: is that you are restricted in how your actions can modify
                    617: .B yytext
                    618: (see the next section), and calls to the
                    619: .B unput()
                    620: function destroy the present contents of
                    621: .B yytext,
                    622: which can be a considerable porting headache when moving between different
                    623: .I lex
                    624: versions.
                    625: .PP
                    626: The advantage of
                    627: .B %array
                    628: is that you can then modify
                    629: .B yytext
                    630: to your heart's content, and calls to
                    631: .B unput()
                    632: do not destroy
                    633: .B yytext
                    634: (see below).  Furthermore, existing
                    635: .I lex
                    636: programs sometimes access
                    637: .B yytext
                    638: externally using declarations of the form:
                    639: .nf
                    640:     extern char yytext[];
                    641: .fi
                    642: This definition is erroneous when used with
                    643: .B %pointer,
                    644: but correct for
                    645: .B %array.
                    646: .PP
                    647: .B %array
                    648: defines
                    649: .B yytext
                    650: to be an array of
                    651: .B YYLMAX
                    652: characters, which defaults to a fairly large value.  You can change
                    653: the size by simply #define'ing
                    654: .B YYLMAX
                    655: to a different value in the first section of your
                    656: .I flex
                    657: input.  As mentioned above, with
                    658: .B %pointer
                    659: yytext grows dynamically to accommodate large tokens.  While this means your
                    660: .B %pointer
                    661: scanner can accommodate very large tokens (such as matching entire blocks
                    662: of comments), bear in mind that each time the scanner must resize
                    663: .B yytext
                    664: it also must rescan the entire token from the beginning, so matching such
                    665: tokens can prove slow.
                    666: .B yytext
                    667: presently does
                    668: .I not
                    669: dynamically grow if a call to
                    670: .B unput()
                    671: results in too much text being pushed back; instead, a run-time error results.
                    672: .PP
                    673: Also note that you cannot use
                    674: .B %array
                    675: with C++ scanner classes
                    676: (the
                    677: .B c++
                    678: option; see below).
                    679: .SH ACTIONS
                    680: Each pattern in a rule has a corresponding action, which can be any
                    681: arbitrary C statement.  The pattern ends at the first non-escaped
                    682: whitespace character; the remainder of the line is its action.  If the
                    683: action is empty, then when the pattern is matched the input token
                    684: is simply discarded.  For example, here is the specification for a program
                    685: which deletes all occurrences of "zap me" from its input:
                    686: .nf
                    687:
                    688:     %%
                    689:     "zap me"
                    690:
                    691: .fi
                    692: (It will copy all other characters in the input to the output since
                    693: they will be matched by the default rule.)
                    694: .PP
                    695: Here is a program which compresses multiple blanks and tabs down to
                    696: a single blank, and throws away whitespace found at the end of a line:
                    697: .nf
                    698:
                    699:     %%
                    700:     [ \\t]+        putchar( ' ' );
                    701:     [ \\t]+$       /* ignore this token */
                    702:
                    703: .fi
                    704: .PP
                    705: If the action contains a '{', then the action extends until the balancing '}'
                    706: is found, and the action may cross multiple lines.
                    707: .I flex
                    708: knows about C strings and comments and won't be fooled by braces found
                    709: within them, but also allows actions to begin with
                    710: .B %{
                    711: and will consider the action to be all the text up to the next
                    712: .B %}
                    713: (regardless of ordinary braces inside the action).
                    714: .PP
                    715: An action consisting solely of a vertical bar ('|') means "same as
                    716: the action for the next rule."  See below for an illustration.
                    717: .PP
                    718: Actions can include arbitrary C code, including
                    719: .B return
                    720: statements to return a value to whatever routine called
                    721: .B yylex().
                    722: Each time
                    723: .B yylex()
                    724: is called it continues processing tokens from where it last left
                    725: off until it either reaches
                    726: the end of the file or executes a return.
                    727: .PP
                    728: Actions are free to modify
                    729: .B yytext
                    730: except for lengthening it (adding
                    731: characters to its end--these will overwrite later characters in the
                    732: input stream).  This however does not apply when using
                    733: .B %array
                    734: (see above); in that case,
                    735: .B yytext
                    736: may be freely modified in any way.
                    737: .PP
                    738: Actions are free to modify
                    739: .B yyleng
                    740: except they should not do so if the action also includes use of
                    741: .B yymore()
                    742: (see below).
                    743: .PP
                    744: There are a number of special directives which can be included within
                    745: an action:
                    746: .IP -
                    747: .B ECHO
                    748: copies yytext to the scanner's output.
                    749: .IP -
                    750: .B BEGIN
                    751: followed by the name of a start condition places the scanner in the
                    752: corresponding start condition (see below).
                    753: .IP -
                    754: .B REJECT
                    755: directs the scanner to proceed on to the "second best" rule which matched the
                    756: input (or a prefix of the input).  The rule is chosen as described
                    757: above in "How the Input is Matched", and
                    758: .B yytext
                    759: and
                    760: .B yyleng
                    761: are set up appropriately.
                    762: It may either be one which matched as much text
                    763: as the originally chosen rule but came later in the
                    764: .I flex
                    765: input file, or one which matched less text.
                    766: For example, the following will both count the
                    767: words in the input and call the routine special() whenever "frob" is seen:
                    768: .nf
                    769:
                    770:             int word_count = 0;
                    771:     %%
                    772:
                    773:     frob        special(); REJECT;
                    774:     [^ \\t\\n]+   ++word_count;
                    775:
                    776: .fi
                    777: Without the
                    778: .B REJECT,
                    779: any occurrences of "frob" in the input would not be counted as words, since the
                    780: scanner normally executes only one action per token.
                    781: Multiple
                    782: .B REJECT's
                    783: are allowed, each one finding the next best choice to the currently
                    784: active rule.  For example, when the following scanner scans the token
                    785: "abcd", it will write "abcdabcaba" to the output:
                    786: .nf
                    787:
                    788:     %%
                    789:     a        |
                    790:     ab       |
                    791:     abc      |
                    792:     abcd     ECHO; REJECT;
                    793:     .|\\n     /* eat up any unmatched character */
                    794:
                    795: .fi
                    796: (The first three rules share the fourth's action since they use
                    797: the special '|' action.)
                    798: .B REJECT
                    799: is a particularly expensive feature in terms of scanner performance;
                    800: if it is used in
                    801: .I any
                    802: of the scanner's actions it will slow down
                    803: .I all
                    804: of the scanner's matching.  Furthermore,
                    805: .B REJECT
                    806: cannot be used with the
                    807: .I -Cf
                    808: or
                    809: .I -CF
                    810: options (see below).
                    811: .IP
                    812: Note also that unlike the other special actions,
                    813: .B REJECT
                    814: is a
                    815: .I branch;
                    816: code immediately following it in the action will
                    817: .I not
                    818: be executed.
                    819: .IP -
                    820: .B yymore()
                    821: tells the scanner that the next time it matches a rule, the corresponding
                    822: token should be
                    823: .I appended
                    824: onto the current value of
                    825: .B yytext
                    826: rather than replacing it.  For example, given the input "mega-kludge"
                    827: the following will write "mega-mega-kludge" to the output:
                    828: .nf
                    829:
                    830:     %%
                    831:     mega-    ECHO; yymore();
                    832:     kludge   ECHO;
                    833:
                    834: .fi
                    835: First "mega-" is matched and echoed to the output.  Then "kludge"
                    836: is matched, but the previous "mega-" is still hanging around at the
                    837: beginning of
                    838: .B yytext
                    839: so the
                    840: .B ECHO
                    841: for the "kludge" rule will actually write "mega-kludge".
                    842: .PP
                    843: Two notes regarding use of
                    844: .B yymore().
                    845: First,
                    846: .B yymore()
                    847: depends on the value of
                    848: .B yyleng
                    849: correctly reflecting the size of the current token, so you must not
                    850: modify
                    851: .B yyleng
                    852: if you are using
                    853: .B yymore().
                    854: Second, the presence of
                    855: .B yymore()
                    856: in the scanner's action entails a minor performance penalty in the
                    857: scanner's matching speed.
                    858: .IP -
                    859: .B yyless(n)
                    860: returns all but the first
                    861: .I n
                    862: characters of the current token back to the input stream, where they
                    863: will be rescanned when the scanner looks for the next match.
                    864: .B yytext
                    865: and
                    866: .B yyleng
                    867: are adjusted appropriately (e.g.,
                    868: .B yyleng
                    869: will now be equal to
                    870: .IR n ).
                    871: For example, on the input "foobar" the following will write out
                    872: "foobarbar":
                    873: .nf
                    874:
                    875:     %%
                    876:     foobar    ECHO; yyless(3);
                    877:     [a-z]+    ECHO;
                    878:
                    879: .fi
                    880: An argument of 0 to
                    881: .B yyless
                    882: will cause the entire current input string to be scanned again.  Unless you've
                    883: changed how the scanner will subsequently process its input (using
                    884: .B BEGIN,
                    885: for example), this will result in an endless loop.
                    886: .PP
                    887: Note that
                    888: .B yyless
                    889: is a macro and can only be used in the flex input file, not from
                    890: other source files.
                    891: .IP -
                    892: .B unput(c)
                    893: puts the character
                    894: .I c
                    895: back onto the input stream.  It will be the next character scanned.
                    896: The following action will take the current token and cause it
                    897: to be rescanned enclosed in parentheses.
                    898: .nf
                    899:
                    900:     {
                    901:     int i;
                    902:     /* Copy yytext because unput() trashes yytext */
                    903:     char *yycopy = strdup( yytext );
                    904:     unput( ')' );
                    905:     for ( i = yyleng - 1; i >= 0; --i )
                    906:         unput( yycopy[i] );
                    907:     unput( '(' );
                    908:     free( yycopy );
                    909:     }
                    910:
                    911: .fi
                    912: Note that since each
                    913: .B unput()
                    914: puts the given character back at the
                    915: .I beginning
                    916: of the input stream, pushing back strings must be done back-to-front.
                    917: .PP
                    918: An important potential problem when using
                    919: .B unput()
                    920: is that if you are using
                    921: .B %pointer
                    922: (the default), a call to
                    923: .B unput()
                    924: .I destroys
                    925: the contents of
                    926: .B yytext,
                    927: starting with its rightmost character and devouring one character to
                    928: the left with each call.  If you need the value of yytext preserved
                    929: after a call to
                    930: .B unput()
                    931: (as in the above example),
                    932: you must either first copy it elsewhere, or build your scanner using
                    933: .B %array
                    934: instead (see How The Input Is Matched).
                    935: .PP
                    936: Finally, note that you cannot put back
                    937: .B EOF
                    938: to attempt to mark the input stream with an end-of-file.
                    939: .IP -
                    940: .B input()
                    941: reads the next character from the input stream.  For example,
                    942: the following is one way to eat up C comments:
                    943: .nf
                    944:
                    945:     %%
                    946:     "/*"        {
                    947:                 register int c;
                    948:
                    949:                 for ( ; ; )
                    950:                     {
                    951:                     while ( (c = input()) != '*' &&
                    952:                             c != EOF )
                    953:                         ;    /* eat up text of comment */
                    954:
                    955:                     if ( c == '*' )
                    956:                         {
                    957:                         while ( (c = input()) == '*' )
                    958:                             ;
                    959:                         if ( c == '/' )
                    960:                             break;    /* found the end */
                    961:                         }
                    962:
                    963:                     if ( c == EOF )
                    964:                         {
                    965:                         error( "EOF in comment" );
                    966:                         break;
                    967:                         }
                    968:                     }
                    969:                 }
                    970:
                    971: .fi
                    972: (Note that if the scanner is compiled using
                    973: .B C++,
                    974: then
                    975: .B input()
                    976: is instead referred to as
                    977: .B yyinput(),
                    978: in order to avoid a name clash with the
                    979: .B C++
                    980: stream by the name of
                    981: .I input.)
                    982: .IP -
                    983: .B YY_FLUSH_BUFFER
                    984: flushes the scanner's internal buffer
                    985: so that the next time the scanner attempts to match a token, it will
                    986: first refill the buffer using
                    987: .B YY_INPUT
                    988: (see The Generated Scanner, below).  This action is a special case
                    989: of the more general
                    990: .B yy_flush_buffer()
                    991: function, described below in the section Multiple Input Buffers.
                    992: .IP -
                    993: .B yyterminate()
                    994: can be used in lieu of a return statement in an action.  It terminates
                    995: the scanner and returns a 0 to the scanner's caller, indicating "all done".
                    996: By default,
                    997: .B yyterminate()
                    998: is also called when an end-of-file is encountered.  It is a macro and
                    999: may be redefined.
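
The back-to-front rule given above for unput() can be seen in a small
stand-alone C model.  This is only a sketch: pb_unput(), pb_input(), and the
fixed-size pb_stack are hypothetical stand-ins for a last-in, first-out
pushback store, and are not flex's actual buffer code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the scanner's pushback store (not flex's code). */
#define PB_MAX 64
static char   pb_stack[PB_MAX];
static size_t pb_top = 0;

/* Model of unput(c): c becomes the next character to be read. */
static void pb_unput(char c)
{
    assert(pb_top < PB_MAX);      /* pushing back too much is a run-time error */
    pb_stack[pb_top++] = c;
}

/* Model of input(): returns the most recently pushed character, or -1. */
static int pb_input(void)
{
    return pb_top > 0 ? (unsigned char)pb_stack[--pb_top] : -1;
}

/* Push back "(token)" so that it is re-read in normal reading order. */
static void pb_unput_parenthesized(const char *token)
{
    size_t i = strlen(token);

    pb_unput(')');                /* last character to be read goes first */
    while (i > 0)
        pb_unput(token[--i]);     /* token body, right to left */
    pb_unput('(');                /* first character to be read goes last */
}
```

Calling pb_unput_parenthesized("abc") and then draining the store with
pb_input() yields the characters of "(abc)" in that order, which is why the
example action above walks the copied token from its last character to its
first.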
                   1000: .SH THE GENERATED SCANNER
                   1001: The output of
                   1002: .I flex
                   1003: is the file
                   1004: .B lex.yy.c,
                   1005: which contains the scanning routine
                   1006: .B yylex(),
                   1007: a number of tables used by it for matching tokens, and a number
                   1008: of auxiliary routines and macros.  By default,
                   1009: .B yylex()
                   1010: is declared as follows:
                   1011: .nf
                   1012:
                   1013:     int yylex()
                   1014:         {
                   1015:         ... various definitions and the actions in here ...
                   1016:         }
                   1017:
                   1018: .fi
                   1019: (If your environment supports function prototypes, then it will
                   1020: be "int yylex( void )".)  This definition may be changed by defining
                   1021: the "YY_DECL" macro.  For example, you could use:
                   1022: .nf
                   1023:
                   1024:     #define YY_DECL float lexscan( a, b ) float a, b;
                   1025:
                   1026: .fi
                   1027: to give the scanning routine the name
                   1028: .I lexscan,
                   1029: returning a float, and taking two floats as arguments.  Note that
                   1030: if you give arguments to the scanning routine using a
                   1031: K&R-style/non-prototyped function declaration, you must terminate
                   1032: the definition with a semicolon (;).
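
With a compiler that accepts function prototypes, the same YY_DECL can be
written in prototype form, and then no trailing semicolon is used.  The
fragment below is a sketch only: flex normally supplies the function body
itself, so the body shown here is a hypothetical stand-in whose sole purpose
is to let the macro be compiled on its own.

```c
/* Prototype-style equivalent of the K&R-style example above. */
#define YY_DECL float lexscan(float a, float b)

/* In a generated scanner flex writes the body that follows YY_DECL;
 * this stand-in body is an assumption so the fragment compiles alone. */
YY_DECL
{
    return a + b;
}
```

A caller would then invoke lexscan( x, y ) wherever it would otherwise have
called yylex().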
                   1033: .PP
                   1034: Whenever
                   1035: .B yylex()
                   1036: is called, it scans tokens from the global input file
                   1037: .I yyin
                   1038: (which defaults to stdin).  It continues until it either reaches
                   1039: an end-of-file (at which point it returns the value 0) or
                   1040: one of its actions executes a
                   1041: .I return
                   1042: statement.
                   1043: .PP
                   1044: If the scanner reaches an end-of-file, subsequent calls are undefined
                   1045: unless either
                   1046: .I yyin
                   1047: is pointed at a new input file (in which case scanning continues from
                   1048: that file), or
                   1049: .B yyrestart()
                   1050: is called.
                   1051: .B yyrestart()
                   1052: takes one argument, a
                   1053: .B FILE *
                   1054: pointer (which can be NULL, if you've set up
                   1055: .B YY_INPUT
                   1056: to scan from a source other than
                   1057: .I yyin),
                   1058: and initializes
                   1059: .I yyin
                   1060: for scanning from that file.  Essentially there is no difference between
                   1061: just assigning
                   1062: .I yyin
                   1063: to a new input file or using
                   1064: .B yyrestart()
                   1065: to do so; the latter is available for compatibility with previous versions
                   1066: of
                   1067: .I flex,
                   1068: and because it can be used to switch input files in the middle of scanning.
                   1069: It can also be used to throw away the current input buffer, by calling
                   1070: it with an argument of
                   1071: .I yyin;
                   1072: but it is better to use
                   1073: .B YY_FLUSH_BUFFER
                   1074: (see above).
                   1075: Note that
                   1076: .B yyrestart()
                   1077: does
                   1078: .I not
                   1079: reset the start condition to
                   1080: .B INITIAL
                   1081: (see Start Conditions, below).
                   1082: .PP
                   1083: If
                   1084: .B yylex()
                   1085: stops scanning due to executing a
                   1086: .I return
                   1087: statement in one of the actions, the scanner may then be called again and it
                   1088: will resume scanning where it left off.
                   1089: .PP
                   1090: By default (and for purposes of efficiency), the scanner uses
                   1091: block-reads rather than simple
                   1092: .I getc()
                   1093: calls to read characters from
                   1094: .I yyin.
                   1095: The nature of how it gets its input can be controlled by defining the
                   1096: .B YY_INPUT
                   1097: macro.
                   1098: YY_INPUT's calling sequence is "YY_INPUT(buf,result,max_size)".  Its
                   1099: action is to place up to
                   1100: .I max_size
                   1101: characters in the character array
                   1102: .I buf
                   1103: and return in the integer variable
                   1104: .I result
                   1105: either the
                   1106: number of characters read or the constant YY_NULL (0 on Unix systems)
                   1107: to indicate EOF.  The default YY_INPUT reads from the
                   1108: global file-pointer "yyin".
                   1109: .PP
                   1110: A sample definition of YY_INPUT (in the definitions
                   1111: section of the input file):
                   1112: .nf
                   1113:
                   1114:     %{
                   1115:     #define YY_INPUT(buf,result,max_size) \\
                   1116:         { \\
                   1117:         int c = getchar(); \\
                   1118:         result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \\
                   1119:         }
                   1120:     %}
                   1121:
                   1122: .fi
                   1123: This definition will change the input processing to occur
                   1124: one character at a time.
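
YY_INPUT need not read from a file at all.  The following is a minimal
sketch, assuming the input sits in a NUL-terminated string; src_text,
src_pos, and fill_from_string() are hypothetical names, and YY_NULL is
defined locally only so that the fragment compiles outside a generated
scanner.  In a flex input file the code would go between %{ and %} in the
definitions section.

```c
#include <stddef.h>
#include <string.h>

#ifndef YY_NULL
#define YY_NULL 0                 /* supplied by flex in a real scanner */
#endif

/* Hypothetical in-memory source and current read position. */
static const char *src_text = "";
static size_t      src_pos  = 0;

/* Copy up to max_size characters into buf; return the number copied,
 * or YY_NULL once the string is exhausted. */
static int fill_from_string(char *buf, int max_size)
{
    size_t left = strlen(src_text) - src_pos;
    size_t n;

    if (max_size <= 0 || left == 0)
        return YY_NULL;
    n = left < (size_t)max_size ? left : (size_t)max_size;
    memcpy(buf, src_text + src_pos, n);
    src_pos += n;
    return (int)n;
}

#define YY_INPUT(buf,result,max_size) \
    { \
    result = fill_from_string(buf, max_size); \
    }
```

Unlike the one-character definition above, this version hands the scanner as
many characters as it asks for, so block reads are preserved.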
                   1125: .PP
                   1126: When the scanner receives an end-of-file indication from YY_INPUT,
                   1127: it then checks the
                   1128: .B yywrap()
                   1129: function.  If
                   1130: .B yywrap()
                   1131: returns false (zero), then it is assumed that the
                   1132: function has gone ahead and set up
                   1133: .I yyin
                   1134: to point to another input file, and scanning continues.  If it returns
                   1135: true (non-zero), then the scanner terminates, returning 0 to its
                   1136: caller.  Note that in either case, the start condition remains unchanged;
                   1137: it does
                   1138: .I not
                   1139: revert to
                   1140: .B INITIAL.
                   1141: .PP
                   1142: If you do not supply your own version of
                   1143: .B yywrap(),
                   1144: then you must either use
                   1145: .B %option noyywrap
                   1146: (in which case the scanner behaves as though
                   1147: .B yywrap()
                   1148: returned 1), or you must link with
                   1149: .B \-lfl
                   1150: to obtain the default version of the routine, which always returns 1.
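
A user-supplied yywrap() is a common way to scan several files in turn.  The
sketch below assumes a NULL-terminated array of file names; pending_files is
a hypothetical name, and yyin is declared here only so that the fragment
compiles on its own (in a real scanner flex already defines it, and this
code would live in the user code section of the flex input).

```c
#include <stdio.h>

FILE *yyin;                          /* stand-in: flex defines this itself */

/* Hypothetical NULL-terminated list of files still to be scanned. */
static const char **pending_files = NULL;

int yywrap(void)
{
    if (yyin != NULL && yyin != stdin) {
        fclose(yyin);                /* finished with the previous file */
        yyin = NULL;
    }

    while (pending_files != NULL && *pending_files != NULL) {
        yyin = fopen(*pending_files++, "r");
        if (yyin != NULL)
            return 0;                /* more input: scanning continues */
        /* unreadable file: skip it and try the next one */
    }
    return 1;                        /* nothing left: scanner terminates */
}
```

Returning 0 after pointing yyin at the next file matches the contract
described above; returning 1 once the list is exhausted lets the scanner
return 0 to its caller.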
                   1151: .PP
                   1152: Three routines are available for scanning from in-memory buffers rather
                   1153: than files:
                   1154: .B yy_scan_string(), yy_scan_bytes(),
                   1155: and
                   1156: .B yy_scan_buffer().
                   1157: See the discussion of them below in the section Multiple Input Buffers.
                   1158: .PP
                   1159: The scanner writes its
                   1160: .B ECHO
                   1161: output to the
                   1162: .I yyout
                   1163: global (default, stdout), which may be redefined by the user simply
                   1164: by assigning it to some other
                   1165: .B FILE
                   1166: pointer.
                   1167: .SH START CONDITIONS
                   1168: .I flex
                   1169: provides a mechanism for conditionally activating rules.  Any rule
                   1170: whose pattern is prefixed with "<sc>" will only be active when
                   1171: the scanner is in the start condition named "sc".  For example,
                   1172: .nf
                   1173:
                   1174:     <STRING>[^"]*        { /* eat up the string body ... */
                   1175:                 ...
                   1176:                 }
                   1177:
                   1178: .fi
                   1179: will be active only when the scanner is in the "STRING" start
                   1180: condition, and
                   1181: .nf
                   1182:
                   1183:     <INITIAL,STRING,QUOTE>\\.        { /* handle an escape ... */
                   1184:                 ...
                   1185:                 }
                   1186:
                   1187: .fi
                   1188: will be active only when the current start condition is
                   1189: either "INITIAL", "STRING", or "QUOTE".
                   1190: .PP
                   1191: Start conditions
                   1192: are declared in the definitions (first) section of the input
                   1193: using unindented lines beginning with either
                   1194: .B %s
                   1195: or
                   1196: .B %x
                   1197: followed by a list of names.
                   1198: The former declares
                   1199: .I inclusive
                   1200: start conditions, the latter
                   1201: .I exclusive
                   1202: start conditions.  A start condition is activated using the
                   1203: .B BEGIN
                   1204: action.  Until the next
                   1205: .B BEGIN
                   1206: action is executed, rules with the given start
                   1207: condition will be active and
                   1208: rules with other start conditions will be inactive.
                   1209: If the start condition is
                   1210: .I inclusive,
                   1211: then rules with no start conditions at all will also be active.
                   1212: If it is
                   1213: .I exclusive,
                   1214: then
                   1215: .I only
                   1216: rules qualified with the start condition will be active.
                   1217: A set of rules contingent on the same exclusive start condition
                   1218: describes a scanner which is independent of any of the other rules in the
                   1219: .I flex
                   1220: input.  Because of this,
                   1221: exclusive start conditions make it easy to specify "mini-scanners"
                   1222: which scan portions of the input that are syntactically different
                   1223: from the rest (e.g., comments).
                   1224: .PP
                   1225: If the distinction between inclusive and exclusive start conditions
                   1226: is still a little vague, here's a simple example illustrating the
                   1227: connection between the two.  The set of rules:
                   1228: .nf
                   1229:
                   1230:     %s example
                   1231:     %%
                   1232:
                   1233:     <example>foo   do_something();
                   1234:
                   1235:     bar            something_else();
                   1236:
                   1237: .fi
                   1238: is equivalent to
                   1239: .nf
                   1240:
                   1241:     %x example
                   1242:     %%
                   1243:
                   1244:     <example>foo   do_something();
                   1245:
                   1246:     <INITIAL,example>bar    something_else();
                   1247:
                   1248: .fi
                   1249: Without the
                   1250: .B <INITIAL,example>
                   1251: qualifier, the
                   1252: .I bar
                   1253: pattern in the second example wouldn't be active (i.e., couldn't match)
                   1254: when in start condition
                   1255: .B example.
                   1256: If we just used
                   1257: .B <example>
                   1258: to qualify
                   1259: .I bar,
                   1260: though, then it would only be active in
                   1261: .B example
                   1262: and not in
                   1263: .B INITIAL,
                   1264: while in the first example it's active in both, because in the first
                   1265: example the
                   1266: .B example
                   1267: start condition is an
                   1268: .I inclusive
                   1269: .B (%s)
                   1270: start condition.
                   1271: .PP
                   1272: Also note that the special start-condition specifier
                   1273: .B <*>
                   1274: matches every start condition.  Thus, the above example could also
                   1275: have been written:
                   1276: .nf
                   1277:
                   1278:     %x example
                   1279:     %%
                   1280:
                   1281:     <example>foo   do_something();
                   1282:
                   1283:     <*>bar    something_else();
                   1284:
                   1285: .fi
                   1286: .PP
                   1287: The default rule (to
                   1288: .B ECHO
                   1289: any unmatched character) remains active in start conditions.  It
                   1290: is equivalent to:
                   1291: .nf
                   1292:
                   1293:     <*>.|\\n     ECHO;
                   1294:
                   1295: .fi
                   1296: .PP
                   1297: .B BEGIN(0)
                   1298: returns to the original state where only the rules with
                   1299: no start conditions are active.  This state can also be
                   1300: referred to as the start-condition "INITIAL", so
                   1301: .B BEGIN(INITIAL)
                   1302: is equivalent to
                   1303: .B BEGIN(0).
                   1304: (The parentheses around the start condition name are not required but
                   1305: are considered good style.)
                   1306: .PP
                   1307: .B BEGIN
                   1308: actions can also be given as indented code at the beginning
                   1309: of the rules section.  For example, the following will cause
                   1310: the scanner to enter the "SPECIAL" start condition whenever
                   1311: .B yylex()
                   1312: is called and the global variable
                   1313: .I enter_special
                   1314: is true:
                   1315: .nf
                   1316:
                   1317:             int enter_special;
                   1318:
                   1319:     %x SPECIAL
                   1320:     %%
                   1321:             if ( enter_special )
                   1322:                 BEGIN(SPECIAL);
                   1323:
                   1324:     <SPECIAL>blahblahblah
                   1325:     ...more rules follow...
                   1326:
                   1327: .fi
                   1328: .PP
                   1329: To illustrate the uses of start conditions,
                   1330: here is a scanner which provides two different interpretations
                   1331: of a string like "123.456".  By default it will treat it as
                   1332: three tokens, the integer "123", a dot ('.'), and the integer "456".
                   1333: But if the string is preceded earlier in the line by the string
                   1334: "expect-floats"
                   1335: it will treat it as a single token, the floating-point number
                   1336: 123.456:
                   1337: .nf
                   1338:
                   1339:     %{
                   1340:     #include <math.h>
                   1341:     %}
                   1342:     %s expect
                   1343:
                   1344:     %%
                   1345:     expect-floats        BEGIN(expect);
                   1346:
                   1347:     <expect>[0-9]+"."[0-9]+      {
                   1348:                 printf( "found a float, = %f\\n",
                   1349:                         atof( yytext ) );
                   1350:                 }
                   1351:     <expect>\\n           {
                   1352:                 /* that's the end of the line, so
                   1353:                  * we need another "expect-floats"
                   1354:                  * before we'll recognize any more
                   1355:                  * numbers
                   1356:                  */
                   1357:                 BEGIN(INITIAL);
                   1358:                 }
                   1359:
                   1360:     [0-9]+      {
                   1361:                 printf( "found an integer, = %d\\n",
                   1362:                         atoi( yytext ) );
                   1363:                 }
                   1364:
                   1365:     "."         printf( "found a dot\\n" );
                   1366:
                   1367: .fi
                   1368: Here is a scanner which recognizes (and discards) C comments while
                   1369: maintaining a count of the current input line.
                   1370: .nf
                   1371:
                   1372:     %x comment
                   1373:     %%
                   1374:             int line_num = 1;
                   1375:
                   1376:     "/*"         BEGIN(comment);
                   1377:
                   1378:     <comment>[^*\\n]*        /* eat anything that's not a '*' */
                   1379:     <comment>"*"+[^*/\\n]*   /* eat up '*'s not followed by '/'s */
                   1380:     <comment>\\n             ++line_num;
                   1381:     <comment>"*"+"/"        BEGIN(INITIAL);
                   1382:
                   1383: .fi
                   1384: This scanner goes to a bit of trouble to match as much
                   1385: text as possible with each rule.  In general, when attempting to write
                   1386: a high-speed scanner try to match as much as possible in each rule, as
                   1387: it's a big win.
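For comparison, the two-state logic of the comment scanner can be written by hand in C. The sketch below is a hypothetical equivalent (the name strip_comments() and the enum are inventions of this example), and it works one character at a time rather than matching long runs as the rules above do. As in those rules, only newlines seen inside a comment increment line_num.

```c
#include <assert.h>
#include <stddef.h>

enum sc { SC_INITIAL, SC_COMMENT };   /* stand-ins for start conditions */

/* Copies src to dst with C comments removed and returns the resulting
 * line_num.  dst must have room for strlen(src) + 1 bytes. */
static int strip_comments(const char *src, char *dst)
{
    enum sc state = SC_INITIAL;
    int line_num = 1;
    size_t i = 0, o = 0;

    assert(src != NULL && dst != NULL);

    while (src[i] != '\0') {
        if (state == SC_INITIAL) {
            if (src[i] == '/' && src[i + 1] == '*') {
                state = SC_COMMENT;      /* like BEGIN(comment) */
                i += 2;
            } else {
                dst[o++] = src[i++];     /* like the default ECHO rule */
            }
        } else {
            if (src[i] == '*' && src[i + 1] == '/') {
                state = SC_INITIAL;      /* like BEGIN(INITIAL) */
                i += 2;
            } else {
                if (src[i] == '\n')
                    ++line_num;
                ++i;                     /* discard comment text */
            }
        }
    }
    dst[o] = '\0';
    return line_num;
}
```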
                   1388: .PP
                   1389: Note that start condition names are really integer values and
                   1390: can be stored as such.  Thus, the above could be extended in the
                   1391: following fashion:
                   1392: .nf
                   1393:
                   1394:     %x comment foo
                   1395:     %%
                   1396:             int line_num = 1;
                   1397:             int comment_caller;
                   1398:
                   1399:     "/*"         {
                   1400:                  comment_caller = INITIAL;
                   1401:                  BEGIN(comment);
                   1402:                  }
                   1403:
                   1404:     ...
                   1405:
                   1406:     <foo>"/*"    {
                   1407:                  comment_caller = foo;
                   1408:                  BEGIN(comment);
                   1409:                  }
                   1410:
                   1411:     <comment>[^*\\n]*        /* eat anything that's not a '*' */
                   1412:     <comment>"*"+[^*/\\n]*   /* eat up '*'s not followed by '/'s */
                   1413:     <comment>\\n             ++line_num;
                   1414:     <comment>"*"+"/"        BEGIN(comment_caller);
                   1415:
                   1416: .fi
                   1417: Furthermore, you can access the current start condition using
                   1418: the integer-valued
                   1419: .B YY_START
                   1420: macro.  For example, the above assignments to
                   1421: .I comment_caller
                   1422: could instead be written
                   1423: .nf
                   1424:
                   1425:     comment_caller = YY_START;
                   1426:
                   1427: .fi
                   1428: Flex provides
                   1429: .B YYSTATE
                   1430: as an alias for
                   1431: .B YY_START
                   1432: (since that is what's used by AT&T
                   1433: .I lex).
                   1434: .PP
                   1435: Note that start conditions do not have their own name-space; %s's and %x's
                   1436: declare names in the same fashion as #define's.
                   1437: .PP
                   1438: Finally, here's an example of how to match C-style quoted strings using
                   1439: exclusive start conditions, including expanded escape sequences (but
                   1440: not including checking for a string that's too long):
                   1441: .nf
                   1442:
                   1443:     %x str
                   1444:
                   1445:     %%
                   1446:             char string_buf[MAX_STR_CONST];
                   1447:             char *string_buf_ptr;
                   1448:
                   1449:
                   1450:     \\"      string_buf_ptr = string_buf; BEGIN(str);
                   1451:
                   1452:     <str>\\"        { /* saw closing quote - all done */
                   1453:             BEGIN(INITIAL);
                   1454:             *string_buf_ptr = '\\0';
                   1455:             /* return string constant token type and
                   1456:              * value to parser
                   1457:              */
                   1458:             }
                   1459:
                   1460:     <str>\\n        {
                   1461:             /* error - unterminated string constant */
                   1462:             /* generate error message */
                   1463:             }
                   1464:
                   1465:     <str>\\\\[0-7]{1,3} {
                   1466:             /* octal escape sequence */
                   1467:             unsigned int result;
                   1468:
                   1469:             (void) sscanf( yytext + 1, "%o", &result );
                   1470:
                   1471:             if ( result > 0xff )
                   1472:                     ; /* error, constant is out-of-bounds */
                   1473:
                   1474:             *string_buf_ptr++ = result;
                   1475:             }
                   1476:
                   1477:     <str>\\\\[0-9]+ {
                   1478:             /* generate error - bad escape sequence; something
                   1479:              * like '\\48' or '\\0777777'
                   1480:              */
                   1481:             }
                   1482:
                   1483:     <str>\\\\n  *string_buf_ptr++ = '\\n';
                   1484:     <str>\\\\t  *string_buf_ptr++ = '\\t';
                   1485:     <str>\\\\r  *string_buf_ptr++ = '\\r';
                   1486:     <str>\\\\b  *string_buf_ptr++ = '\\b';
                   1487:     <str>\\\\f  *string_buf_ptr++ = '\\f';
                   1488:
                   1489:     <str>\\\\(.|\\n)  *string_buf_ptr++ = yytext[1];
                   1490:
                   1491:     <str>[^\\\\\\n\\"]+        {
                   1492:             char *yptr = yytext;
                   1493:
                   1494:             while ( *yptr )
                   1495:                     *string_buf_ptr++ = *yptr++;
                   1496:             }
                   1497:
                   1498: .fi
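The octal-escape action above leaves its error handling as a comment. One way to fill it in is sketched below as a stand-alone helper; the function name decode_octal_escape() and its return convention are assumptions of this example, not part of flex. It expects text shaped like what the rule matches: a backslash followed by one to three octal digits.

```c
#include <assert.h>
#include <stdio.h>

/* Decodes an octal escape such as "\101".  Returns 0 and stores the
 * byte in *out on success, or -1 if the text is malformed or the
 * value does not fit in one byte. */
static int decode_octal_escape(const char *text, unsigned char *out)
{
    unsigned int result;

    assert(text != NULL && out != NULL);

    if (text[0] != '\\')
        return -1;

    /* "%3o" reads at most three octal digits into an unsigned int. */
    if (sscanf(text + 1, "%3o", &result) != 1)
        return -1;

    if (result > 0xff)
        return -1;              /* e.g. "\777" is out of bounds */

    *out = (unsigned char) result;
    return 0;
}
```

Inside the action, a call such as decode_octal_escape(yytext, &c) could then decide between appending c to string_buf and reporting an error.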
                   1499: .PP
                   1500: Often, such as in some of the examples above, you wind up writing a
                   1501: whole bunch of rules all preceded by the same start condition(s).  Flex
                   1502: makes this a little easier and cleaner by introducing a notion of
                   1503: start condition
                   1504: .I scope.
                   1505: A start condition scope is begun with:
                   1506: .nf
                   1507:
                   1508:     <SCs>{
                   1509:
                   1510: .fi
                   1511: where
                   1512: .I SCs
                   1513: is a list of one or more start conditions.  Inside the start condition
                   1514: scope, every rule automatically has the prefix
                   1515: .I <SCs>
                   1516: applied to it, until a
                   1517: .I '}'
                   1518: which matches the initial
                   1519: .I '{'.
                   1520: So, for example,
                   1521: .nf
                   1522:
                   1523:     <ESC>{
                   1524:         "\\\\n"   return '\\n';
                   1525:         "\\\\r"   return '\\r';
                   1526:         "\\\\f"   return '\\f';
                   1527:         "\\\\0"   return '\\0';
                   1528:     }
                   1529:
                   1530: .fi
                   1531: is equivalent to:
                   1532: .nf
                   1533:
                   1534:     <ESC>"\\\\n"  return '\\n';
                   1535:     <ESC>"\\\\r"  return '\\r';
                   1536:     <ESC>"\\\\f"  return '\\f';
                   1537:     <ESC>"\\\\0"  return '\\0';
                   1538:
                   1539: .fi
                   1540: Start condition scopes may be nested.
                   1541: .PP
                   1542: Three routines are available for manipulating stacks of start conditions:
                   1543: .TP
                   1544: .B void yy_push_state(int new_state)
                   1545: pushes the current start condition onto the top of the start condition
                   1546: stack and switches to
                   1547: .I new_state
                   1548: as though you had used
                   1549: .B BEGIN new_state
                   1550: (recall that start condition names are also integers).
                   1551: .TP
                   1552: .B void yy_pop_state()
                   1553: pops the top of the stack and switches to it via
                   1554: .B BEGIN.
                   1555: .TP
                   1556: .B int yy_top_state()
                   1557: returns the top of the stack without altering the stack's contents.
                   1558: .PP
                   1559: The start condition stack grows dynamically and so has no built-in
                   1560: size limitation.  If memory is exhausted, program execution aborts.
                   1561: .PP
                   1562: To use start condition stacks, your scanner must include a
                   1563: .B %option stack
                   1564: directive (see Options below).
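The semantics of the three stack routines can be modelled in plain C as follows. This is a sketch under stated assumptions, not the generated scanner's implementation: struct sc_state and the functions sc_push(), sc_pop() and sc_top() are hypothetical names, and the growth policy (doubling from 8 entries) is an arbitrary choice made for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the scanner's start condition state. */
struct sc_state {
    int current;     /* active start condition (what BEGIN sets) */
    int *stack;      /* saved start conditions */
    size_t depth;    /* number of entries in use */
    size_t cap;      /* number of entries allocated */
};

/* Models yy_push_state(): save the current condition, then switch. */
static void sc_push(struct sc_state *s, int new_state)
{
    if (s->depth == s->cap) {
        size_t ncap = s->cap ? s->cap * 2 : 8;
        int *p = realloc(s->stack, ncap * sizeof *p);
        if (p == NULL)
            abort();             /* memory exhausted: give up */
        s->stack = p;
        s->cap = ncap;
    }
    s->stack[s->depth++] = s->current;
    s->current = new_state;
}

/* Models yy_pop_state(): switch to the most recently saved condition. */
static void sc_pop(struct sc_state *s)
{
    assert(s->depth > 0);
    s->current = s->stack[--s->depth];
}

/* Models yy_top_state(): peek at the saved condition without popping. */
static int sc_top(const struct sc_state *s)
{
    assert(s->depth > 0);
    return s->stack[s->depth - 1];
}
```

Note that the value returned by the top operation is the condition that a pop would return to, not the condition currently in effect.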
                   1565: .SH MULTIPLE INPUT BUFFERS
                   1566: Some scanners (such as those which support "include" files)
                   1567: require reading from several input streams.  As
                   1568: .I flex
                   1569: scanners do a large amount of buffering, one cannot control
                   1570: where the next input will be read from by simply writing a
                   1571: .B YY_INPUT
                   1572: which is sensitive to the scanning context.
                   1573: .B YY_INPUT
                   1574: is only called when the scanner reaches the end of its buffer, which
                   1575: may be a long time after scanning a statement such as an "include"
                   1576: which requires switching the input source.
                   1577: .PP
                   1578: To negotiate these sorts of problems,
                   1579: .I flex
                   1580: provides a mechanism for creating and switching between multiple
                   1581: input buffers.  An input buffer is created by using:
                   1582: .nf
                   1583:
                   1584:     YY_BUFFER_STATE yy_create_buffer( FILE *file, int size )
                   1585:
                   1586: .fi
                   1587: which takes a
                   1588: .I FILE
                   1589: pointer and a size and creates a buffer associated with the given
                   1590: file and large enough to hold
                   1591: .I size
                   1592: characters (when in doubt, use
                   1593: .B YY_BUF_SIZE
                   1594: for the size).  It returns a
                   1595: .B YY_BUFFER_STATE
                   1596: handle, which may then be passed to other routines (see below).  The
                   1597: .B YY_BUFFER_STATE
                   1598: type is a pointer to an opaque
                   1599: .B struct yy_buffer_state
                   1600: structure, so you may safely initialize YY_BUFFER_STATE variables to
                   1601: .B ((YY_BUFFER_STATE) 0)
                   1602: if you wish, and also refer to the opaque structure in order to
                   1603: correctly declare input buffers in source files other than that
                   1604: of your scanner.  Note that the
                   1605: .I FILE
                   1606: pointer in the call to
                   1607: .B yy_create_buffer
                   1608: is only used as the value of
                   1609: .I yyin
                   1610: seen by
                   1611: .B YY_INPUT;
                   1612: if you redefine
                   1613: .B YY_INPUT
                   1614: so it no longer uses
                   1615: .I yyin,
                   1616: then you can safely pass a nil
                   1617: .I FILE
                   1618: pointer to
                   1619: .B yy_create_buffer.
                   1620: You select a particular buffer to scan from using:
                   1621: .nf
                   1622:
                   1623:     void yy_switch_to_buffer( YY_BUFFER_STATE new_buffer )
                   1624:
                   1625: .fi
                   1626: This switches the scanner's input buffer so subsequent tokens will
                   1627: come from
                   1628: .I new_buffer.
                   1629: Note that
                   1630: .B yy_switch_to_buffer()
                   1631: may be used by yywrap() to set things up for continued scanning, instead
                   1632: of opening a new file and pointing
                   1633: .I yyin
                   1634: at it.  Note also that switching input sources via either
                   1635: .B yy_switch_to_buffer()
                   1636: or
                   1637: .B yywrap()
                   1638: does
                   1639: .I not
                   1640: change the start condition.
                   1641: .nf
                   1642:
                   1643:     void yy_delete_buffer( YY_BUFFER_STATE buffer )
                   1644:
                   1645: .fi
                   1646: is used to reclaim the storage associated with a buffer.
                   1647: .RB ( buffer
                   1648: can be nil, in which case the routine does nothing.)
                   1649: You can also clear the current contents of a buffer using:
                   1650: .nf
                   1651:
                   1652:     void yy_flush_buffer( YY_BUFFER_STATE buffer )
                   1653:
                   1654: .fi
                   1655: This function discards the buffer's contents,
                   1656: so the next time the scanner attempts to match a token from the
                   1657: buffer, it will first fill the buffer anew using
                   1658: .B YY_INPUT.
                   1659: .PP
                   1660: .B yy_new_buffer()
                   1661: is an alias for
                   1662: .B yy_create_buffer(),
                   1663: provided for compatibility with the C++ use of
                   1664: .I new
                   1665: and
                   1666: .I delete
                   1667: for creating and destroying dynamic objects.
                   1668: .PP
                   1669: Finally, the
                   1670: .B YY_CURRENT_BUFFER
                   1671: macro returns a
                   1672: .B YY_BUFFER_STATE
                   1673: handle to the current buffer.
                   1674: .PP
                   1675: Here is an example of using these features for writing a scanner
                   1676: which expands include files (the
                   1677: .B <<EOF>>
                   1678: feature is discussed below):
                   1679: .nf
                   1680:
                   1681:     /* the "incl" state is used for picking up the name
                   1682:      * of an include file
                   1683:      */
                   1684:     %x incl
                   1685:
                   1686:     %{
                   1687:     #define MAX_INCLUDE_DEPTH 10
                   1688:     YY_BUFFER_STATE include_stack[MAX_INCLUDE_DEPTH];
                   1689:     int include_stack_ptr = 0;
                   1690:     %}
                   1691:
                   1692:     %%
                   1693:     include             BEGIN(incl);
                   1694:
                   1695:     [a-z]+              ECHO;
                   1696:     [^a-z\\n]*\\n?        ECHO;
                   1697:
                   1698:     <incl>[ \\t]*      /* eat the whitespace */
                   1699:     <incl>[^ \\t\\n]+   { /* got the include file name */
                   1700:             if ( include_stack_ptr >= MAX_INCLUDE_DEPTH )
                   1701:                 {
                   1702:                 fprintf( stderr, "Includes nested too deeply\\n" );
                   1703:                 exit( 1 );
                   1704:                 }
                   1705:
                   1706:             include_stack[include_stack_ptr++] =
                   1707:                 YY_CURRENT_BUFFER;
                   1708:
                   1709:             yyin = fopen( yytext, "r" );
                   1710:
                   1711:             if ( ! yyin )
                   1712:                 error( ... );
                   1713:
                   1714:             yy_switch_to_buffer(
                   1715:                 yy_create_buffer( yyin, YY_BUF_SIZE ) );
                   1716:
                   1717:             BEGIN(INITIAL);
                   1718:             }
                   1719:
                   1720:     <<EOF>> {
                   1721:             if ( --include_stack_ptr < 0 )
                   1722:                 {
                   1723:                 yyterminate();
                   1724:                 }
                   1725:
                   1726:             else
                   1727:                 {
                   1728:                 yy_delete_buffer( YY_CURRENT_BUFFER );
                   1729:                 yy_switch_to_buffer(
                   1730:                      include_stack[include_stack_ptr] );
                   1731:                 }
                   1732:             }
                   1733:
                   1734: .fi
                   1735: Three routines are available for setting up input buffers for
                   1736: scanning in-memory strings instead of files.  All of them create
                   1737: a new input buffer for scanning the string, and return a corresponding
                   1738: .B YY_BUFFER_STATE
                   1739: handle (which you should delete with
                   1740: .B yy_delete_buffer()
                   1741: when done with it).  They also switch to the new buffer using
                   1742: .B yy_switch_to_buffer(),
                   1743: so the next call to
                   1744: .B yylex()
                   1745: will start scanning the string.
                   1746: .TP
                   1747: .B yy_scan_string(const char *str)
                   1748: scans a NUL-terminated string.
                   1749: .TP
                   1750: .B yy_scan_bytes(const char *bytes, int len)
                   1751: scans
                   1752: .I len
                   1753: bytes (including possibly NUL's)
                   1754: starting at location
                   1755: .I bytes.
                   1756: .PP
                   1757: Note that both of these functions create and scan a
                   1758: .I copy
                   1759: of the string or bytes.  (This may be desirable, since
                   1760: .B yylex()
                   1761: modifies the contents of the buffer it is scanning.)  You can avoid the
                   1762: copy by using:
                   1763: .TP
                   1764: .B yy_scan_buffer(char *base, yy_size_t size)
                   1765: which scans in place the buffer starting at
                   1766: .I base,
                   1767: consisting of
                   1768: .I size
                   1769: bytes, the last two bytes of which
                   1770: .I must
                   1771: be
                   1772: .B YY_END_OF_BUFFER_CHAR
                   1773: (ASCII NUL).
                   1774: These last two bytes are not scanned; thus, scanning
                   1775: consists of
                   1776: .B base[0]
                   1777: through
                   1778: .B base[size-3],
                   1779: inclusive.
                   1780: .IP
                   1781: If you fail to set up
                   1782: .I base
                   1783: in this manner (i.e., forget the final two
                   1784: .B YY_END_OF_BUFFER_CHAR
                   1785: bytes), then
                   1786: .B yy_scan_buffer()
                   1787: returns a nil pointer instead of creating a new input buffer.
                   1788: .IP
                   1789: The type
                   1790: .B yy_size_t
                   1791: is an integral type to which you can cast an integer expression
                   1792: reflecting the size of the buffer.
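Preparing a buffer that satisfies the two-trailing-NUL requirement might look like the following. The helper make_scan_buffer() is a hypothetical name introduced for this sketch; it only builds the buffer and does not call into the scanner.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a malloc'd copy of the first len bytes of src followed by
 * two NUL bytes, and stores the total size (len + 2) in *size_out.
 * Returns NULL if allocation fails.  The caller frees the result. */
static char *make_scan_buffer(const char *src, size_t len, size_t *size_out)
{
    char *base;

    assert(src != NULL && size_out != NULL);

    base = malloc(len + 2);
    if (base == NULL)
        return NULL;

    memcpy(base, src, len);
    base[len] = '\0';        /* first end-of-buffer byte */
    base[len + 1] = '\0';    /* second end-of-buffer byte */
    *size_out = len + 2;
    return base;
}
```

The resulting base and size are the two arguments that yy_scan_buffer() expects. Because the scan happens in place, the buffer must stay allocated until the corresponding YY_BUFFER_STATE has been deleted.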
                   1793: .SH END-OF-FILE RULES
                   1794: The special rule "<<EOF>>" indicates
                   1795: actions which are to be taken when an end-of-file is
                   1796: encountered and yywrap() returns non-zero (i.e., indicates
                   1797: no further files to process).  The action must finish
                   1798: by doing one of four things:
                   1799: .IP -
                   1800: assigning
                   1801: .I yyin
                   1802: to a new input file (in previous versions of flex, after doing the
                   1803: assignment you had to call the special action
                   1804: .B YY_NEW_FILE;
                   1805: this is no longer necessary);
                   1806: .IP -
                   1807: executing a
                   1808: .I return
                   1809: statement;
                   1810: .IP -
                   1811: executing the special
                   1812: .B yyterminate()
                   1813: action;
                   1814: .IP -
                   1815: or, switching to a new buffer using
                   1816: .B yy_switch_to_buffer()
                   1817: as shown in the example above.
                   1818: .PP
                   1819: <<EOF>> rules may not be used with other
                   1820: patterns; they may only be qualified with a list of start
                   1821: conditions.  If an unqualified <<EOF>> rule is given, it
                   1822: applies to
                   1823: .I all
                   1824: start conditions which do not already have <<EOF>> actions.  To
                   1825: specify an <<EOF>> rule for only the initial start condition, use
                   1826: .nf
                   1827:
                   1828:     <INITIAL><<EOF>>
                   1829:
                   1830: .fi
                   1831: .PP
                   1832: These rules are useful for catching things like unclosed comments.
                   1833: An example:
                   1834: .nf
                   1835:
                   1836:     %x quote
                   1837:     %%
                   1838:
                   1839:     ...other rules for dealing with quotes...
                   1840:
                   1841:     <quote><<EOF>>   {
                   1842:              error( "unterminated quote" );
                   1843:              yyterminate();
                   1844:              }
                   1845:     <<EOF>>  {
                   1846:              if ( *++filelist )
                   1847:                  yyin = fopen( *filelist, "r" );
                   1848:              else
                   1849:                 yyterminate();
                   1850:              }
                   1851:
                   1852: .fi
                   1853: .SH MISCELLANEOUS MACROS
                   1854: The macro
                   1855: .B YY_USER_ACTION
                   1856: can be defined to provide an action
                   1857: which is always executed prior to the matched rule's action.  For example,
                   1858: it could be #define'd to call a routine to convert yytext to lower-case.
                   1859: When
                   1860: .B YY_USER_ACTION
                   1861: is invoked, the variable
                   1862: .I yy_act
                   1863: gives the number of the matched rule (rules are numbered starting with 1).
                   1864: Suppose you want to profile how often each of your rules is matched.  The
                   1865: following would do the trick:
                   1866: .nf
                   1867:
                   1868:     #define YY_USER_ACTION ++ctr[yy_act]
                   1869:
                   1870: .fi
                   1871: where
                   1872: .I ctr
                   1873: is an array to hold the counts for the different rules.  Note that
                   1874: the macro
                   1875: .B YY_NUM_RULES
                   1876: gives the total number of rules (including the default rule, even if
                   1877: you use
                   1878: .B \-s),
                   1879: so a correct declaration for
                   1880: .I ctr
                   1881: is:
                   1882: .nf
                   1883:
                   1884:     int ctr[YY_NUM_RULES];
                   1885:
                   1886: .fi
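The profiling idea above can be tried outside a generated scanner with a minimal sketch.
It assumes stand-ins for what flex normally supplies: the value given to YY_NUM_RULES and the helpers fake_match() and report_counts() are hypothetical and exist only so the fragment compiles on its own.
The array is given one extra slot as a conservative choice, since rules are numbered starting with 1.

```c
#include <stdio.h>

/* Hypothetical stand-in: a real scanner gets YY_NUM_RULES from flex. */
#define YY_NUM_RULES 3

/* Rules are numbered from 1, so one extra slot keeps every index in range. */
static int ctr[YY_NUM_RULES + 1];

/* Same definition as in the text above. */
#define YY_USER_ACTION ++ctr[yy_act]

/* Hypothetical helper that mimics the moment a rule has matched:
 * the user action runs first, then the rule's own action would follow. */
void fake_match(int yy_act)
{
    YY_USER_ACTION;
    /* ...the matched rule's action would run here... */
}

/* Writes one line per rule giving the number of times it matched. */
void report_counts(FILE *out)
{
    int rule;

    for (rule = 1; rule <= YY_NUM_RULES; ++rule)
        fprintf(out, "rule %d matched %d time(s)\n", rule, ctr[rule]);
}
```

In a real scanner only the array declaration and the #define would appear in the definitions section; a reporting routine like report_counts() could be called once scanning has finished.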
                   1887: .PP
                   1888: The macro
                   1889: .B YY_USER_INIT
                   1890: may be defined to provide an action which is always executed before
                   1891: the first scan (and before the scanner's internal initializations are done).
                   1892: For example, it could be used to call a routine to read
                   1893: in a data table or open a logging file.
                   1894: .PP
                   1895: The macro
                   1896: .B yy_set_interactive(is_interactive)
                   1897: can be used to control whether the current buffer is considered
                   1898: .I interactive.
                   1899: An interactive buffer is processed more slowly,
                   1900: but must be used when the scanner's input source is indeed
                   1901: interactive to avoid problems due to waiting to fill buffers
                   1902: (see the discussion of the
                   1903: .B \-I
                   1904: flag below).  A non-zero value
                   1905: in the macro invocation marks the buffer as interactive, a zero
                   1906: value as non-interactive.  Note that use of this macro overrides
                   1907: .B %option always-interactive
                   1908: or
                   1909: .B %option never-interactive
                   1910: (see Options below).
                   1911: .B yy_set_interactive()
                   1912: must be invoked prior to beginning to scan the buffer that is
                   1913: (or is not) to be considered interactive.
                   1914: .PP
                   1915: The macro
                   1916: .B yy_set_bol(at_bol)
                   1917: can be used to control whether the next token match in the current
                   1918: buffer is attempted as though the scanner were at the
                   1919: beginning of a line.  A non-zero macro argument makes rules anchored with
                   1920: '^' active, while a zero argument makes '^' rules inactive.
                   1921: .PP
                   1922: The macro
                   1923: .B YY_AT_BOL()
                   1924: returns true if the next token scanned from the current buffer
                   1925: will have '^' rules active, false otherwise.
                   1926: .PP
                   1927: In the generated scanner, the actions are all gathered in one large
                   1928: switch statement and separated using
                   1929: .B YY_BREAK,
                   1930: which may be redefined.  By default, it is simply a "break", to separate
                   1931: each rule's action from the following rule's.
                   1932: Redefining
                   1933: .B YY_BREAK
                   1934: allows, for example, C++ users to
                   1935: #define YY_BREAK to do nothing (while being very careful that every
                   1936: rule ends with a "break" or a "return"!) to avoid suffering from
                   1937: unreachable statement warnings where, because a rule's action ends with
                   1938: "return", the
                   1939: .B YY_BREAK
                   1940: is inaccessible.
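The role of YY_BREAK can be pictured with a small sketch of an action switch.
This is not the code flex generates: run_action(), the token codes, and the two rules are hypothetical, and the macro is written here with its own semicolon as an assumption about the default definition.

```c
/* Assumed default; a C++ user might instead define YY_BREAK as empty
 * and end every action with "break" or "return". */
#define YY_BREAK break;

enum { TOK_NONE = 0, TOK_WORD = 1 };   /* hypothetical token codes */

/* Hypothetical miniature of the scanner's action switch.  Returns a
 * token code, or TOK_NONE when the action falls out of the switch. */
int run_action(int yy_act, int *skipped)
{
    switch (yy_act) {
    case 1:                 /* this action ends with "return" ... */
        return TOK_WORD;
        YY_BREAK            /* ... so this break can never be reached */
    case 2:                 /* this action does not return */
        ++*skipped;
        YY_BREAK            /* needed: stops fall-through past case 2 */
    }
    return TOK_NONE;
}
```

The unreachable break after case 1 is the statement that some compilers warn about; defining YY_BREAK as empty removes it, but then case 2 would need an explicit break of its own.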
                   1941: .SH VALUES AVAILABLE TO THE USER
                   1942: This section summarizes the various values available to the user
                   1943: in the rule actions.
                   1944: .IP -
                   1945: .B char *yytext
                   1946: holds the text of the current token.  It may be modified but not lengthened
                   1947: (you cannot append characters to the end).
                   1948: .IP
                   1949: If the special directive
                   1950: .B %array
                   1951: appears in the first section of the scanner description, then
                   1952: .B yytext
                   1953: is instead declared
                   1954: .B char yytext[YYLMAX],
                   1955: where
                   1956: .B YYLMAX
                   1957: is a macro definition that you can redefine in the first section
                   1958: if you don't like the default value (generally 8KB).  Using
                   1959: .B %array
                   1960: results in somewhat slower scanners, but the value of
                   1961: .B yytext
                   1962: becomes immune to calls to
                   1963: .I input()
                   1964: and
                   1965: .I unput(),
                   1966: which potentially destroy its value when
                   1967: .B yytext
                   1968: is a character pointer.  The opposite of
                   1969: .B %array
                   1970: is
                   1971: .B %pointer,
                   1972: which is the default.
                   1973: .IP
                   1974: You cannot use
                   1975: .B %array
                   1976: when generating C++ scanner classes
                   1977: (the
                   1978: .B \-+
                   1979: flag).
                   1980: .IP -
                   1981: .B int yyleng
                   1982: holds the length of the current token.
                   1983: .IP -
                   1984: .B FILE *yyin
                   1985: is the file which by default
                   1986: .I flex
                   1987: reads from.  It may be redefined but doing so only makes sense before
                   1988: scanning begins or after an EOF has been encountered.  Changing it in
                   1989: the midst of scanning will have unexpected results since
                   1990: .I flex
                   1991: buffers its input; use
                   1992: .B yyrestart()
                   1993: instead.
                   1994: Once scanning terminates because an end-of-file
                   1995: has been seen, you can point
                   1996: .I yyin
                   1997: at the new input file and then call the scanner again to continue scanning.
                   1998: .IP -
                   1999: .B void yyrestart( FILE *new_file )
                   2000: may be called to point
                   2001: .I yyin
                   2002: at the new input file.  The switch-over to the new file is immediate
                   2003: (any previously buffered-up input is lost).  Note that calling
                   2004: .B yyrestart()
                   2005: with
                   2006: .I yyin
                   2007: as an argument thus throws away the current input buffer and continues
                   2008: scanning the same input file.
                   2009: .IP -
                   2010: .B FILE *yyout
                   2011: is the file to which
                   2012: .B ECHO
                   2013: actions are done.  It can be reassigned by the user.
                   2014: .IP -
                   2015: .B YY_CURRENT_BUFFER
                   2016: returns a
                   2017: .B YY_BUFFER_STATE
                   2018: handle to the current buffer.
                   2019: .IP -
                   2020: .B YY_START
                   2021: returns an integer value corresponding to the current start
                   2022: condition.  You can subsequently use this value with
                   2023: .B BEGIN
                   2024: to return to that start condition.
                   2025: .SH INTERFACING WITH YACC
                   2026: One of the main uses of
                   2027: .I flex
                   2028: is as a companion to the
                   2029: .I yacc
                   2030: parser-generator.
                   2031: .I yacc
                   2032: parsers expect to call a routine named
                   2033: .B yylex()
                   2034: to find the next input token.  The routine is supposed to
                   2035: return the type of the next token as well as putting any associated
                   2036: value in the global
                   2037: .B yylval.
                   2038: To use
                   2039: .I flex
                   2040: with
                   2041: .I yacc,
                   2042: one specifies the
                   2043: .B \-d
                   2044: option to
                   2045: .I yacc
                   2046: to instruct it to generate the file
                   2047: .B y.tab.h
                   2048: containing definitions of all the
                   2049: .B %tokens
                   2050: appearing in the
                   2051: .I yacc
                   2052: input.  This file is then included in the
                   2053: .I flex
                   2054: scanner.  For example, if one of the tokens is "TOK_NUMBER",
                   2055: part of the scanner might look like:
                   2056: .nf
                   2057:
                   2058:     %{
                   2059:     #include "y.tab.h"
                   2060:     %}
                   2061:
                   2062:     %%
                   2063:
                   2064:     [0-9]+        yylval = atoi( yytext ); return TOK_NUMBER;
                   2065:
                   2066: .fi
                   2067: .SH OPTIONS
                   2068: .I flex
                   2069: has the following options:
                   2070: .TP
                   2071: .B \-b
                   2072: Generate backing-up information to
                   2073: .I lex.backup.
                   2074: This is a list of scanner states which require backing up
                   2075: and the input characters on which they do so.  By adding rules one
                   2076: can remove backing-up states.  If
                   2077: .I all
                   2078: backing-up states are eliminated and
                   2079: .B \-Cf
                   2080: or
                   2081: .B \-CF
                   2082: is used, the generated scanner will run faster (see the
                   2083: .B \-p
                   2084: flag).  Only users who wish to squeeze every last cycle out of their
                   2085: scanners need worry about this option.  (See the section on Performance
                   2086: Considerations below.)
                   2087: .TP
                   2088: .B \-c
                   2089: is a do-nothing, deprecated option included for POSIX compliance.
                   2090: .TP
                   2091: .B \-d
                   2092: makes the generated scanner run in
                   2093: .I debug
                   2094: mode.  Whenever a pattern is recognized and the global
                   2095: .B yy_flex_debug
                   2096: is non-zero (which is the default),
                   2097: the scanner will write to
                   2098: .I stderr
                   2099: a line of the form:
                   2100: .nf
                   2101:
                   2102:     --accepting rule at line 53 ("the matched text")
                   2103:
                   2104: .fi
                   2105: The line number refers to the location of the rule in the file
                   2106: defining the scanner (i.e., the file that was fed to flex).  Messages
                   2107: are also generated when the scanner backs up, accepts the
                   2108: default rule, reaches the end of its input buffer (or encounters
                   2109: a NUL; at this point, the two look the same as far as the scanner's concerned),
                   2110: or reaches an end-of-file.
                   2111: .TP
                   2112: .B \-f
                   2113: specifies
                   2114: .I fast scanner.
                   2115: No table compression is done and stdio is bypassed.
                   2116: The result is large but fast.  This option is equivalent to
                   2117: .B \-Cfr
                   2118: (see below).
                   2119: .TP
                   2120: .B \-h
                   2121: generates a "help" summary of
                   2122: .I flex's
                   2123: options to
                   2124: .I stdout
                   2125: and then exits.
                   2126: .B \-?
                   2127: and
                   2128: .B \-\-help
                   2129: are synonyms for
                   2130: .B \-h.
                   2131: .TP
                   2132: .B \-i
                   2133: instructs
                   2134: .I flex
                   2135: to generate a
                   2136: .I case-insensitive
                   2137: scanner.  The case of letters given in the
                   2138: .I flex
                   2139: input patterns will
                   2140: be ignored, and tokens in the input will be matched regardless of case.  The
                   2141: matched text given in
                   2142: .I yytext
                   2143: will have its case preserved (i.e., it will not be folded).
                   2144: .TP
                   2145: .B \-l
                   2146: turns on maximum compatibility with the original AT&T
                   2147: .I lex
                   2148: implementation.  Note that this does not mean
                   2149: .I full
                   2150: compatibility.  Use of this option costs a considerable amount of
                   2151: performance, and it cannot be used with the
                   2152: .B \-+, \-f, \-F, \-Cf,
                   2153: or
                   2154: .B \-CF
                   2155: options.  For details on the compatibilities it provides, see the section
                   2156: "Incompatibilities With Lex And POSIX" below.  This option also results
                   2157: in the name
                   2158: .B YY_FLEX_LEX_COMPAT
                   2159: being #define'd in the generated scanner.
                   2160: .TP
                   2161: .B \-n
                   2162: is another do-nothing, deprecated option included only for
                   2163: POSIX compliance.
                   2164: .TP
                   2165: .B \-p
                   2166: generates a performance report to stderr.  The report
                   2167: consists of comments regarding features of the
                   2168: .I flex
                   2169: input file which will cause a serious loss of performance in the resulting
                   2170: scanner.  If you give the flag twice, you will also get comments regarding
                   2171: features that lead to minor performance losses.
                   2172: .IP
                   2173: Note that the use of
                   2174: .B REJECT,
                   2175: .B %option yylineno,
                   2176: and variable trailing context (see the Deficiencies / Bugs section below)
                   2177: entails a substantial performance penalty; use of
                   2178: .I yymore(),
                   2179: the
                   2180: .B ^
                   2181: operator,
                   2182: and the
                   2183: .B \-I
                   2184: flag entail minor performance penalties.
                   2185: .TP
                   2186: .B \-s
                   2187: causes the
                   2188: .I default rule
                   2189: (that unmatched scanner input is echoed to
                   2190: .I stdout)
                   2191: to be suppressed.  If the scanner encounters input that does not
                   2192: match any of its rules, it aborts with an error.  This option is
                   2193: useful for finding holes in a scanner's rule set.
                   2194: .TP
                   2195: .B \-t
                   2196: instructs
                   2197: .I flex
                   2198: to write the scanner it generates to standard output instead
                   2199: of
                   2200: .B lex.yy.c.
                   2201: .TP
                   2202: .B \-v
                   2203: specifies that
                   2204: .I flex
                   2205: should write to
                   2206: .I stderr
                   2207: a summary of statistics regarding the scanner it generates.
                   2208: Most of the statistics are meaningless to the casual
                   2209: .I flex
                   2210: user, but the first line identifies the version of
                   2211: .I flex
                   2212: (same as reported by
                   2213: .B \-V),
                   2214: and the next line the flags used when generating the scanner, including
                   2215: those that are on by default.
                   2216: .TP
                   2217: .B \-w
                   2218: suppresses warning messages.
                   2219: .TP
                   2220: .B \-B
                   2221: instructs
                   2222: .I flex
                   2223: to generate a
                   2224: .I batch
                   2225: scanner, the opposite of
                   2226: .I interactive
                   2227: scanners generated by
                   2228: .B \-I
                   2229: (see below).  In general, you use
                   2230: .B \-B
                   2231: when you are
                   2232: .I certain
                   2233: that your scanner will never be used interactively, and you want to
                   2234: squeeze a
                   2235: .I little
                   2236: more performance out of it.  If your goal is instead to squeeze out a
                   2237: .I lot
                   2238: more performance, you should be using the
                   2239: .B \-Cf
                   2240: or
                   2241: .B \-CF
                   2242: options (discussed below), which turn on
                   2243: .B \-B
                   2244: automatically anyway.
                   2245: .TP
                   2246: .B \-F
                   2247: specifies that the
                   2248: .ul
                   2249: fast
                   2250: scanner table representation should be used (and stdio
                   2251: bypassed).  This representation is
                   2252: about as fast as the full table representation
                   2253: .B (\-f),
                   2254: and for some sets of patterns will be considerably smaller (and for
                   2255: others, larger).  In general, if the pattern set contains both "keywords"
                   2256: and a catch-all, "identifier" rule, such as in the set:
                   2257: .nf
                   2258:
                   2259:     "case"    return TOK_CASE;
                   2260:     "switch"  return TOK_SWITCH;
                   2261:     ...
                   2262:     "default" return TOK_DEFAULT;
                   2263:     [a-z]+    return TOK_ID;
                   2264:
                   2265: .fi
                   2266: then you're better off using the full table representation.  If only
                   2267: the "identifier" rule is present and you then use a hash table or some such
                   2268: to detect the keywords, you're better off using
                   2269: .B \-F.
                   2270: .IP
                   2271: This option is equivalent to
                   2272: .B \-CFr
                   2273: (see below).  It cannot be used with
                   2274: .B \-+.
                   2275: .TP
                   2276: .B \-I
                   2277: instructs
                   2278: .I flex
                   2279: to generate an
                   2280: .I interactive
                   2281: scanner.  An interactive scanner is one that only looks ahead to decide
                   2282: what token has been matched if it absolutely must.  It turns out that
                   2283: always looking one extra character ahead, even if the scanner has already
                   2284: seen enough text to disambiguate the current token, is a bit faster than
                   2285: only looking ahead when necessary.  But scanners that always look ahead
                   2286: give dreadful interactive performance; for example, when a user types
                   2287: a newline, it is not recognized as a newline token until they enter
                   2288: .I another
                   2289: token, which often means typing in another whole line.
                   2290: .IP
                   2291: .I Flex
                   2292: scanners default to
                   2293: .I interactive
                   2294: unless you use the
                   2295: .B \-Cf
                   2296: or
                   2297: .B \-CF
                   2298: table-compression options (see below).  That's because if you're looking
                   2299: for high performance you should be using one of these options, so if you
                   2300: didn't,
                   2301: .I flex
                   2302: assumes you'd rather trade off a bit of run-time performance for intuitive
                   2303: interactive behavior.  Note also that you
                   2304: .I cannot
                   2305: use
                   2306: .B \-I
                   2307: in conjunction with
                   2308: .B \-Cf
                   2309: or
                   2310: .B \-CF.
                   2311: Thus, this option is not really needed; it is on by default for all those
                   2312: cases in which it is allowed.
                   2313: .IP
                   2314: You can force a scanner to
                   2315: .I not
                   2316: be interactive by using
                   2317: .B \-B
                   2318: (see above).
                   2319: .TP
                   2320: .B \-L
                   2321: instructs
                   2322: .I flex
                   2323: not to generate
                   2324: .B #line
                   2325: directives.  Without this option,
                   2326: .I flex
                   2327: peppers the generated scanner
                   2328: with #line directives so error messages in the actions will be correctly
                   2329: located with respect to either the original
                   2330: .I flex
                   2331: input file (if the errors are due to code in the input file), or
                   2332: .B lex.yy.c
                   2333: (if the errors are
                   2334: .I flex's
                   2335: fault -- you should report these sorts of errors to the email address
                   2336: given below).
                   2337: .TP
                   2338: .B \-T
                   2339: makes
                   2340: .I flex
                   2341: run in
                   2342: .I trace
                   2343: mode.  It will generate a lot of messages to
                   2344: .I stderr
                   2345: concerning
                   2346: the form of the input and the resultant non-deterministic and deterministic
                   2347: finite automata.  This option is mostly for use in maintaining
                   2348: .I flex.
                   2349: .TP
                   2350: .B \-V
                   2351: prints the version number to
                   2352: .I stdout
                   2353: and exits.
                   2354: .B \-\-version
                   2355: is a synonym for
                   2356: .B \-V.
                   2357: .TP
                   2358: .B \-7
                   2359: instructs
                   2360: .I flex
                   2361: to generate a 7-bit scanner, i.e., one which can only recognize 7-bit
                   2362: characters in its input.  The advantage of using
                   2363: .B \-7
                   2364: is that the scanner's tables can be up to half the size of those generated
                   2365: using the
                   2366: .B \-8
                   2367: option (see below).  The disadvantage is that such scanners often hang
                   2368: or crash if their input contains an 8-bit character.
                   2369: .IP
                   2370: Note, however, that unless you generate your scanner using the
                   2371: .B \-Cf
                   2372: or
                   2373: .B \-CF
                   2374: table compression options, use of
                   2375: .B \-7
                   2376: will save only a small amount of table space, and make your scanner
                   2377: considerably less portable.
                   2378: .I Flex's
                   2379: default behavior is to generate an 8-bit scanner unless you use
                   2380: .B \-Cf
                   2381: or
                   2382: .B \-CF,
                   2383: in which case
                   2384: .I flex
                   2385: defaults to generating 7-bit scanners unless your site was always
                   2386: configured to generate 8-bit scanners (as will often be the case
                   2387: with non-USA sites).  You can tell whether flex generated a 7-bit
                   2388: or an 8-bit scanner by inspecting the flag summary in the
                   2389: .B \-v
                   2390: output as described above.
                   2391: .IP
                   2392: Note that if you use
                   2393: .B \-Cfe
                   2394: or
                   2395: .B \-CFe
                   2396: (those table compression options, but also using equivalence classes as
                   2397: discussed below), flex still defaults to generating an 8-bit
                   2398: scanner, since usually with these compression options full 8-bit tables
                   2399: are not much more expensive than 7-bit tables.
                   2400: .TP
                   2401: .B \-8
                   2402: instructs
                   2403: .I flex
                   2404: to generate an 8-bit scanner, i.e., one which can recognize 8-bit
                   2405: characters.  This flag is only needed for scanners generated using
                   2406: .B \-Cf
                   2407: or
                   2408: .B \-CF,
                   2409: as otherwise flex defaults to generating an 8-bit scanner anyway.
                   2410: .IP
                   2411: See the discussion of
                   2412: .B \-7
                   2413: above for flex's default behavior and the tradeoffs between 7-bit
                   2414: and 8-bit scanners.
                   2415: .TP
                   2416: .B \-+
                   2417: specifies that you want flex to generate a C++
                   2418: scanner class.  See the section on Generating C++ Scanners below for
                   2419: details.
                   2420: .TP
                   2421: .B \-C[aefFmr]
                   2422: controls the degree of table compression and, more generally, trade-offs
                   2423: between small scanners and fast scanners.
                   2424: .IP
                   2425: .B \-Ca
                   2426: ("align") instructs flex to trade off larger tables in the
                   2427: generated scanner for faster performance because the elements of
                   2428: the tables are better aligned for memory access and computation.  On some
                   2429: RISC architectures, fetching and manipulating longwords is more efficient
                   2430: than with smaller-sized units such as shortwords.  This option can
                   2431: double the size of the tables used by your scanner.
                   2432: .IP
                   2433: .B \-Ce
                   2434: directs
                   2435: .I flex
                   2436: to construct
                   2437: .I equivalence classes,
                   2438: i.e., sets of characters
                   2439: which have identical lexical properties (for example, if the only
                   2440: appearance of digits in the
                   2441: .I flex
                   2442: input is in the character class
                   2443: "[0-9]" then the digits '0', '1', ..., '9' will all be put
                   2444: in the same equivalence class).  Equivalence classes usually give
                   2445: dramatic reductions in the final table/object file sizes (typically
                   2446: a factor of 2-5) and are pretty cheap performance-wise (one array
                   2447: look-up per character scanned).
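The cost and benefit of equivalence classes can be illustrated with a hand-written sketch.
It is not flex's table layout: the ec map, the three-state table, and the functions build_ec() and match_number() are hypothetical, and assume a scanner whose only use of digits is the class "[0-9]".

```c
#include <string.h>

/* Hypothetical equivalence-class map: class 1 holds the ten digits,
 * class 0 holds every other character. */
static unsigned char ec[256];

/* Fills in the map; must be called once before match_number(). */
void build_ec(void)
{
    int c;

    memset(ec, 0, sizeof ec);
    for (c = '0'; c <= '9'; ++c)
        ec[c] = 1;
}

/* Transitions indexed by [state][class] rather than [state][character]:
 * 2 columns instead of 256.  State 0 = start, 1 = in number, 2 = dead. */
static const int next_state[3][2] = {
    /* other, digit */
    { 2, 1 },
    { 2, 1 },
    { 2, 2 },
};

/* Returns the length of the leading run of digits in s. */
int match_number(const char *s)
{
    int state = 0, len = 0;

    while (s[len] != '\0') {
        int cls = ec[(unsigned char)s[len]];   /* the one extra look-up */

        state = next_state[state][cls];
        if (state != 1)
            break;
        ++len;
    }
    return len;
}
```

Each character costs one additional array access through ec, while each state's row shrinks from one entry per character to one entry per class.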
                   2448: .IP
                   2449: .B \-Cf
                   2450: specifies that the
                   2451: .I full
                   2452: scanner tables should be generated -
                   2453: .I flex
                   2454: should not compress the
                   2455: tables by taking advantage of similar transition functions for
                   2456: different states.
                   2457: .IP
                   2458: .B \-CF
                   2459: specifies that the alternate fast scanner representation (described
                   2460: above under the
                   2461: .B \-F
                   2462: flag)
                   2463: should be used.  This option cannot be used with
                   2464: .B \-+.
                   2465: .IP
                   2466: .B \-Cm
                   2467: directs
                   2468: .I flex
                   2469: to construct
                   2470: .I meta-equivalence classes,
                   2471: which are sets of equivalence classes (or characters, if equivalence
                   2472: classes are not being used) that are commonly used together.  Meta-equivalence
                   2473: classes are often a big win when using compressed tables, but they
                   2474: have a moderate performance impact (one or two "if" tests and one
                   2475: array look-up per character scanned).
                   2476: .IP
                   2477: .B \-Cr
                   2478: causes the generated scanner to
                   2479: .I bypass
                   2480: use of the standard I/O library (stdio) for input.  Instead of calling
                   2481: .B fread()
                   2482: or
                   2483: .B getc(),
                   2484: the scanner will use the
                   2485: .B read()
                   2486: system call, resulting in a performance gain which varies from system
                   2487: to system, but in general is probably negligible unless you are also using
                   2488: .B \-Cf
                   2489: or
                   2490: .B \-CF.
                   2491: Using
                   2492: .B \-Cr
                   2493: can cause strange behavior if, for example, you read from
                   2494: .I yyin
                   2495: using stdio prior to calling the scanner (because the scanner will miss
                   2496: whatever text your previous reads left in the stdio input buffer).
                   2497: .IP
                   2498: .B \-Cr
                   2499: has no effect if you define
                   2500: .B YY_INPUT
                   2501: (see The Generated Scanner above).
                   2502: .IP
                   2503: A lone
                   2504: .B \-C
                   2505: specifies that the scanner tables should be compressed but neither
                   2506: equivalence classes nor meta-equivalence classes should be used.
                   2507: .IP
                   2508: The options
                   2509: .B \-Cf
                   2510: or
                   2511: .B \-CF
                   2512: and
                   2513: .B \-Cm
                   2514: do not make sense together - there is no opportunity for meta-equivalence
                   2515: classes if the table is not being compressed.  Otherwise the options
                   2516: may be freely mixed, and are cumulative.
                   2517: .IP
                   2518: The default setting is
                   2519: .B \-Cem,
                   2520: which specifies that
                   2521: .I flex
                   2522: should generate equivalence classes
                   2523: and meta-equivalence classes.  This setting provides the highest
                   2524: degree of table compression.  You can obtain
                   2525: faster-executing scanners at the cost of larger tables, with
                   2526: the following generally being true:
                   2527: .nf
                   2528:
                   2529:     slowest & smallest
                   2530:           -Cem
                   2531:           -Cm
                   2532:           -Ce
                   2533:           -C
                   2534:           -C{f,F}e
                   2535:           -C{f,F}
                   2536:           -C{f,F}a
                   2537:     fastest & largest
                   2538:
                   2539: .fi
                   2540: Note that scanners with the smallest tables are usually generated and
                   2541: compiled the quickest, so
                   2542: during development you will usually want to use the default, maximal
                   2543: compression.
                   2544: .IP
                   2545: .B \-Cfe
                   2546: is often a good compromise between speed and size for production
                   2547: scanners.
                   2548: .TP
                   2549: .B \-ooutput
                   2550: directs flex to write the scanner to the file
                   2551: .B output
                   2552: instead of
                   2553: .B lex.yy.c.
                   2554: If you combine
                   2555: .B \-o
                   2556: with the
                   2557: .B \-t
                   2558: option, then the scanner is written to
                   2559: .I stdout
                   2560: but its
                   2561: .B #line
                   2562: directives (see the
                   2563: .B \-L
                   2564: option above) refer to the file
                   2565: .B output.
                   2566: .TP
                   2567: .B \-Pprefix
                   2568: changes the default
                   2569: .I "yy"
                   2570: prefix used by
                   2571: .I flex
1.6     ! aaron    2572: for all globally visible variable and function names to instead be
1.1       deraadt  2573: .I prefix.
                   2574: For example,
                   2575: .B \-Pfoo
                   2576: changes the name of
                   2577: .B yytext
                   2578: to
                   2579: .B footext.
                   2580: It also changes the name of the default output file from
                   2581: .B lex.yy.c
                   2582: to
                   2583: .B lex.foo.c.
                   2584: Here are all of the names affected:
                   2585: .nf
                   2586:
                   2587:     yy_create_buffer
                   2588:     yy_delete_buffer
                   2589:     yy_flex_debug
                   2590:     yy_init_buffer
                   2591:     yy_flush_buffer
                   2592:     yy_load_buffer_state
                   2593:     yy_switch_to_buffer
                   2594:     yyin
                   2595:     yyleng
                   2596:     yylex
                   2597:     yylineno
                   2598:     yyout
                   2599:     yyrestart
                   2600:     yytext
                   2601:     yywrap
                   2602:
                   2603: .fi
                   2604: (If you are using a C++ scanner, then only
                   2605: .B yywrap
                   2606: and
                   2607: .B yyFlexLexer
                   2608: are affected.)
                   2609: Within your scanner itself, you can still refer to the global variables
                   2610: and functions using either version of their name; but externally, they
                   2611: have the modified name.
                   2612: .IP
                   2613: This option lets you easily link together multiple
                   2614: .I flex
                   2615: programs into the same executable.  Note, though, that using this
                   2616: option also renames
                   2617: .B yywrap(),
                   2618: so you now
                   2619: .I must
                   2620: either
1.6     ! aaron    2621: provide your own (appropriately named) version of the routine for your
1.1       deraadt  2622: scanner, or use
                   2623: .B %option noyywrap,
                   2624: as linking with
                   2625: .B \-lfl
                   2626: no longer provides one for you by default.
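                         .IP
                         As a minimal sketch of the first alternative, assuming the scanner is built with
                         .B \-Pfoo
                         (the prefix "foo" and the rules shown are purely illustrative),
                         the user code section can supply the renamed routine itself:
                         .nf

                             %%
                             [0-9]+    printf( "number: %s\\n", yytext );
                             .|\\n      /* ignore everything else */
                             %%
                             /* -Pfoo renames yywrap(), so define foowrap() */
                             int foowrap( void )
                                 {
                                 return 1;   /* non-zero: no further input files */
                                 }

                         .fi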
                   2627: .TP
                   2628: .B \-Sskeleton_file
                   2629: overrides the default skeleton file from which
                   2630: .I flex
                   2631: constructs its scanners.  You'll never need this option unless you are doing
                   2632: .I flex
                   2633: maintenance or development.
                   2634: .PP
                   2635: .I flex
                   2636: also provides a mechanism for controlling options within the
                   2637: scanner specification itself, rather than from the flex command-line.
                   2638: This is done by including
                   2639: .B %option
                   2640: directives in the first section of the scanner specification.
                   2641: You can specify multiple options with a single
                   2642: .B %option
                   2643: directive, and multiple directives in the first section of your flex input
                   2644: file.
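                         .PP
                         For instance, the first section of a specification might combine both
                         forms, as in the following sketch (the output file name "scan.c" and the
                         rules are only illustrations):
                         .nf

                             %option noyywrap caseless
                             %option outfile="scan.c"
                             %%
                             begin|end    printf( "keyword\\n" );
                             .|\\n         /* ignore everything else */

                         .fi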
                   2645: .PP
                   2646: Most options are given simply as names, optionally preceded by the
                   2647: word "no" (with no intervening whitespace) to negate their meaning.
                   2648: A number are equivalent to flex flags or their negation:
                   2649: .nf
                   2650:
                   2651:     7bit            -7 option
                   2652:     8bit            -8 option
                   2653:     align           -Ca option
                   2654:     backup          -b option
                   2655:     batch           -B option
                   2656:     c++             -+ option
                   2657:
                   2658:     caseful or
                   2659:     case-sensitive  opposite of -i (default)
                   2660:
                   2661:     case-insensitive or
                   2662:     caseless        -i option
                   2663:
                   2664:     debug           -d option
                   2665:     default         opposite of -s option
                   2666:     ecs             -Ce option
                   2667:     fast            -F option
                   2668:     full            -f option
                   2669:     interactive     -I option
                   2670:     lex-compat      -l option
                   2671:     meta-ecs        -Cm option
                   2672:     perf-report     -p option
                   2673:     read            -Cr option
                   2674:     stdout          -t option
                   2675:     verbose         -v option
                   2676:     warn            opposite of -w option
                   2677:                     (use "%option nowarn" for -w)
                   2678:
                   2679:     array           equivalent to "%array"
                   2680:     pointer         equivalent to "%pointer" (default)
                   2681:
                   2682: .fi
                   2683: Some
                   2684: .B %option's
                   2685: provide features otherwise not available:
                   2686: .TP
                   2687: .B always-interactive
                   2688: instructs flex to generate a scanner which always considers its input
                   2689: "interactive".  Normally, on each new input file the scanner calls
                   2690: .B isatty()
                   2691: in an attempt to determine whether
                   2692: the scanner's input source is interactive and thus should be read a
                   2693: character at a time.  When this option is used, however, then no
                   2694: such call is made.
                   2695: .TP
                   2696: .B main
                   2697: directs flex to provide a default
                   2698: .B main()
                   2699: program for the scanner, which simply calls
                   2700: .B yylex().
                   2701: This option implies
                   2702: .B noyywrap
                   2703: (see below).
                   2704: .TP
                   2705: .B never-interactive
                   2706: instructs flex to generate a scanner which never considers its input
                   2707: "interactive" (again, no call made to
                   2708: .B isatty()).
                   2709: This is the opposite of
                   2710: .B always-interactive.
                   2711: .TP
                   2712: .B stack
                   2713: enables the use of start condition stacks (see Start Conditions above).
                   2714: .TP
                   2715: .B stdinit
                   2716: if set (i.e.,
                   2717: .B %option stdinit)
                   2718: initializes
                   2719: .I yyin
                   2720: and
                   2721: .I yyout
                   2722: to
                   2723: .I stdin
                   2724: and
                   2725: .I stdout,
                   2726: instead of the default of
                   2727: .I nil.
                   2728: Some existing
                   2729: .I lex
                   2730: programs depend on this behavior, even though it is not compliant with
                   2731: ANSI C, which does not require
                   2732: .I stdin
                   2733: and
                   2734: .I stdout
                   2735: to be compile-time constant.
                   2736: .TP
                   2737: .B yylineno
                   2738: directs
                   2739: .I flex
                   2740: to generate a scanner that maintains the number of the current line
                   2741: read from its input in the global variable
                   2742: .B yylineno.
                   2743: This option is implied by
                   2744: .B %option lex-compat.
                   2745: .TP
                   2746: .B yywrap
                   2747: if unset (i.e.,
                   2748: .B %option noyywrap),
                   2749: makes the scanner not call
                   2750: .B yywrap()
                   2751: upon an end-of-file, but simply assume that there are no more
                   2752: files to scan (until the user points
                   2753: .I yyin
                   2754: at a new file and calls
                   2755: .B yylex()
                   2756: again).
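                         .PP
                         As an illustration, the following sketch should build into a stand-alone
                         program without linking against
                         .B \-lfl,
                         since
                         .B main
                         supplies main() and implies
                         .B noyywrap
                         (the rules themselves are only an example):
                         .nf

                             %option main never-interactive
                             %%
                             [a-z]+    printf( "word: %s\\n", yytext );
                             .|\\n      /* ignore everything else */

                         .fi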
                   2757: .PP
                   2758: .I flex
                   2759: scans your rule actions to determine whether you use the
                   2760: .B REJECT
                   2761: or
                   2762: .B yymore()
                   2763: features.  The
                   2764: .B reject
                   2765: and
                   2766: .B yymore
                   2767: options are available to override its decision as to whether you use the
                   2768: features, either by setting the options (e.g.,
                   2769: .B %option reject)
                   2770: to indicate the feature is indeed used, or
                   2771: unsetting them to indicate it actually is not used
                   2772: (e.g.,
                   2773: .B %option noyymore).
                   2774: .PP
                   2775: Three options take string-delimited values, offset with '=':
                   2776: .nf
                   2777:
                   2778:     %option outfile="ABC"
                   2779:
                   2780: .fi
                   2781: is equivalent to
                   2782: .B -oABC,
                   2783: and
                   2784: .nf
                   2785:
                   2786:     %option prefix="XYZ"
                   2787:
                   2788: .fi
                   2789: is equivalent to
                   2790: .B -PXYZ.
                   2791: Finally,
                   2792: .nf
                   2793:
                   2794:     %option yyclass="foo"
                   2795:
                   2796: .fi
                   2797: only applies when generating a C++ scanner (the
                   2798: .B \-+
                   2799: option).  It informs
                   2800: .I flex
                   2801: that you have derived
                   2802: .B foo
                   2803: as a subclass of
                   2804: .B yyFlexLexer,
                   2805: so
                   2806: .I flex
                   2807: will place your actions in the member function
                   2808: .B foo::yylex()
                   2809: instead of
                   2810: .B yyFlexLexer::yylex().
                   2811: It also generates a
                   2812: .B yyFlexLexer::yylex()
                   2813: member function that emits a run-time error (by invoking
                   2814: .B yyFlexLexer::LexerError())
                   2815: if called.
                   2816: See Generating C++ Scanners, below, for additional information.
                   2817: .PP
                   2818: A number of options are available for lint purists who want to suppress
                   2819: the appearance of unneeded routines in the generated scanner.  Each of the
                   2820: following, if unset
                   2821: (e.g.,
                   2822: .B %option nounput),
                   2823: results in the corresponding routine not appearing in
                   2824: the generated scanner:
                   2825: .nf
                   2826:
                   2827:     input, unput
                   2828:     yy_push_state, yy_pop_state, yy_top_state
                   2829:     yy_scan_buffer, yy_scan_bytes, yy_scan_string
                   2830:
                   2831: .fi
                   2832: (though
                   2833: .B yy_push_state()
                   2834: and friends won't appear anyway unless you use
                   2835: .B %option stack).
                   2836: .SH PERFORMANCE CONSIDERATIONS
                   2837: The main design goal of
                   2838: .I flex
                   2839: is that it generate high-performance scanners.  It has been optimized
                   2840: for dealing well with large sets of rules.  Aside from the effects on
                   2841: scanner speed of the table compression
                   2842: .B \-C
                   2843: options outlined above,
                   2844: there are a number of options/actions which degrade performance.  These
                   2845: are, from most expensive to least:
                   2846: .nf
                   2847:
                   2848:     REJECT
                   2849:     %option yylineno
                   2850:     arbitrary trailing context
                   2851:
                   2852:     pattern sets that require backing up
                   2853:     %array
                   2854:     %option interactive
                   2855:     %option always-interactive
                   2856:
                   2857:     '^' beginning-of-line operator
                   2858:     yymore()
                   2859:
                   2860: .fi
                   2861: with the first three all being quite expensive and the last two
                   2862: being quite cheap.  Note also that
                   2863: .B unput()
                   2864: is implemented as a routine call that potentially does quite a bit of
                   2865: work, while
                   2866: .B yyless()
                   2867: is a quite-cheap macro; so if you are just putting back some excess text
                   2868: you scanned, use
                   2869: .B yyless().
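                         .PP
                         For example, a rule that must look one character past a name can hand
                         that character back with yyless() rather than unput().
                         This is only a sketch; TOK_FUNC is a hypothetical token code assumed to
                         be defined elsewhere:
                         .nf

                             %%
                             [a-z]+"("   {
                                         /* keep the name, rescan the "(" */
                                         yyless( yyleng - 1 );
                                         return TOK_FUNC;
                                         }

                         .fi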
                   2870: .PP
                   2871: .B REJECT
                   2872: should be avoided at all costs when performance is important.
                   2873: It is a particularly expensive option.
                   2874: .PP
                   2875: Getting rid of backing up is messy and often may be an enormous
                   2876: amount of work for a complicated scanner.  In principle, one begins
                   2877: by using the
                   2878: .B \-b
                   2879: flag to generate a
                   2880: .I lex.backup
                   2881: file.  For example, on the input
                   2882: .nf
                   2883:
                   2884:     %%
                   2885:     foo        return TOK_KEYWORD;
                   2886:     foobar     return TOK_KEYWORD;
                   2887:
                   2888: .fi
                   2889: the file looks like:
                   2890: .nf
                   2891:
                   2892:     State #6 is non-accepting -
                   2893:      associated rule line numbers:
                   2894:            2       3
                   2895:      out-transitions: [ o ]
                   2896:      jam-transitions: EOF [ \\001-n  p-\\177 ]
                   2897:
                   2898:     State #8 is non-accepting -
                   2899:      associated rule line numbers:
                   2900:            3
                   2901:      out-transitions: [ a ]
                   2902:      jam-transitions: EOF [ \\001-`  b-\\177 ]
                   2903:
                   2904:     State #9 is non-accepting -
                   2905:      associated rule line numbers:
                   2906:            3
                   2907:      out-transitions: [ r ]
                   2908:      jam-transitions: EOF [ \\001-q  s-\\177 ]
                   2909:
                   2910:     Compressed tables always back up.
                   2911:
                   2912: .fi
                   2913: The first few lines tell us that there's a scanner state in
                   2914: which it can make a transition on an 'o' but not on any other
                   2915: character, and that in that state the currently scanned text does not match
                   2916: any rule.  The state occurs when trying to match the rules found
                   2917: at lines 2 and 3 in the input file.
                   2918: If the scanner is in that state and then reads
                   2919: something other than an 'o', it will have to back up to find
                   2920: a rule which is matched.  With
                   2921: a bit of headscratching one can see that this must be the
                   2922: state it's in when it has seen "fo".  When this has happened,
                   2923: if anything other than another 'o' is seen, the scanner will
                   2924: have to back up to simply match the 'f' (by the default rule).
                   2925: .PP
                   2926: The comment regarding State #8 indicates there's a problem
                   2927: when "foob" has been scanned.  Indeed, on any character other
                   2928: than an 'a', the scanner will have to back up to accept "foo".
                   2929: Similarly, the comment for State #9 concerns when "fooba" has
                   2930: been scanned and an 'r' does not follow.
                   2931: .PP
                   2932: The final comment reminds us that there's no point going to
                   2933: all the trouble of removing backing up from the rules unless
                   2934: we're using
                   2935: .B \-Cf
                   2936: or
                   2937: .B \-CF,
                   2938: since there's no performance gain doing so with compressed scanners.
                   2939: .PP
                   2940: The way to remove the backing up is to add "error" rules:
                   2941: .nf
                   2942:
                   2943:     %%
                   2944:     foo         return TOK_KEYWORD;
                   2945:     foobar      return TOK_KEYWORD;
                   2946:
                   2947:     fooba       |
                   2948:     foob        |
                   2949:     fo          {
                   2950:                 /* false alarm, not really a keyword */
                   2951:                 return TOK_ID;
                   2952:                 }
                   2953:
                   2954: .fi
                   2955: .PP
                   2956: Eliminating backing up among a list of keywords can also be
                   2957: done using a "catch-all" rule:
                   2958: .nf
                   2959:
                   2960:     %%
                   2961:     foo         return TOK_KEYWORD;
                   2962:     foobar      return TOK_KEYWORD;
                   2963:
                   2964:     [a-z]+      return TOK_ID;
                   2965:
                   2966: .fi
                   2967: This is usually the best solution when appropriate.
                   2968: .PP
                   2969: Backing up messages tend to cascade.
                   2970: With a complicated set of rules it's not uncommon to get hundreds
                   2971: of messages.  If one can decipher them, though, it often
                   2972: only takes a dozen or so rules to eliminate the backing up (though
                   2973: it's easy to make a mistake and have an error rule accidentally match
                   2974: a valid token).  A possible future
                   2975: .I flex
                   2976: feature will be to automatically add rules to eliminate backing up.
                   2977: .PP
                   2978: It's important to keep in mind that you gain the benefits of eliminating
                   2979: backing up only if you eliminate
                   2980: .I every
                   2981: instance of backing up.  Leaving just one means you gain nothing.
                   2982: .PP
                   2983: .I Variable
                   2984: trailing context (where both the leading and trailing parts do not have
                   2985: a fixed length) entails almost the same performance loss as
                   2986: .B REJECT
                   2987: (i.e., substantial).  So when possible a rule like:
                   2988: .nf
                   2989:
                   2990:     %%
                   2991:     mouse|rat/(cat|dog)   run();
                   2992:
                   2993: .fi
                   2994: is better written:
                   2995: .nf
                   2996:
                   2997:     %%
                   2998:     mouse/cat|dog         run();
                   2999:     rat/cat|dog           run();
                   3000:
                   3001: .fi
                   3002: or as
                   3003: .nf
                   3004:
                   3005:     %%
                   3006:     mouse|rat/cat         run();
                   3007:     mouse|rat/dog         run();
                   3008:
                   3009: .fi
                   3010: Note that here the special '|' action does
                   3011: .I not
                   3012: provide any savings, and can even make things worse (see
                   3013: Deficiencies / Bugs below).
                   3014: .LP
                   3015: Another area where the user can increase a scanner's performance
                   3016: (and one that's easier to implement) arises from the fact that
                   3017: the longer the tokens matched, the faster the scanner will run.
                   3018: This is because with long tokens the processing of most input
                   3019: characters takes place in the (short) inner scanning loop, and
                   3020: does not often have to go through the additional work of setting up
                   3021: the scanning environment (e.g.,
                   3022: .B yytext)
                   3023: for the action.  Recall the scanner for C comments:
                   3024: .nf
                   3025:
                   3026:     %x comment
                   3027:     %%
                   3028:             int line_num = 1;
                   3029:
                   3030:     "/*"         BEGIN(comment);
                   3031:
                   3032:     <comment>[^*\\n]*
                   3033:     <comment>"*"+[^*/\\n]*
                   3034:     <comment>\\n             ++line_num;
                   3035:     <comment>"*"+"/"        BEGIN(INITIAL);
                   3036:
                   3037: .fi
                   3038: This could be sped up by writing it as:
                   3039: .nf
                   3040:
                   3041:     %x comment
                   3042:     %%
                   3043:             int line_num = 1;
                   3044:
                   3045:     "/*"         BEGIN(comment);
                   3046:
                   3047:     <comment>[^*\\n]*
                   3048:     <comment>[^*\\n]*\\n      ++line_num;
                   3049:     <comment>"*"+[^*/\\n]*
                   3050:     <comment>"*"+[^*/\\n]*\\n ++line_num;
                   3051:     <comment>"*"+"/"        BEGIN(INITIAL);
                   3052:
                   3053: .fi
                   3054: Now instead of each newline requiring the processing of another
                   3055: action, recognizing the newlines is "distributed" over the other rules
                   3056: to keep the matched text as long as possible.  Note that
                   3057: .I adding
                   3058: rules does
                   3059: .I not
                   3060: slow down the scanner!  The speed of the scanner is independent
                   3061: of the number of rules or (modulo the considerations given at the
                   3062: beginning of this section) how complicated the rules are with
                   3063: regard to operators such as '*' and '|'.
                   3064: .PP
                   3065: A final example in speeding up a scanner: suppose you want to scan
                   3066: through a file containing identifiers and keywords, one per line
                   3067: and with no other extraneous characters, and recognize all the
                   3068: keywords.  A natural first approach is:
                   3069: .nf
                   3070:
                   3071:     %%
                   3072:     asm      |
                   3073:     auto     |
                   3074:     break    |
                   3075:     ... etc ...
                   3076:     volatile |
                   3077:     while    /* it's a keyword */
                   3078:
                   3079:     .|\\n     /* it's not a keyword */
                   3080:
                   3081: .fi
                   3082: To eliminate the back-tracking, introduce a catch-all rule:
                   3083: .nf
                   3084:
                   3085:     %%
                   3086:     asm      |
                   3087:     auto     |
                   3088:     break    |
                   3089:     ... etc ...
                   3090:     volatile |
                   3091:     while    /* it's a keyword */
                   3092:
                   3093:     [a-z]+   |
                   3094:     .|\\n     /* it's not a keyword */
                   3095:
                   3096: .fi
                   3097: Now, if it's guaranteed that there's exactly one word per line,
                   3098: then we can reduce the total number of matches by a half by
                   3099: merging in the recognition of newlines with that of the other
                   3100: tokens:
                   3101: .nf
                   3102:
                   3103:     %%
                   3104:     asm\\n    |
                   3105:     auto\\n   |
                   3106:     break\\n  |
                   3107:     ... etc ...
                   3108:     volatile\\n |
                   3109:     while\\n  /* it's a keyword */
                   3110:
                   3111:     [a-z]+\\n |
                   3112:     .|\\n     /* it's not a keyword */
                   3113:
                   3114: .fi
                   3115: One has to be careful here, as we have now reintroduced backing up
                   3116: into the scanner.  In particular, while
                   3117: .I we
                   3118: know that there will never be any characters in the input stream
                   3119: other than letters or newlines,
                   3120: .I flex
                   3121: can't figure this out, and it will plan for possibly needing to back up
                   3122: when it has scanned a token like "auto" and then the next character
                   3123: is something other than a newline or a letter.  Previously it would
                   3124: then just match the "auto" rule and be done, but now it has no "auto"
                   3125: rule, only an "auto\\n" rule.  To eliminate the possibility of backing up,
                   3126: we could either duplicate all rules but without final newlines, or,
                   3127: since we never expect to encounter such an input and therefore don't
                   3128: care how it's classified, we can introduce one more catch-all rule, this
                   3129: one which doesn't include a newline:
                   3130: .nf
                   3131:
                   3132:     %%
                   3133:     asm\\n    |
                   3134:     auto\\n   |
                   3135:     break\\n  |
                   3136:     ... etc ...
                   3137:     volatile\\n |
                   3138:     while\\n  /* it's a keyword */
                   3139:
                   3140:     [a-z]+\\n |
                   3141:     [a-z]+   |
                   3142:     .|\\n     /* it's not a keyword */
                   3143:
                   3144: .fi
                   3145: Compiled with
                   3146: .B \-Cf,
                   3147: this is about as fast as one can get a
                   3148: .I flex
                   3149: scanner to go for this particular problem.
                   3150: .PP
                   3151: A final note:
                   3152: .I flex
                   3153: is slow when matching NUL's, particularly when a token contains
                   3154: multiple NUL's.
                   3155: It's best to write rules which match
                   3156: .I short
                   3157: amounts of text if it's anticipated that the text will often include NUL's.
                   3158: .PP
                   3159: Another final note regarding performance: as mentioned above in the section
                   3160: How the Input is Matched, dynamically resizing
                   3161: .B yytext
                   3162: to accommodate huge tokens is a slow process because it presently requires that
                   3163: the (huge) token be rescanned from the beginning.  Thus if performance is
                   3164: vital, you should attempt to match "large" quantities of text but not
                   3165: "huge" quantities, where the cutoff between the two is at about 8K
                   3166: characters/token.
                   3167: .SH GENERATING C++ SCANNERS
                   3168: .I flex
                   3169: provides two different ways to generate scanners for use with C++.  The
                   3170: first way is to simply compile a scanner generated by
                   3171: .I flex
                   3172: using a C++ compiler instead of a C compiler.  You should not encounter
                   3173: any compilation errors (please report any you find to the email address
                   3174: given in the Author section below).  You can then use C++ code in your
                   3175: rule actions instead of C code.  Note that the default input source for
                   3176: your scanner remains
                   3177: .I yyin,
                   3178: and default echoing is still done to
                   3179: .I yyout.
                   3180: Both of these remain
                   3181: .I FILE *
                   3182: variables and not C++
                   3183: .I streams.
                   3184: .PP
                   3185: You can also use
                   3186: .I flex
                   3187: to generate a C++ scanner class, using the
                   3188: .B \-+
                   3189: option (or, equivalently,
                   3190: .B %option c++),
                   3191: which is automatically specified if the name of the flex
                   3192: executable ends in a '+', such as
                   3193: .I flex++.
                   3194: When using this option, flex defaults to generating the scanner to the file
                   3195: .B lex.yy.cc
                   3196: instead of
                   3197: .B lex.yy.c.
                   3198: The generated scanner includes the header file
1.5       deraadt  3199: .I g++/FlexLexer.h,
1.1       deraadt  3200: which defines the interface to two C++ classes.
                   3201: .PP
                   3202: The first class,
                   3203: .B FlexLexer,
                   3204: provides an abstract base class defining the general scanner class
                   3205: interface.  It provides the following member functions:
                   3206: .TP
                   3207: .B const char* YYText()
                   3208: returns the text of the most recently matched token, the equivalent of
                   3209: .B yytext.
                   3210: .TP
                   3211: .B int YYLeng()
                   3212: returns the length of the most recently matched token, the equivalent of
                   3213: .B yyleng.
                   3214: .TP
                   3215: .B int lineno() const
                   3216: returns the current input line number
                   3217: (see
                   3218: .B %option yylineno),
                   3219: or
                   3220: .B 1
                   3221: if
                   3222: .B %option yylineno
                   3223: was not used.
                   3224: .TP
                   3225: .B void set_debug( int flag )
                   3226: sets the debugging flag for the scanner, equivalent to assigning to
                   3227: .B yy_flex_debug
                   3228: (see the Options section above).  Note that you must build the scanner
                   3229: using
                   3230: .B %option debug
                   3231: to include debugging information in it.
                   3232: .TP
                   3233: .B int debug() const
                   3234: returns the current setting of the debugging flag.
                   3235: .PP
                   3236: Also provided are member functions equivalent to
                   3237: .B yy_switch_to_buffer(),
                   3238: .B yy_create_buffer()
                   3239: (though the first argument is an
                   3240: .B istream*
                   3241: object pointer and not a
                   3242: .B FILE*),
                   3243: .B yy_flush_buffer(),
                   3244: .B yy_delete_buffer(),
                   3245: and
                   3246: .B yyrestart()
                   3247: (again, the first argument is an
                   3248: .B istream*
                   3249: object pointer).
                   3250: .PP
                   3251: The second class defined in
1.5       deraadt  3252: .I g++/FlexLexer.h
1.1       deraadt  3253: is
                   3254: .B yyFlexLexer,
                   3255: which is derived from
                   3256: .B FlexLexer.
                   3257: It defines the following additional member functions:
                   3258: .TP
                   3259: .B
                   3260: yyFlexLexer( istream* arg_yyin = 0, ostream* arg_yyout = 0 )
                   3261: constructs a
                   3262: .B yyFlexLexer
                   3263: object using the given streams for input and output.  If not specified,
                   3264: the streams default to
                   3265: .B cin
                   3266: and
                   3267: .B cout,
                   3268: respectively.
                   3269: .TP
                   3270: .B virtual int yylex()
                   3271: performs the same role as
                   3272: .B yylex()
                   3273: does for ordinary flex scanners: it scans the input stream, consuming
                   3274: tokens, until a rule's action returns a value.  If you derive a subclass
                   3275: .B S
                   3276: from
                   3277: .B yyFlexLexer
                   3278: and want to access the member functions and variables of
                   3279: .B S
                   3280: inside
                   3281: .B yylex(),
                   3282: then you need to use
                   3283: .B %option yyclass="S"
                   3284: to inform
                   3285: .I flex
                   3286: that you will be using that subclass instead of
                   3287: .B yyFlexLexer.
                   3288: In this case, rather than generating
                   3289: .B yyFlexLexer::yylex(),
                   3290: .I flex
                   3291: generates
                   3292: .B S::yylex()
                   3293: (and also generates a dummy
                   3294: .B yyFlexLexer::yylex()
                   3295: that calls
                   3296: .B yyFlexLexer::LexerError()
                   3297: if called).
                   3298: .TP
                   3299: .B
                   3300: virtual void switch_streams(istream* new_in = 0,
                   3301: .B
                   3302: ostream* new_out = 0)
                   3303: reassigns
                   3304: .B yyin
                   3305: to
                   3306: .B new_in
                   3307: (if non-nil)
                   3308: and
                   3309: .B yyout
                   3310: to
                   3311: .B new_out
                   3312: (ditto), deleting the previous input buffer if
                   3313: .B yyin
                   3314: is reassigned.
                   3315: .TP
                   3316: .B
                   3317: int yylex( istream* new_in, ostream* new_out = 0 )
                   3318: first switches the input streams via
                   3319: .B switch_streams( new_in, new_out )
                   3320: and then returns the value of
                   3321: .B yylex().
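                         .PP
                         A minimal sketch of the
                         .B %option yyclass
                         mechanism described above follows.  The subclass name
                         .B Counter
                         and its member
                         .B words
                         are hypothetical, chosen only for illustration, and the sketch assumes
                         .B %option noyywrap
                         so that no
                         .B yywrap()
                         has to be supplied:
                         .nf

                             %option c++ noyywrap yyclass="Counter"

                             %{
                             #include <stdio.h>

                             /* Hypothetical subclass; flex generates Counter::yylex(). */
                             class Counter : public yyFlexLexer {
                             public:
                                 Counter() : words(0) { }
                                 virtual int yylex();
                                 int words;
                             };
                             %}

                             %%

                             [A-Za-z]+   ++words;    /* runs inside Counter::yylex() */
                             .|\\n        /* ignore everything else */

                             %%

                             int main()
                                 {
                                 Counter lexer;
                                 while(lexer.yylex() != 0)
                                     ;
                                 printf( "%d words\\n", lexer.words );
                                 return 0;
                                 }

                         .fi
                         Because the actions run inside
                         .B Counter::yylex(),
                         they can refer to
                         .B words
                         directly.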
                   3322: .PP
                   3323: In addition,
                   3324: .B yyFlexLexer
                   3325: defines the following protected virtual functions which you can redefine
                   3326: in derived classes to tailor the scanner:
                   3327: .TP
                   3328: .B
                   3329: virtual int LexerInput( char* buf, int max_size )
                   3330: reads up to
                   3331: .B max_size
                   3332: characters into
                   3333: .B buf
                   3334: and returns the number of characters read.  To indicate end-of-input,
                   3335: return 0 characters.  Note that "interactive" scanners (see the
                   3336: .B \-B
                   3337: and
                   3338: .B \-I
                   3339: flags) define the macro
                   3340: .B YY_INTERACTIVE.
                   3341: If you redefine
                   3342: .B LexerInput()
                   3343: and need to take different actions depending on whether or not
                   3344: the scanner might be scanning an interactive input source, you can
                   3345: test for the presence of this name via
                   3346: .B #ifdef.
                   3347: .TP
                   3348: .B
                   3349: virtual void LexerOutput( const char* buf, int size )
                   3350: writes out
                   3351: .B size
                   3352: characters from the buffer
                   3353: .B buf,
                   3354: which, while NUL-terminated, may also contain "internal" NUL's if
                   3355: the scanner's rules can match text with NUL's in them.
                   3356: .TP
                   3357: .B
                   3358: virtual void LexerError( const char* msg )
                   3359: reports a fatal error message.  The default version of this function
                   3360: writes the message to the stream
                   3361: .B cerr
                   3362: and exits.
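                         .PP
                         As an illustration of redefining
                         .B LexerInput(),
                         the following sketch feeds the scanner from a string held in memory.
                         The class
                         .B StringLexer
                         and its members are hypothetical; the declaration is assumed to appear
                         somewhere after the scanner class header has been included, for example
                         in the user code section of the scanner:
                         .nf

                             #include <string.h>

                             /* Hypothetical subclass that scans a NUL-terminated string. */
                             class StringLexer : public yyFlexLexer {
                             public:
                                 StringLexer( const char* s )
                                     : next(s), left((int) strlen(s)) { }

                             protected:
                                 virtual int LexerInput( char* buf, int max_size )
                                     {
                                     int n = left < max_size ? left : max_size;
                                     memcpy( buf, next, n );
                                     next += n;
                                     left -= n;
                                     return n;    /* 0 indicates end-of-input */
                                     }

                             private:
                                 const char* next;    /* next unread character */
                                 int left;            /* characters remaining */
                             };

                         .fi
                         Since
                         .B LexerInput()
                         is virtual, the inherited
                         .B yylex()
                         picks up the redefined version without further changes.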
                   3363: .PP
                   3364: Note that a
                   3365: .B yyFlexLexer
                   3366: object contains its
                   3367: .I entire
                   3368: scanning state.  Thus you can use such objects to create reentrant
                   3369: scanners.  You can instantiate multiple instances of the same
                   3370: .B yyFlexLexer
                   3371: class, and you can also combine multiple C++ scanner classes together
                   3372: in the same program using the
                   3373: .B \-P
                   3374: option discussed above.
                   3375: .PP
                   3376: Finally, note that the
                   3377: .B %array
                   3378: feature is not available to C++ scanner classes; you must use
                   3379: .B %pointer
                   3380: (the default).
                   3381: .PP
                   3382: Here is an example of a simple C++ scanner:
                   3383: .nf
                   3384:
                   3385:         // An example of using the flex C++ scanner class.
                   3386:
                   3387:     %{
                   3388:     int mylineno = 0;
                   3389:     %}
                   3390:
                   3391:     string  \\"[^\\n"]+\\"
                   3392:
                   3393:     ws      [ \\t]+
                   3394:
                   3395:     alpha   [A-Za-z]
                   3396:     dig     [0-9]
                   3397:     name    ({alpha}|{dig}|\\$)({alpha}|{dig}|[_.\\-/$])*
                   3398:     num1    [-+]?{dig}+\\.?([eE][-+]?{dig}+)?
                   3399:     num2    [-+]?{dig}*\\.{dig}+([eE][-+]?{dig}+)?
                   3400:     number  {num1}|{num2}
                   3401:
                   3402:     %%
                   3403:
                   3404:     {ws}    /* skip blanks and tabs */
                   3405:
                   3406:     "/*"    {
                   3407:             int c;
                   3408:
                   3409:             while((c = yyinput()) != 0)
                   3410:                 {
                   3411:                 if(c == '\\n')
                   3412:                     ++mylineno;
                   3413:
                   3414:                 else if(c == '*')
                   3415:                     {
                   3416:                     if((c = yyinput()) == '/')
                   3417:                         break;
                   3418:                     else
                   3419:                         unput(c);
                   3420:                     }
                   3421:                 }
                   3422:             }
                   3423:
                   3424:     {number}  cout << "number " << YYText() << '\\n';
                   3425:
                   3426:     \\n        mylineno++;
                   3427:
                   3428:     {name}    cout << "name " << YYText() << '\\n';
                   3429:
                   3430:     {string}  cout << "string " << YYText() << '\\n';
                   3431:
                   3432:     %%
                   3433:
                   3434:     int main( int /* argc */, char** /* argv */ )
                   3435:         {
                   3436:         FlexLexer* lexer = new yyFlexLexer;
                   3437:         while(lexer->yylex() != 0)
                   3438:             ;
                   3439:         return 0;
                   3440:         }
                   3441: .fi
                   3442: If you want to create multiple (different) lexer classes, you use the
                   3443: .B \-P
                   3444: flag (or the
                   3445: .B prefix=
                   3446: option) to rename each
                   3447: .B yyFlexLexer
                   3448: to some other
                   3449: .B xxFlexLexer.
                   3450: You can then include
1.5       deraadt  3451: .B <g++/FlexLexer.h>
1.1       deraadt  3452: in your other sources once per lexer class, first renaming
                   3453: .B yyFlexLexer
                   3454: as follows:
                   3455: .nf
                   3456:
                   3457:     #undef yyFlexLexer
                   3458:     #define yyFlexLexer xxFlexLexer
1.5       deraadt  3459:     #include <g++/FlexLexer.h>
1.1       deraadt  3460:
                   3461:     #undef yyFlexLexer
                   3462:     #define yyFlexLexer zzFlexLexer
1.5       deraadt  3463:     #include <g++/FlexLexer.h>
1.1       deraadt  3464:
                   3465: .fi
                   3466: if, for example, you used
                   3467: .B %option prefix="xx"
                   3468: for one of your scanners and
                   3469: .B %option prefix="zz"
                   3470: for the other.
                   3471: .PP
                   3472: IMPORTANT: the present form of the scanning class is
                   3473: .I experimental
                   3474: and may change considerably between major releases.
                   3475: .SH INCOMPATIBILITIES WITH LEX AND POSIX
                   3476: .I flex
                   3477: is a rewrite of the AT&T Unix
                   3478: .I lex
                   3479: tool (the two implementations do not share any code, though),
                   3480: with some extensions and incompatibilities, both of which
                   3481: are of concern to those who wish to write scanners acceptable
                   3482: to either implementation.  Flex is fully compliant with the POSIX
                   3483: .I lex
                   3484: specification, except that when using
                   3485: .B %pointer
                   3486: (the default), a call to
                   3487: .B unput()
                   3488: destroys the contents of
                   3489: .B yytext,
                   3490: which is counter to the POSIX specification.
                   3491: .PP
                   3492: In this section we discuss all of the known areas of incompatibility
                   3493: between flex, AT&T lex, and the POSIX specification.
                   3494: .PP
                   3495: .I flex's
                   3496: .B \-l
                   3497: option turns on maximum compatibility with the original AT&T
                   3498: .I lex
                   3499: implementation, at the cost of a major loss in the generated scanner's
                   3500: performance.  We note below which incompatibilities can be overcome
                   3501: using the
                   3502: .B \-l
                   3503: option.
                   3504: .PP
                   3505: .I flex
                   3506: is fully compatible with
                   3507: .I lex
                   3508: with the following exceptions:
                   3509: .IP -
                   3510: The undocumented
                   3511: .I lex
                   3512: scanner internal variable
                   3513: .B yylineno
                   3514: is not supported unless
                   3515: .B \-l
                   3516: or
                   3517: .B %option yylineno
                   3518: is used.
                   3519: .IP
                   3520: .B yylineno
                   3521: should be maintained on a per-buffer basis, rather than a per-scanner
                   3522: (single global variable) basis.
                   3523: .IP
                   3524: .B yylineno
                   3525: is not part of the POSIX specification.
                   3526: .IP -
                   3527: The
                   3528: .B input()
                   3529: routine is not redefinable, though it may be called to read characters
                   3530: following whatever has been matched by a rule.  If
                   3531: .B input()
                   3532: encounters an end-of-file the normal
                   3533: .B yywrap()
                   3534: processing is done.  A ``real'' end-of-file is returned by
                   3535: .B input()
                   3536: as
                   3537: .I EOF.
                   3538: .IP
                   3539: Input is instead controlled by defining the
                   3540: .B YY_INPUT
                   3541: macro.
                   3542: .IP
                   3543: The
                   3544: .I flex
                   3545: restriction that
                   3546: .B input()
                   3547: cannot be redefined is in accordance with the POSIX specification,
                   3548: which simply does not specify any way of controlling the
                   3549: scanner's input other than by making an initial assignment to
                   3550: .I yyin.
                   3551: .IP -
                   3552: The
                   3553: .B unput()
                   3554: routine is not redefinable.  This restriction is in accordance with POSIX.
                   3555: .IP -
                   3556: .I flex
                   3557: scanners are not as reentrant as
                   3558: .I lex
                   3559: scanners.  In particular, if you have an interactive scanner and
                   3560: an interrupt handler which long-jumps out of the scanner, and
                   3561: the scanner is subsequently called again, you may get the following
                   3562: message:
                   3563: .nf
                   3564:
                   3565:     fatal flex scanner internal error--end of buffer missed
                   3566:
                   3567: .fi
                   3568: To reenter the scanner, first use
                   3569: .nf
                   3570:
                   3571:     yyrestart( yyin );
                   3572:
                   3573: .fi
                   3574: Note that this call will throw away any buffered input; usually this
                   3575: isn't a problem with an interactive scanner.
                   3576: .IP
                   3577: Also note that flex C++ scanner classes
                   3578: .I are
                   3579: reentrant, so if using C++ is an option for you, you should use
                   3580: them instead.  See "Generating C++ Scanners" above for details.
                   3581: .IP -
                   3582: .B output()
                   3583: is not supported.
                   3584: Output from the
                   3585: .B ECHO
                   3586: macro is done to the file-pointer
                   3587: .I yyout
                   3588: (default
                   3589: .I stdout).
                   3590: .IP
                   3591: .B output()
                   3592: is not part of the POSIX specification.
                   3593: .IP -
                   3594: .I lex
                   3595: does not support exclusive start conditions (%x), though they
                   3596: are in the POSIX specification.
                   3597: .IP -
                   3598: When definitions are expanded,
                   3599: .I flex
                   3600: encloses them in parentheses.
                   3601: With lex, the following:
                   3602: .nf
                   3603:
                   3604:     NAME    [A-Z][A-Z0-9]*
                   3605:     %%
                   3606:     foo{NAME}?      printf( "Found it\\n" );
                   3607:     %%
                   3608:
                   3609: .fi
                   3610: will not match the string "foo" because when the macro
                   3611: is expanded the rule is equivalent to "foo[A-Z][A-Z0-9]*?"
                   3612: and the precedence is such that the '?' is associated with
                   3613: "[A-Z0-9]*".  With
                   3614: .I flex,
                   3615: the rule will be expanded to
                   3616: "foo([A-Z][A-Z0-9]*)?" and so the string "foo" will match.
                   3617: .IP
                   3618: Note that if the definition begins with
                   3619: .B ^
                   3620: or ends with
                   3621: .B $
                   3622: then it is
                   3623: .I not
                   3624: expanded with parentheses, to allow these operators to appear in
                   3625: definitions without losing their special meanings.  But the
                   3626: .B <s>, /,
                   3627: and
                   3628: .B <<EOF>>
                   3629: operators cannot be used in a
                   3630: .I flex
                   3631: definition.
                   3632: .IP
                   3633: Using
                   3634: .B \-l
                   3635: results in the
                   3636: .I lex
                   3637: behavior of no parentheses around the definition.
                   3638: .IP
                   3639: The POSIX specification is that the definition be enclosed in parentheses.
                   3640: .IP -
                   3641: Some implementations of
                   3642: .I lex
                   3643: allow a rule's action to begin on a separate line, if the rule's pattern
                   3644: has trailing whitespace:
                   3645: .nf
                   3646:
                   3647:     %%
                   3648:     foo|bar<space here>
                   3649:       { foobar_action(); }
                   3650:
                   3651: .fi
                   3652: .I flex
                   3653: does not support this feature.
                   3654: .IP -
                   3655: The
                   3656: .I lex
                   3657: .B %r
                   3658: (generate a Ratfor scanner) option is not supported.  It is not part
                   3659: of the POSIX specification.
                   3660: .IP -
                   3661: After a call to
                   3662: .B unput(),
                   3663: .I yytext
                   3664: is undefined until the next token is matched, unless the scanner
                   3665: was built using
                   3666: .B %array.
                   3667: This is not the case with
                   3668: .I lex
                   3669: or the POSIX specification.  The
                   3670: .B \-l
                   3671: option does away with this incompatibility.
                   3672: .IP -
                   3673: The precedence of the
                   3674: .B {}
                   3675: (numeric range) operator is different.
                   3676: .I lex
                   3677: interprets "abc{1,3}" as "match one, two, or
                   3678: three occurrences of 'abc'", whereas
                   3679: .I flex
                   3680: interprets it as "match 'ab'
                   3681: followed by one, two, or three occurrences of 'c'".  The latter is
                   3682: in agreement with the POSIX specification.
                   3683: .IP -
                   3684: The precedence of the
                   3685: .B ^
                   3686: operator is different.
                   3687: .I lex
                   3688: interprets "^foo|bar" as "match either 'foo' at the beginning of a line,
                   3689: or 'bar' anywhere", whereas
                   3690: .I flex
                   3691: interprets it as "match either 'foo' or 'bar' if they come at the beginning
                   3692: of a line".  The latter is in agreement with the POSIX specification.
                   3693: .IP -
                   3694: The special table-size declarations such as
                   3695: .B %a
                   3696: supported by
                   3697: .I lex
                   3698: are not required by
                   3699: .I flex
                   3700: scanners;
                   3701: .I flex
                   3702: ignores them.
                   3703: .IP -
                   3704: The name
                   3705: .B
                   3706: FLEX_SCANNER
                   3707: is #define'd so scanners may be written for use with either
                   3708: .I flex
                   3709: or
                   3710: .I lex.
                   3711: Scanners also include
                   3712: .B YY_FLEX_MAJOR_VERSION
                   3713: and
                   3714: .B YY_FLEX_MINOR_VERSION
                   3715: indicating which version of
                   3716: .I flex
                   3717: generated the scanner
                   3718: (for example, for the 2.5 release, these defines would be 2 and 5
                   3719: respectively).
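                         .PP
                         Several of the differences listed above can be sidestepped when a
                         scanner has to be acceptable to both
                         .I flex
                         and
                         .I lex.
                         The sketch below spells out grouping with explicit parentheses, which
                         both tools should read the same way, rather than relying on either
                         tool's precedence for
                         .B {}
                         and
                         .B ^,
                         and it uses
                         .B FLEX_SCANNER
                         to guard a flex-only call.  The function name
                         .B rescan()
                         is hypothetical:
                         .nf

                             %%
                             (abc){1,3}    printf( "one to three abc's\\n" );
                             ^(foo|bar)    printf( "foo or bar at line start\\n" );
                             %%

                             /* Hypothetical helper: reenter the scanner after a long-jump. */
                             void rescan()
                                 {
                             #ifdef FLEX_SCANNER
                                 yyrestart( yyin );    /* discards any buffered input */
                             #endif
                                 }

                         .fi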
                   3720: .PP
                   3721: The following
                   3722: .I flex
                   3723: features are not included in
                   3724: .I lex
                   3725: or the POSIX specification:
                   3726: .nf
                   3727:
                   3728:     C++ scanners
                   3729:     %option
                   3730:     start condition scopes
                   3731:     start condition stacks
                   3732:     interactive/non-interactive scanners
                   3733:     yy_scan_string() and friends
                   3734:     yyterminate()
                   3735:     yy_set_interactive()
                   3736:     yy_set_bol()
                   3737:     YY_AT_BOL()
                   3738:     <<EOF>>
                   3739:     <*>
                   3740:     YY_DECL
                   3741:     YY_START
                   3742:     YY_USER_ACTION
                   3743:     YY_USER_INIT
                   3744:     #line directives
                   3745:     %{}'s around actions
                   3746:     multiple actions on a line
                   3747:
                   3748: .fi
                   3749: plus almost all of the flex flags.
                   3750: The last feature in the list refers to the fact that with
                   3751: .I flex
                   3752: you can put multiple actions on the same line, separated with
                   3753: semi-colons, while with
                   3754: .I lex,
                   3755: the following
                   3756: .nf
                   3757:
                   3758:     foo    handle_foo(); ++num_foos_seen;
                   3759:
                   3760: .fi
                   3761: is (rather surprisingly) truncated to
                   3762: .nf
                   3763:
                   3764:     foo    handle_foo();
                   3765:
                   3766: .fi
                   3767: .I flex
                   3768: does not truncate the action.  Actions that are not enclosed in
                   3769: braces are simply terminated at the end of the line.
                   3770: .SH DIAGNOSTICS
                   3771: .PP
                   3772: .I warning, rule cannot be matched
                   3773: indicates that the given rule
                   3774: cannot be matched because it follows other rules that will
                   3775: always match the same text as it.  For
                   3776: example, in the following "foo" cannot be matched because it comes after
                   3777: an identifier "catch-all" rule:
                   3778: .nf
                   3779:
                   3780:     [a-z]+    got_identifier();
                   3781:     foo       got_foo();
                   3782:
                   3783: .fi
                   3784: Using
                   3785: .B REJECT
                   3786: in a scanner suppresses this warning.
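                         .PP
                         When two rules match the same text, the one listed first is chosen, so
                         the usual remedy is to list the more specific rule ahead of the
                         catch-all.  A sketch of the example above, reordered:
                         .nf

                             foo       got_foo();
                             [a-z]+    got_identifier();

                         .fi
                         Here "foo" is matched by the first rule, while longer identifiers such
                         as "food" still go to the second because the longest match is preferred.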
                   3787: .PP
                   3788: .I warning,
                   3789: .B \-s
                   3790: .I
                   3791: option given but default rule can be matched
                   3792: means that it is possible (perhaps only in a particular start condition)
                   3793: that the default rule (match any single character) is the only one
                   3794: that will match a particular input.  Since
                   3795: .B \-s
                   3796: was given, presumably this is not intended.
                   3797: .PP
                   3798: .I reject_used_but_not_detected undefined
                   3799: or
                   3800: .I yymore_used_but_not_detected undefined -
                   3801: These errors can occur at compile time.  They indicate that the
                   3802: scanner uses
                   3803: .B REJECT
                   3804: or
                   3805: .B yymore()
                   3806: but that
                   3807: .I flex
                   3808: failed to notice the fact, meaning that
                   3809: .I flex
                   3810: scanned the first two sections looking for occurrences of these actions
                   3811: and failed to find any, but somehow you snuck some in (via a #include
                   3812: file, for example).  Use
                   3813: .B %option reject
                   3814: or
                   3815: .B %option yymore
                   3816: to indicate to flex that you really do use these features.
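                         .PP
                         As an illustration only, the options go in the definitions section.
                         The header name below is hypothetical and stands for any file that
                         hides uses of
                         .B REJECT
                         or
                         .B yymore()
                         from flex:
                         .nf

                             %option reject
                             %option yymore

                             %{
                             /* hypothetical header whose macros use REJECT or yymore() */
                             #include "my_actions.h"
                             %}

                         .fi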
                   3817: .PP
                   3818: .I flex scanner jammed -
                   3819: a scanner compiled with
                   3820: .B \-s
                   3821: has encountered an input string which wasn't matched by
                   3822: any of its rules.  This error can also occur due to internal problems.
                   3823: .PP
                   3824: .I token too large, exceeds YYLMAX -
                   3825: your scanner uses
                   3826: .B %array
                   3827: and one of its rules matched a string longer than the
                   3828: .B YYLMAX
                   3829: constant (8K bytes by default).  You can increase the value by
                   3830: #define'ing
                   3831: .B YYLMAX
                   3832: in the definitions section of your
                   3833: .I flex
                   3834: input.
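                         .PP
                         For instance, a definitions section along these lines raises the
                         limit.  The value 32768 is an arbitrary assumption, not a
                         recommendation:
                         .nf

                             %{
                             /* assumed value; size it for your longest token */
                             #define YYLMAX 32768
                             %}
                             %array

                         .fi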
                   3835: .PP
                   3836: .I scanner requires \-8 flag to
                   3837: .I use the character 'x' -
                   3838: Your scanner specification includes recognizing the 8-bit character
                   3839: .I 'x'
                   3840: but you did not specify the \-8 flag, and your scanner defaulted to 7-bit
                   3841: because you used the
                   3842: .B \-Cf
                   3843: or
                   3844: .B \-CF
                   3845: table compression options.  See the discussion of the
                   3846: .B \-7
                   3847: flag for details.
                   3848: .PP
                   3849: .I flex scanner push-back overflow -
                   3850: you used
                   3851: .B unput()
                   3852: to push back so much text that the scanner's buffer could not hold
                   3853: both the pushed-back text and the current token in
                   3854: .B yytext.
                   3855: Ideally the scanner should dynamically resize the buffer in this case, but at
                   3856: present it does not.
                   3857: .PP
                   3858: .I
                   3859: input buffer overflow, can't enlarge buffer because scanner uses REJECT -
                   3860: the scanner was working on matching an extremely large token and needed
                   3861: to expand the input buffer.  This doesn't work with scanners that use
                   3862: .B
                   3863: REJECT.
                   3864: .PP
                   3865: .I
                   3866: fatal flex scanner internal error--end of buffer missed -
                   3867: This can occur in a scanner which is reentered after a long-jump
                   3868: has jumped out (or over) the scanner's activation frame.  Before
                   3869: reentering the scanner, use:
                   3870: .nf
                   3871:
                   3872:     yyrestart( yyin );
                   3873:
                   3874: .fi
                   3875: or, as noted above, switch to using the C++ scanner class.
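                         .PP
                         The following sketch shows where the
                         .B yyrestart()
                         call above might sit in a setjmp()/longjmp() recovery loop.  The
                         variable
                         .I recover
                         and the "!" rule are hypothetical and serve only to trigger the jump.
                         Note that restarting discards any input already buffered by the
                         scanner:
                         .nf

                             %{
                             #include <setjmp.h>
                             static jmp_buf recover;    /* hypothetical recovery point */
                             %}
                             %%
                             "!"     longjmp( recover, 1 );    /* hypothetical bail-out */
                             .|\\n    ECHO;
                             %%
                             int main()
                                 {
                                 /* Nonzero means an action jumped out of yylex(). */
                                 if ( setjmp( recover ) != 0 )
                                     yyrestart( yyin );

                                 return yylex();
                                 }

                         .fi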
                   3876: .PP
                   3877: .I too many start conditions in <> construct! -
                   3878: you listed more start conditions in a <> construct than exist (so
                   3879: you must have listed at least one of them twice).
                   3880: .SH FILES
                   3881: .TP
                   3882: .B \-lfl
                   3883: library with which scanners must be linked.
                   3884: .TP
                   3885: .I lex.yy.c
                   3886: generated scanner (called
                   3887: .I lexyy.c
                   3888: on some systems).
                   3889: .TP
                   3890: .I lex.yy.cc
                   3891: generated C++ scanner class, when using
                   3892: .B \-+.
                   3893: .TP
1.5       deraadt  3894: .I <g++/FlexLexer.h>
1.1       deraadt  3895: header file defining the C++ scanner base class,
                   3896: .B FlexLexer,
                   3897: and its derived class,
                   3898: .B yyFlexLexer.
                   3899: .TP
                   3900: .I flex.skl
                   3901: skeleton scanner.  This file is only used when building flex, not when
                   3902: flex executes.
                   3903: .TP
                   3904: .I lex.backup
                   3905: backing-up information for
                   3906: .B \-b
                   3907: flag (called
                   3908: .I lex.bck
                   3909: on some systems).
                   3910: .SH DEFICIENCIES / BUGS
                   3911: .PP
                   3912: Some trailing context
                   3913: patterns cannot be properly matched and generate
                   3914: warning messages ("dangerous trailing context").  These are
                   3915: patterns where the ending of the
                   3916: first part of the rule matches the beginning of the second
                   3917: part, such as "zx*/xy*", where the 'x*' matches the 'x' at
                   3918: the beginning of the trailing context.  (Note that the POSIX draft
                   3919: states that the text matched by such patterns is undefined.)
                   3920: .PP
                   3921: For some trailing context rules, parts which are actually fixed-length are
1.3       deraadt  3922: not recognized as such, leading to the above-mentioned performance loss.
1.1       deraadt  3923: In particular, parts using '|' or {n} (such as "foo{3}") are always
                   3924: considered variable-length.
                   3925: .PP
                   3926: Combining trailing context with the special '|' action can result in
                   3927: .I fixed
                   3928: trailing context being turned into the more expensive
                   3929: .I variable
                   3930: trailing context.  This happens, for example, in the following:
                   3931: .nf
                   3932:
                   3933:     %%
                   3934:     abc      |
                   3935:     xyz/def
                   3936:
                   3937: .fi
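                         .PP
                         A possible workaround, given here only as a sketch, is to repeat the
                         action rather than share it through '|'.  The function
                         .I action()
                         is a hypothetical placeholder for whatever both rules should do:
                         .nf

                             %%
                             abc      action();
                             xyz/def  action();

                         .fi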
                   3938: .PP
                   3939: Use of
                   3940: .B unput()
                   3941: invalidates yytext and yyleng, unless the
                   3942: .B %array
                   3943: directive
                   3944: or the
                   3945: .B \-l
                   3946: option has been used.
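                         .PP
                         A common way to cope, sketched below, is to save
                         .B yyleng
                         and a copy of
                         .B yytext
                         before the first
                         .B unput()
                         call.  The rule is a hypothetical example that pushes matched
                         upper-case text back in lower case; it assumes <ctype.h>, <stdlib.h>
                         and <string.h> are included in the definitions section.  Characters
                         are pushed back in reverse so that they are rescanned in their
                         original order:
                         .nf

                             [A-Z]+    {
                                       /* Save yyleng and yytext before calling unput(). */
                                       int i, n = yyleng;
                                       char *saved = strdup( yytext );

                                       for ( i = n - 1; i >= 0; --i )
                                           unput( tolower( (unsigned char) saved[i] ) );
                                       free( saved );
                                       }

                         .fi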
                   3947: .PP
                   3948: Pattern-matching of NUL's is substantially slower than matching other
                   3949: characters.
                   3950: .PP
                   3951: Dynamic resizing of the input buffer is slow, as it entails rescanning
                   3952: all the text matched so far by the current (generally huge) token.
                   3953: .PP
                   3954: Due to both buffering of input and read-ahead, you cannot intermix
                   3955: calls to <stdio.h> routines, such as
                   3956: .B getchar(),
                   3957: with
                   3958: .I flex
                   3959: rules and expect it to work.  Call
                   3960: .B input()
                   3961: instead.
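                         .PP
                         As a sketch, an action that wants to consume raw characters can loop
                         on
                         .B input()
                         where it might otherwise have used
                         .B getchar().
                         The "#" rule below is a hypothetical example that discards the rest
                         of a line:
                         .nf

                             "#"       {
                                       int c;

                                       /* Read through the scanner's own buffer. */
                                       while ( (c = input()) != '\\n' && c != EOF )
                                           ;
                                       }

                         .fi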
                   3962: .PP
                   3963: The total number of table entries listed by the
                   3964: .B \-v
                   3965: flag excludes the number of table entries needed to determine
                   3966: what rule has been matched.  The number of entries is equal
                   3967: to the number of DFA states if the scanner does not use
                   3968: .B REJECT,
                   3969: and somewhat greater than the number of states if it does.
                   3970: .PP
                   3971: .B REJECT
                   3972: cannot be used with the
                   3973: .B \-f
                   3974: or
                   3975: .B \-F
                   3976: options.
                   3977: .PP
                   3978: The
                   3979: .I flex
                   3980: internal algorithms need documentation.
                   3981: .SH SEE ALSO
                   3982: .PP
                   3983: lex(1), yacc(1), sed(1), awk(1).
                   3984: .PP
                   3985: John Levine, Tony Mason, and Doug Brown,
                   3986: .I Lex & Yacc,
                   3987: O'Reilly and Associates.  Be sure to get the 2nd edition.
                   3988: .PP
                   3989: M. E. Lesk and E. Schmidt,
                   3990: .I LEX \- Lexical Analyzer Generator
                   3991: .PP
                   3992: Alfred Aho, Ravi Sethi and Jeffrey Ullman,
                   3993: .I Compilers: Principles, Techniques and Tools,
                   3994: Addison-Wesley (1986).  Describes the pattern-matching techniques used by
                   3995: .I flex
                   3996: (deterministic finite automata).
                   3997: .SH AUTHOR
                   3998: Vern Paxson, with the help of many ideas and much inspiration from
                   3999: Van Jacobson.  Original version by Jef Poskanzer.  The fast table
                   4000: representation is a partial implementation of a design done by Van
                   4001: Jacobson.  The implementation was done by Kevin Gong and Vern Paxson.
                   4002: .PP
                   4003: Thanks to the many
                   4004: .I flex
                   4005: beta-testers, feedbackers, and contributors, especially Francois Pinard,
                   4006: Casey Leedom,
                   4007: Robert Abramovitz,
                   4008: Stan Adermann, Terry Allen, David Barker-Plummer, John Basrai,
                   4009: Neal Becker, Nelson H.F. Beebe, benson@odi.com,
                   4010: Karl Berry, Peter A. Bigot, Simon Blanchard,
                   4011: Keith Bostic, Frederic Brehm, Ian Brockbank, Kin Cho, Nick Christopher,
                   4012: Brian Clapper, J.T. Conklin,
                   4013: Jason Coughlin, Bill Cox, Nick Cropper, Dave Curtis, Scott David
                   4014: Daniels, Chris G. Demetriou, Theo Deraadt,
                   4015: Mike Donahue, Chuck Doucette, Tom Epperly, Leo Eskin,
                   4016: Chris Faylor, Chris Flatters, Jon Forrest, Jeffrey Friedl,
                   4017: Joe Gayda, Kaveh R. Ghazi, Wolfgang Glunz,
                   4018: Eric Goldman, Christopher M. Gould, Ulrich Grepel, Peer Griebel,
                   4019: Jan Hajic, Charles Hemphill, NORO Hideo,
                   4020: Jarkko Hietaniemi, Scott Hofmann,
                   4021: Jeff Honig, Dana Hudes, Eric Hughes, John Interrante,
                   4022: Ceriel Jacobs, Michal Jaegermann, Sakari Jalovaara, Jeffrey R. Jones,
                   4023: Henry Juengst, Klaus Kaempf, Jonathan I. Kamens, Terrence O Kane,
                   4024: Amir Katz, ken@ken.hilco.com, Kevin B. Kenny,
                   4025: Steve Kirsch, Winfried Koenig, Marq Kole, Ronald Lamprecht,
                   4026: Greg Lee, Rohan Lenard, Craig Leres, John Levine, Steve Liddle,
                   4027: David Loffredo, Mike Long,
                   4028: Mohamed el Lozy, Brian Madsen, Malte, Joe Marshall,
                   4029: Bengt Martensson, Chris Metcalf,
                   4030: Luke Mewburn, Jim Meyering, R. Alexander Milowski, Erik Naggum,
                   4031: G.T. Nicol, Landon Noll, James Nordby, Marc Nozell,
                   4032: Richard Ohnemus, Karsten Pahnke,
                   4033: Sven Panne, Roland Pesch, Walter Pelissero, Gaumond
                   4034: Pierre, Esmond Pitt, Jef Poskanzer, Joe Rahmeh, Jarmo Raiha,
                   4035: Frederic Raimbault, Pat Rankin, Rick Richardson,
                   4036: Kevin Rodgers, Kai Uwe Rommel, Jim Roskind, Alberto Santini,
                   4037: Andreas Scherer, Darrell Schiebel, Raf Schietekat,
                   4038: Doug Schmidt, Philippe Schnoebelen, Andreas Schwab,
                   4039: Larry Schwimmer, Alex Siegel, Eckehard Stolz, Jan-Erik Strömquist,
                   4040: Mike Stump, Paul Stuart, Dave Tallman, Ian Lance Taylor,
                   4041: Chris Thewalt, Richard M. Timoney, Jodi Tsai,
                   4042: Paul Tuinenga, Gary Weik, Frank Whaley, Gerhard Wilhelms, Kent Williams, Ken
                   4043: Yap, Ron Zellar, Nathan Zelle, David Zuhn,
                   4044: and those whose names have slipped my marginal
                   4045: mail-archiving skills but whose contributions are appreciated all the
                   4046: same.
                   4047: .PP
                   4048: Thanks to Keith Bostic, Jon Forrest, Noah Friedman,
                   4049: John Gilmore, Craig Leres, John Levine, Bob Mulcahy, G.T.
                   4050: Nicol, Francois Pinard, Rich Salz, and Richard Stallman for help with various
                   4051: distribution headaches.
                   4052: .PP
                   4053: Thanks to Esmond Pitt and Earle Horton for 8-bit character support; to
                   4054: Benson Margulies and Fred Burke for C++ support; to Kent Williams and Tom
                   4055: Epperly for C++ class support; to Ove Ewerlid for support of NUL's; and to
                   4056: Eric Hughes for support of multiple buffers.
                   4057: .PP
                   4058: This work was primarily done when I was with the Real Time Systems Group
                   4059: at the Lawrence Berkeley Laboratory in Berkeley, CA.  Many thanks to all there
                   4060: for the support I received.
                   4061: .PP
                   4062: Send comments to vern@ee.lbl.gov.