Annotation of src/usr.bin/lex/flex.1, Revision 1.16

1.16    ! jmc         1: .\"    $OpenBSD: flex.1,v 1.15 2003/10/07 19:41:31 tedu Exp $
        !             2: .\"
1.12      jmc         3: .\" Copyright (c) 1990 The Regents of the University of California.
                      4: .\" All rights reserved.
1.2       deraadt     5: .\"
1.12      jmc         6: .\" This code is derived from software contributed to Berkeley by
                      7: .\" Vern Paxson.
                      8: .\"
                      9: .\" The United States Government has rights in this work pursuant
                     10: .\" to contract no. DE-AC03-76SF00098 between the United States
                     11: .\" Department of Energy and the University of California.
                     12: .\"
                     13: .\" Redistribution and use in source and binary forms, with or without
1.13      millert    14: .\" modification, are permitted provided that the following conditions
                     15: .\" are met:
                     16: .\"
                     17: .\" 1. Redistributions of source code must retain the above copyright
                     18: .\"    notice, this list of conditions and the following disclaimer.
                     19: .\" 2. Redistributions in binary form must reproduce the above copyright
                     20: .\"    notice, this list of conditions and the following disclaimer in the
                     21: .\"    documentation and/or other materials provided with the distribution.
                     22: .\"
                     23: .\" Neither the name of the University nor the names of its contributors
                     24: .\" may be used to endorse or promote products derived from this software
                     25: .\" without specific prior written permission.
                     26: .\"
                     27: .\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR
                     28: .\" IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
                     29: .\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
                     30: .\" PURPOSE.
1.16    ! jmc        31: .\"
        !            32: .Dd April 1, 1995
        !            33: .Dt FLEX 1
        !            34: .Os
        !            35: .Sh NAME
        !            36: .Nm flex
        !            37: .Nd fast lexical analyzer generator
        !            38: .Sh SYNOPSIS
        !            39: .Nm
        !            40: .Op Fl 78BbcdFfhIiLlnpsTtVvw+?
        !            41: .Op Fl C Ns Op Cm aeFfmr
        !            42: .Op Fl Fl help
        !            43: .Op Fl Fl version
        !            44: .Sm off
        !            45: .Op Fl o Ar output
        !            46: .Op Fl P Ar prefix
        !            47: .Op Fl S Ar skeleton
        !            48: .Op Ar filename ...
        !            49: .Sm on
        !            50: .Sh OVERVIEW
1.1       deraadt    51: This manual describes
1.16    ! jmc        52: .Nm ,
        !            53: a tool for generating programs that perform pattern-matching on text.
        !            54: The manual includes both tutorial and reference sections:
        !            55: .Bl -ohang
        !            56: .It Sy Description
        !            57: A brief overview of the tool.
        !            58: .It Sy Some Simple Examples
        !            59: .It Sy Format of the Input File
        !            60: .It Sy Patterns
        !            61: The extended regular expressions used by
        !            62: .Nm .
        !            63: .It Sy How the Input is Matched
        !            64: The rules for determining what has been matched.
        !            65: .It Sy Actions
        !            66: How to specify what to do when a pattern is matched.
        !            67: .It Sy The Generated Scanner
        !            68: Details regarding the scanner that
        !            69: .Nm
        !            70: produces;
        !            71: how to control the input source.
        !            72: .It Sy Start Conditions
        !            73: Introducing context into scanners, and managing
        !            74: .Qq mini-scanners .
        !            75: .It Sy Multiple Input Buffers
        !            76: How to manipulate multiple input sources;
        !            77: how to scan from strings instead of files.
        !            78: .It Sy End-of-File Rules
        !            79: Special rules for matching the end of the input.
        !            80: .It Sy Miscellaneous Macros
        !            81: A summary of macros available to the actions.
        !            82: .It Sy Values Available to the User
        !            83: A summary of values available to the actions.
        !            84: .It Sy Interfacing with Yacc
        !            85: Connecting flex scanners together with
        !            86: .Xr yacc 1
        !            87: parsers.
        !            88: .It Sy Options
        !            89: .Nm
        !            90: command-line options, and the
        !            91: .Dq %option
        !            92: directive.
        !            93: .It Sy Performance Considerations
        !            94: How to make scanners go as fast as possible.
        !            95: .It Sy Generating C++ Scanners
        !            96: The
        !            97: .Pq experimental
        !            98: facility for generating C++ scanner classes.
        !            99: .It Sy Incompatibilities with Lex and POSIX
        !           100: How
        !           101: .Nm
        !           102: differs from AT&T lex and the
        !           103: .Tn POSIX
        !           104: lex standard.
        !           105: .It Sy Files
        !           106: Files used by
        !           107: .Nm .
        !           108: .It Sy Diagnostics
        !           109: Those error messages produced by
        !           110: .Nm
        !           111: .Pq or scanners it generates
        !           112: whose meanings might not be apparent.
        !           113: .It Sy See Also
        !           114: Other documentation, related tools.
        !           115: .It Sy Authors
        !           116: Includes contact information.
        !           117: .It Sy Bugs
        !           118: Known problems with
        !           119: .Nm .
        !           120: .El
        !           121: .Sh DESCRIPTION
        !           122: .Nm
1.1       deraadt   123: is a tool for generating
1.16    ! jmc       124: .Em scanners :
1.9       millert   125: programs which recognize lexical patterns in text.
1.16    ! jmc       126: .Nm
        !           127: reads the given input files, or its standard input if no file names are given,
        !           128: for a description of a scanner to generate.
        !           129: The description is in the form of pairs of regular expressions and C code,
        !           130: called
        !           131: .Em rules .
        !           132: .Nm
1.1       deraadt   133: generates as output a C source file,
1.16    ! jmc       134: .Pa lex.yy.c ,
1.1       deraadt   135: which defines a routine
1.16    ! jmc       136: .Fn yylex .
1.1       deraadt   137: This file is compiled and linked with the
1.16    ! jmc       138: .Fl lfl
        !           139: library to produce an executable.
        !           140: When the executable is run, it analyzes its input for occurrences
        !           141: of the regular expressions.
        !           142: Whenever it finds one, it executes the corresponding C code.
        !           143: .Sh SOME SIMPLE EXAMPLES
1.1       deraadt   144: Here are some simple examples to give the flavor of how one uses
1.16    ! jmc       145: .Nm .
1.1       deraadt   146: The following
1.16    ! jmc       147: .Nm
1.1       deraadt   148: input specifies a scanner which, whenever it encounters the string
1.16    ! jmc       149: .Qq username ,
        !           150: replaces it with the user's login name:
        !           151: .Bd -literal -offset indent
        !           152: %%
        !           153: username    printf("%s", getlogin());
        !           154: .Ed
        !           155: .Pp
1.1       deraadt   156: By default, any text not matched by a
1.16    ! jmc       157: .Nm
        !           158: scanner is copied to the output, so the net effect of this scanner is
        !           159: to copy its input file to its output with each occurrence of
        !           160: .Qq username
        !           161: expanded.
        !           162: In this input, there is just one rule.
        !           163: .Qq username
        !           164: is the
        !           165: .Em pattern
        !           166: and the
        !           167: .Qq printf
        !           168: is the
        !           169: .Em action .
        !           170: The
        !           171: .Qq %%
        !           172: marks the beginning of the rules.
        !           173: .Pp
1.1       deraadt   174: Here's another simple example:
1.16    ! jmc       175: .Bd -literal -offset indent
        !           176:         int num_lines = 0, num_chars = 0;
1.1       deraadt   177:
1.16    ! jmc       178: %%
        !           179: \en      ++num_lines; ++num_chars;
        !           180: \&.       ++num_chars;
        !           181:
        !           182: %%
        !           183: int main(void)
        !           184: {
        !           185:        yylex();
        !           186:        printf("# of lines = %d, # of chars = %d\en",
        !           187:             num_lines, num_chars);
        !           188: }
        !           189: .Ed
        !           190: .Pp
1.1       deraadt   191: This scanner counts the number of characters and the number
1.16    ! jmc       192: of lines in its input
        !           193: (it produces no output other than the final report on the counts).
        !           194: The first line declares two globals,
        !           195: .Qq num_lines
        !           196: and
        !           197: .Qq num_chars ,
        !           198: which are accessible both inside
        !           199: .Fn yylex
1.1       deraadt   200: and in the
1.16    ! jmc       201: .Fn main
        !           202: routine declared after the second
        !           203: .Qq %% .
        !           204: There are two rules, one which matches a newline
        !           205: .Pq \&"\en\&"
        !           206: and increments both the line count and the character count,
        !           207: and one which matches any character other than a newline
        !           208: (indicated by the
        !           209: .Qq \&.
        !           210: regular expression).
        !           211: .Pp
1.1       deraadt   212: A somewhat more complicated example:
1.16    ! jmc       213: .Bd -literal -offset indent
        !           214: /* scanner for a toy Pascal-like language */
1.1       deraadt   215:
1.16    ! jmc       216: %{
        !           217: /* need this for the calls to atoi() and atof() below */
        !           218: #include <stdlib.h>
        !           219: %}
1.1       deraadt   220:
1.16    ! jmc       221: DIGIT    [0-9]
        !           222: ID       [a-z][a-z0-9]*
1.1       deraadt   223:
1.16    ! jmc       224: %%
1.1       deraadt   225:
1.16    ! jmc       226: {DIGIT}+ {
        !           227:         printf("An integer: %s (%d)\en", yytext,
        !           228:             atoi(yytext));
        !           229: }
1.1       deraadt   230:
1.16    ! jmc       231: {DIGIT}+"."{DIGIT}* {
        !           232:         printf("A float: %s (%g)\en", yytext,
        !           233:             atof(yytext));
        !           234: }
1.1       deraadt   235:
1.16    ! jmc       236: if|then|begin|end|procedure|function {
        !           237:         printf("A keyword: %s\en", yytext);
        !           238: }
1.1       deraadt   239:
1.16    ! jmc       240: {ID}    printf("An identifier: %s\en", yytext);
1.1       deraadt   241:
1.16    ! jmc       242: "+"|"-"|"*"|"/"   printf("An operator: %s\en", yytext);
1.1       deraadt   243:
1.16    ! jmc       244: "{"[^}\en]*"}"     /* eat up one-line comments */
1.1       deraadt   245:
1.16    ! jmc       246: [ \et\en]+          /* eat up whitespace */
1.1       deraadt   247:
1.16    ! jmc       248: \&.       printf("Unrecognized character: %s\en", yytext);
1.1       deraadt   249:
1.16    ! jmc       250: %%
1.1       deraadt   251:
1.16    ! jmc       252: int main(int argc, char *argv[])
        !           253: {
        !           254:         ++argv; --argc;  /* skip over program name */
        !           255:         if (argc > 0)
        !           256:                 yyin = fopen(argv[0], "r");
1.1       deraadt   257:         else
                    258:                 yyin = stdin;
1.7       aaron     259:
1.1       deraadt   260:         yylex();
1.16    ! jmc       261: }
        !           262: .Ed
        !           263: .Pp
        !           264: This is the beginning of a simple scanner for a language like Pascal.
        !           265: It identifies different types of
        !           266: .Em tokens
1.1       deraadt   267: and reports on what it has seen.
1.16    ! jmc       268: .Pp
        !           269: The details of this example will be explained in the following sections.
        !           270: .Sh FORMAT OF THE INPUT FILE
1.1       deraadt   271: The
1.16    ! jmc       272: .Nm
1.1       deraadt   273: input file consists of three sections, separated by a line with just
1.16    ! jmc       274: .Qq %%
1.1       deraadt   275: in it:
1.16    ! jmc       276: .Bd -unfilled -offset indent
        !           277: definitions
        !           278: %%
        !           279: rules
        !           280: %%
        !           281: user code
        !           282: .Ed
        !           283: .Pp
1.1       deraadt   284: The
1.16    ! jmc       285: .Em definitions
1.1       deraadt   286: section contains declarations of simple
1.16    ! jmc       287: .Em name
1.1       deraadt   288: definitions to simplify the scanner specification, and declarations of
1.16    ! jmc       289: .Em start conditions ,
1.1       deraadt   290: which are explained in a later section.
1.16    ! jmc       291: .Pp
1.1       deraadt   292: Name definitions have the form:
1.16    ! jmc       293: .Pp
        !           294: .D1 name definition
        !           295: .Pp
        !           296: The
        !           297: .Qq name
        !           298: is a word beginning with a letter or an underscore
        !           299: .Pq Sq _
        !           300: followed by zero or more letters, digits,
        !           301: .Sq _ ,
        !           302: or
        !           303: .Sq -
        !           304: .Pq dash .
1.8       aaron     305: The definition is taken to begin at the first non-whitespace character
1.1       deraadt   306: following the name and to continue to the end of the line.
1.16    ! jmc       307: The definition can subsequently be referred to using
        !           308: .Qq {name} ,
        !           309: which will expand to
        !           310: .Qq (definition) .
        !           311: For example:
        !           312: .Bd -literal -offset indent
        !           313: DIGIT    [0-9]
        !           314: ID       [a-z][a-z0-9]*
        !           315: .Ed
        !           316: .Pp
        !           317: This defines
        !           318: .Qq DIGIT
        !           319: to be a regular expression which matches a single digit, and
        !           320: .Qq ID
        !           321: to be a regular expression which matches a letter
1.1       deraadt   322: followed by zero-or-more letters-or-digits.
                    323: A subsequent reference to
1.16    ! jmc       324: .Pp
        !           325: .Dl {DIGIT}+"."{DIGIT}*
        !           326: .Pp
1.1       deraadt   327: is identical to
1.16    ! jmc       328: .Pp
        !           329: .Dl ([0-9])+"."([0-9])*
        !           330: .Pp
        !           331: and matches one-or-more digits followed by a
        !           332: .Sq .\&
        !           333: followed by zero-or-more digits.
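                         .Pp
                         The parentheses added during expansion matter when a definition
                         contains alternation.
                         A minimal sketch, using a hypothetical definition named
                         .Qq AB :
                         .Bd -literal -offset indent
                         AB       a|b

                         %%
                         {AB}c    printf("matched: %s\en", yytext);
                         .Ed
                         .Pp
                         Here
                         .Qq {AB}c
                         expands to
                         .Qq (a|b)c ,
                         which matches
                         .Qq ac
                         or
                         .Qq bc ;
                         without the added parentheses it would read
                         .Qq a|bc .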
        !           334: .Pp
1.1       deraadt   335: The
1.16    ! jmc       336: .Em rules
1.1       deraadt   337: section of the
1.16    ! jmc       338: .Nm
1.1       deraadt   339: input contains a series of rules of the form:
1.16    ! jmc       340: .Pp
        !           341: .D1 pattern    action
        !           342: .Pp
        !           343: The pattern must be unindented and the action must begin
1.1       deraadt   344: on the same line.
1.16    ! jmc       345: .Pp
1.1       deraadt   346: See below for a further description of patterns and actions.
1.16    ! jmc       347: .Pp
1.1       deraadt   348: Finally, the user code section is simply copied to
1.16    ! jmc       349: .Pa lex.yy.c
1.1       deraadt   350: verbatim.
1.16    ! jmc       351: It is used for companion routines which call or are called by the scanner.
        !           352: The presence of this section is optional;
1.1       deraadt   353: if it is missing, the second
1.16    ! jmc       354: .Qq %%
        !           355: in the input file may be skipped too.
        !           356: .Pp
        !           357: In the definitions and rules sections, any indented text or text enclosed in
        !           358: .Sq %{
1.1       deraadt   359: and
1.16    ! jmc       360: .Sq %}
        !           361: is copied verbatim to the output
        !           362: .Pq with the %{}'s removed .
1.1       deraadt   363: The %{}'s must appear unindented on lines by themselves.
1.16    ! jmc       364: .Pp
1.1       deraadt   365: In the rules section,
1.16    ! jmc       366: any indented or %{} text appearing before the first rule may be used to
        !           367: declare variables which are local to the scanning routine and
        !           368: .Pq after the declarations
1.1       deraadt   369: code which is to be executed whenever the scanning routine is entered.
                    370: Other indented or %{} text in the rules section is still copied to the output,
                    371: but its meaning is not well-defined and it may well cause compile-time
                    372: errors (this feature is present for
1.16    ! jmc       373: .Tn POSIX
1.1       deraadt   374: compliance; see below for other such features).
1.16    ! jmc       375: .Pp
        !           376: In the definitions section
        !           377: .Pq but not in the rules section ,
        !           378: an unindented comment
        !           379: (i.e., a line beginning with
        !           380: .Qq /* )
        !           381: is also copied verbatim to the output up to the next
        !           382: .Qq */ .
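                         .Pp
                         As a rough sketch of where such text ends up, consider the following
                         input.
                         The names
                         .Qq calls
                         and
                         .Qq words
                         are hypothetical and chosen only for illustration:
                         .Bd -literal -offset indent
                         %{
                         /* %{ %} text in the definitions section: file scope */
                         static int calls = 0;
                         %}

                         %%
                                 /* indented text before the first rule */
                                 int words = 0;  /* local to yylex() */
                                 ++calls;        /* runs on each entry to yylex() */

                         [a-z]+  ++words;
                         \en      return words;
                         \&.       ;
                         .Ed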
        !           383: .Sh PATTERNS
1.1       deraadt   384: The patterns in the input are written using an extended set of regular
1.16    ! jmc       385: expressions.
        !           386: These are:
        !           387: .Bl -tag -width "XXXXXXXX"
        !           388: .It x
        !           389: Match the character
        !           390: .Sq x .
        !           391: .It .\&
        !           392: Any character
        !           393: .Pq byte
        !           394: except newline.
        !           395: .It [xyz]
        !           396: A
        !           397: .Qq character class ;
        !           398: in this case, the pattern matches either an
        !           399: .Sq x ,
        !           400: a
        !           401: .Sq y ,
        !           402: or a
        !           403: .Sq z .
        !           404: .It [abj-oZ]
        !           405: A
        !           406: .Qq character class
        !           407: with a range in it; matches an
        !           408: .Sq a ,
        !           409: a
        !           410: .Sq b ,
        !           411: any letter from
        !           412: .Sq j
        !           413: through
        !           414: .Sq o ,
        !           415: or a
        !           416: .Sq Z .
        !           417: .It [^A-Z]
        !           418: A
        !           419: .Qq negated character class ,
        !           420: i.e., any character but those in the class.
        !           421: In this case, any character EXCEPT an uppercase letter.
        !           422: .It [^A-Z\en]
        !           423: Any character EXCEPT an uppercase letter or a newline.
        !           424: .It r*
        !           425: Zero or more r's, where
        !           426: .Sq r
        !           427: is any regular expression.
        !           428: .It r+
        !           429: One or more r's.
        !           430: .It r?
        !           431: Zero or one r's (that is,
        !           432: .Qq an optional r ) .
        !           433: .It r{2,5}
        !           434: Anywhere from two to five r's.
        !           435: .It r{2,}
        !           436: Two or more r's.
        !           437: .It r{4}
        !           438: Exactly 4 r's.
        !           439: .It {name}
        !           440: The expansion of the
        !           441: .Qq name
        !           442: definition
        !           443: .Pq see above .
        !           444: .It \&"[xyz]\e\&"foo\&"
        !           445: The literal string: [xyz]"foo.
        !           446: .It \eX
        !           447: If
        !           448: .Sq X
        !           449: is an
        !           450: .Sq a ,
        !           451: .Sq b ,
        !           452: .Sq f ,
        !           453: .Sq n ,
        !           454: .Sq r ,
        !           455: .Sq t ,
        !           456: or
        !           457: .Sq v ,
        !           458: then the ANSI-C interpretation of
        !           459: .Sq \eX .
        !           460: Otherwise, a literal
        !           461: .Sq X
        !           462: (used to escape operators such as
        !           463: .Sq * ) .
        !           464: .It \e0
        !           465: A NUL character
        !           466: .Pq ASCII code 0 .
        !           467: .It \e123
        !           468: The character with octal value 123.
        !           469: .It \ex2a
        !           470: The character with hexadecimal value 2a.
        !           471: .It (r)
        !           472: Match an
        !           473: .Sq r ;
        !           474: parentheses are used to override precedence
        !           475: .Pq see below .
        !           476: .It rs
        !           477: The regular expression
        !           478: .Sq r
        !           479: followed by the regular expression
        !           480: .Sq s ;
        !           481: called
        !           482: .Qq concatenation .
        !           483: .It r|s
        !           484: Either an
        !           485: .Sq r
        !           486: or an
        !           487: .Sq s .
        !           488: .It r/s
        !           489: An
        !           490: .Sq r ,
        !           491: but only if it is followed by an
        !           492: .Sq s .
        !           493: The text matched by
        !           494: .Sq s
        !           495: is included when determining whether this rule is the
        !           496: .Qq longest match ,
        !           497: but is then returned to the input before the action is executed.
        !           498: So the action only sees the text matched by
        !           499: .Sq r .
        !           500: This type of pattern is called
        !           501: .Qq trailing context .
        !           502: (There are some combinations of r/s that
        !           503: .Nm
        !           504: cannot match correctly; see notes in the
        !           505: .Sx BUGS
        !           506: section below regarding
        !           507: .Qq dangerous trailing context . )
        !           508: .It ^r
        !           509: An
        !           510: .Sq r ,
        !           511: but only at the beginning of a line
        !           512: (i.e., just starting to scan, or right after a newline has been scanned).
        !           513: .It r$
        !           514: An
        !           515: .Sq r ,
        !           516: but only at the end of a line
        !           517: .Pq i.e., just before a newline .
        !           518: Equivalent to
        !           519: .Qq r/\en .
        !           520: .Pp
        !           521: Note that
        !           522: .Nm flex Ns 's
        !           523: notion of
        !           524: .Qq newline
        !           525: is exactly whatever the C compiler used to compile
        !           526: .Nm
        !           527: interprets
        !           528: .Sq \en
        !           529: as.
        !           530: .\" In particular, on some DOS systems you must either filter out \er's in the
        !           531: .\" input yourself, or explicitly use r/\er\en for
        !           532: .\" .Qq r$ .
        !           533: .It <s>r
        !           534: An
        !           535: .Sq r ,
        !           536: but only in start condition
        !           537: .Sq s
        !           538: .Pq see below for discussion of start conditions .
        !           539: .It <s1,s2,s3>r
        !           540: The same, but in any of start conditions s1, s2, or s3.
        !           541: .It <*>r
        !           542: An
        !           543: .Sq r
        !           544: in any start condition, even an exclusive one.
        !           545: .It <<EOF>>
        !           546: An end-of-file.
        !           547: .It <s1,s2><<EOF>>
        !           548: An end-of-file when in start condition s1 or s2.
        !           549: .El
        !           550: .Pp
1.1       deraadt   551: Note that inside a character class, all regular expression operators
1.16    ! jmc       552: lose their special meaning except escape
        !           553: .Pq Sq \e
        !           554: and the character class operators,
        !           555: .Sq - ,
        !           556: .Sq ]\& ,
        !           557: and, at the beginning of the class,
        !           558: .Sq ^ .
        !           559: .Pp
1.1       deraadt   560: The regular expressions listed above are grouped according to
                    561: precedence, from highest precedence at the top to lowest at the bottom.
1.16    ! jmc       562: Those grouped together have equal precedence.
        !           563: For example,
        !           564: .Pp
        !           565: .D1 foo|bar*
        !           566: .Pp
1.1       deraadt   567: is the same as
1.16    ! jmc       568: .Pp
        !           569: .D1 (foo)|(ba(r*))
        !           570: .Pp
        !           571: since the
        !           572: .Sq *
        !           573: operator has higher precedence than concatenation,
        !           574: and concatenation higher than alternation
        !           575: .Pq Sq |\& .
        !           576: This pattern therefore matches
        !           577: .Em either
        !           578: the string
        !           579: .Qq foo
        !           580: .Em or
        !           581: the string
        !           582: .Qq ba
        !           583: followed by zero-or-more r's.
        !           584: To match
        !           585: .Qq foo
        !           586: or zero-or-more "bar"'s,
        !           587: use:
        !           588: .Pp
        !           589: .D1 foo|(bar)*
        !           590: .Pp
1.1       deraadt   591: and to match zero-or-more "foo"'s-or-"bar"'s:
1.16    ! jmc       592: .Pp
        !           593: .D1 (foo|bar)*
        !           594: .Pp
1.1       deraadt   595: In addition to characters and ranges of characters, character classes
                    596: can also contain character class
1.16    ! jmc       597: .Em expressions .
1.1       deraadt   598: These are expressions enclosed inside
1.16    ! jmc       599: .Sq [:
        !           600: and
        !           601: .Sq :]
        !           602: delimiters (which themselves must appear between the
        !           603: .Sq [
1.1       deraadt   604: and
1.16    ! jmc       605: .Sq ]\&
        !           606: of the
1.1       deraadt   607: character class; other elements may occur inside the character class, too).
                    608: The valid expressions are:
1.16    ! jmc       609: .Bd -unfilled -offset indent
        !           610: [:alnum:] [:alpha:] [:blank:]
        !           611: [:cntrl:] [:digit:] [:graph:]
        !           612: [:lower:] [:print:] [:punct:]
        !           613: [:space:] [:upper:] [:xdigit:]
        !           614: .Ed
        !           615: .Pp
1.1       deraadt   616: These expressions all designate a set of characters equivalent to
                    617: the corresponding standard C
1.16    ! jmc       618: .Fn isXXX
        !           619: function.
        !           620: For example, [:alnum:] designates those characters for which
        !           621: .Xr isalnum 3
        !           622: returns true \- i.e., any alphabetic or numeric character.
1.1       deraadt   623: Some systems don't provide
1.16    ! jmc       624: .Xr isblank 3 ,
        !           625: so
        !           626: .Nm
        !           627: defines [:blank:] as a space or a tab.
        !           628: .Pp
1.1       deraadt   629: For example, the following character classes are all equivalent:
1.16    ! jmc       630: .Bd -unfilled -offset indent
        !           631: [[:alnum:]]
        !           632: [[:alpha:][:digit:]]
        !           633: [[:alpha:]0-9]
        !           634: [a-zA-Z0-9]
        !           635: .Ed
        !           636: .Pp
        !           637: If the scanner is case-insensitive (the
        !           638: .Fl i
        !           639: flag), then [:upper:] and [:lower:] are equivalent to [:alpha:].
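                         .Pp
                         A short sketch of character class expressions used in rules follows;
                         the patterns and messages are illustrative only:
                         .Bd -literal -offset indent
                         %%
                         [[:alpha:]_][[:alnum:]_]*   printf("identifier: %s\en", yytext);
                         [[:digit:]]+                printf("number: %s\en", yytext);
                         [[:space:]]+                /* skip whitespace */
                         .Ed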
        !           640: .Pp
1.1       deraadt   641: Some notes on patterns:
1.16    ! jmc       642: .Bl -dash
        !           643: .It
        !           644: A negated character class such as the example
        !           645: .Qq [^A-Z]
        !           646: above will match a newline unless "\en"
        !           647: .Pq or an equivalent escape sequence
        !           648: is one of the characters explicitly present in the negated character class
        !           649: (e.g.,
        !           650: .Qq [^A-Z\en] ) .
        !           651: This is unlike how many other regular expression tools treat negated character
        !           652: classes, but unfortunately the inconsistency is historically entrenched.
        !           653: Matching newlines means that a pattern like
        !           654: .Qq [^"]*
        !           655: can match the entire input unless there's another quote in the input.
        !           656: .It
        !           657: A rule can have at most one instance of trailing context
        !           658: (the
        !           659: .Sq /
        !           660: operator or the
        !           661: .Sq $
        !           662: operator).
        !           663: The start condition,
        !           664: .Sq ^ ,
        !           665: and
        !           666: .Qq <<EOF>>
        !           667: patterns can only occur at the beginning of a pattern and, like
        !           668: .Sq /
        !           669: and
        !           670: .Sq $ ,
        !           671: cannot be grouped inside parentheses.
        !           672: A
        !           673: .Sq ^
        !           674: which does not occur at the beginning of a rule or a
        !           675: .Sq $
        !           676: which does not occur at the end of a rule loses its special properties
        !           677: and is treated as a normal character.
        !           678: .It
1.1       deraadt   679: The following are illegal:
1.16    ! jmc       680: .Bd -unfilled -offset indent
        !           681: foo/bar$
        !           682: <sc1>foo<sc2>bar
        !           683: .Ed
        !           684: .Pp
        !           685: Note that the first of these can be written
        !           686: .Qq foo/bar\en .
        !           687: .It
        !           688: The following will result in
        !           689: .Sq $
        !           690: or
        !           691: .Sq ^
        !           692: being treated as a normal character:
        !           693: .Bd -unfilled -offset indent
        !           694: foo|(bar$)
        !           695: foo|^bar
        !           696: .Ed
        !           697: .Pp
        !           698: If what's wanted is a
        !           699: .Qq foo
        !           700: or a bar-followed-by-a-newline, the following could be used
        !           701: (the special
        !           702: .Sq |\&
        !           703: action is explained below):
        !           704: .Bd -unfilled -offset indent
        !           705: foo      |
        !           706: bar$     /* action goes here */
        !           707: .Ed
        !           708: .Pp
1.1       deraadt   709: A similar trick will work for matching a foo or a
                    710: bar-at-the-beginning-of-a-line.
1.16    ! jmc       711: .El
        !           712: .Sh HOW THE INPUT IS MATCHED
        !           713: When the generated scanner is run,
        !           714: it analyzes its input looking for strings which match any of its patterns.
        !           715: If it finds more than one match,
        !           716: it takes the one matching the most text
        !           717: (for trailing context rules, this includes the length of the trailing part,
        !           718: even though it will then be returned to the input).
        !           719: If it finds two or more matches of the same length,
        !           720: the rule listed first in the
        !           721: .Nm
1.1       deraadt   722: input file is chosen.
1.16    ! jmc       723: .Pp
1.1       deraadt   724: Once the match is determined, the text corresponding to the match
                    725: (called the
1.16    ! jmc       726: .Em token )
1.1       deraadt   727: is made available in the global character pointer
1.16    ! jmc       728: .Fa yytext ,
1.1       deraadt   729: and its length in the global integer
1.16    ! jmc       730: .Fa yyleng .
1.1       deraadt   731: The
1.16    ! jmc       732: .Em action
        !           733: corresponding to the matched pattern is then executed
        !           734: .Pq a more detailed description of actions follows ,
        !           735: and then the remaining input is scanned for another match.
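The selection rule described above (the longest match wins, and ties go to the rule listed first) can be sketched in plain C. This is an illustration only, not how the generated scanner is implemented: pick_rule() is a hypothetical helper that is handed the number of characters each rule matched at the current position, with 0 standing for "no match" as a simplification.

```c
#include <stddef.h>

/*
 * Illustration only: pick_rule() is a hypothetical helper, not part
 * of flex.  lens[i] is the number of characters rule i matched at the
 * current input position (0 is taken to mean "no match").  Returns
 * the index of the chosen rule, or -1 if no rule matched, in which
 * case the default rule would apply.
 */
static int
pick_rule(const size_t *lens, int nrules)
{
        int i, best = -1;
        size_t best_len = 0;

        for (i = 0; i < nrules; i++) {
                /* Strictly greater: on a tie the earlier rule is kept. */
                if (lens[i] > best_len) {
                        best_len = lens[i];
                        best = i;
                }
        }
        return best;
}
```

With match lengths of 2, 5 and 5 for three rules, the second rule is chosen: it matches the most text and is listed before the third.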
        !           736: .Pp
        !           737: If no match is found, then the default rule is executed:
        !           738: the next character in the input is considered matched and
        !           739: copied to the standard output.
        !           740: Thus, the simplest legal
        !           741: .Nm
1.1       deraadt   742: input is:
1.16    ! jmc       743: .Pp
        !           744: .D1 %%
        !           745: .Pp
        !           746: which generates a scanner that simply copies its input
        !           747: .Pq one character at a time
        !           748: to its output.
        !           749: .Pp
1.1       deraadt   750: Note that
1.16    ! jmc       751: .Fa yytext
        !           752: can be defined in two different ways:
        !           753: either as a character pointer or as a character array.
        !           754: Which definition
        !           755: .Nm
        !           756: uses can be controlled by including one of the special directives
        !           757: .Dq %pointer
        !           758: or
        !           759: .Dq %array
        !           760: in the first
        !           761: .Pq definitions
        !           762: section of flex input.
        !           763: The default is
        !           764: .Dq %pointer ,
        !           765: unless the
        !           766: .Fl l
        !           767: lex compatibility option is used, in which case
        !           768: .Fa yytext
1.1       deraadt   769: will be an array.
                    770: The advantage of using
1.16    ! jmc       771: .Dq %pointer
1.1       deraadt   772: is substantially faster scanning and no buffer overflow when matching
1.16    ! jmc       773: very large tokens
        !           774: .Pq unless not enough dynamic memory is available .
        !           775: The disadvantage is that actions are restricted in how they can modify
        !           776: .Fa yytext
        !           777: .Pq see the next section ,
        !           778: and calls to the
        !           779: .Fn unput
1.10      deraadt   780: function destroy the present contents of
1.16    ! jmc       781: .Fa yytext ,
1.1       deraadt   782: which can be a considerable porting headache when moving between different
1.16    ! jmc       783: .Nm lex
1.1       deraadt   784: versions.
1.16    ! jmc       785: .Pp
1.1       deraadt   786: The advantage of
1.16    ! jmc       787: .Dq %array
        !           788: is that
        !           789: .Fa yytext
        !           790: can be modified as much as wanted, and calls to
        !           791: .Fn unput
1.1       deraadt   792: do not destroy
1.16    ! jmc       793: .Fa yytext
        !           794: .Pq see below .
        !           795: Furthermore, existing
        !           796: .Nm lex
1.1       deraadt   797: programs sometimes access
1.16    ! jmc       798: .Fa yytext
1.1       deraadt   799: externally using declarations of the form:
1.16    ! jmc       800: .Pp
        !           801: .D1 extern char yytext[];
        !           802: .Pp
1.1       deraadt   803: This definition is erroneous when used with
1.16    ! jmc       804: .Dq %pointer ,
1.1       deraadt   805: but correct for
1.16    ! jmc       806: .Dq %array .
        !           807: .Pp
        !           808: .Dq %array
1.1       deraadt   809: defines
1.16    ! jmc       810: .Fa yytext
1.1       deraadt   811: to be an array of
1.16    ! jmc       812: .Dv YYLMAX
        !           813: characters, which defaults to a fairly large value.
        !           814: The size can be changed by simply #define'ing
        !           815: .Dv YYLMAX
        !           816: to a different value in the first section of
        !           817: .Nm
        !           818: input.
        !           819: As mentioned above, with
        !           820: .Dq %pointer
        !           821: yytext grows dynamically to accommodate large tokens.
        !           822: While this means a
        !           823: .Dq %pointer
        !           824: scanner can accommodate very large tokens
        !           825: .Pq such as matching entire blocks of comments ,
        !           826: bear in mind that each time the scanner must resize
        !           827: .Fa yytext
1.1       deraadt   828: it also must rescan the entire token from the beginning, so matching such
                    829: tokens can prove slow.
1.16    ! jmc       830: .Fa yytext
        !           831: presently does not dynamically grow if a call to
        !           832: .Fn unput
1.1       deraadt   833: results in too much text being pushed back; instead, a run-time error results.
1.16    ! jmc       834: .Pp
        !           835: Also note that
        !           836: .Dq %array
        !           837: cannot be used with C++ scanner classes
        !           838: .Pq the c++ option; see below .
        !           839: .Sh ACTIONS
        !           840: Each pattern in a rule has a corresponding action,
        !           841: which can be any arbitrary C statement.
        !           842: The pattern ends at the first non-escaped whitespace character;
        !           843: the remainder of the line is its action.
        !           844: If the action is empty,
        !           845: then when the pattern is matched the input token is simply discarded.
        !           846: For example, here is the specification for a program
        !           847: which deletes all occurrences of
        !           848: .Qq zap me
        !           849: from its input:
        !           850: .Bd -literal -offset indent
        !           851: %%
        !           852: "zap me"
        !           853: .Ed
        !           854: .Pp
1.1       deraadt   855: (It will copy all other characters in the input to the output since
                    856: they will be matched by the default rule.)
1.16    ! jmc       857: .Pp
1.1       deraadt   858: Here is a program which compresses multiple blanks and tabs down to
                    859: a single blank, and throws away whitespace found at the end of a line:
1.16    ! jmc       860: .Bd -literal -offset indent
        !           861: %%
        !           862: [ \et]+        putchar(' ');
        !           863: [ \et]+$       /* ignore this token */
        !           864: .Ed
        !           865: .Pp
        !           866: If the action contains a
        !           867: .Sq { ,
        !           868: then the action extends until the balancing
        !           869: .Sq }
1.1       deraadt   870: is found, and the action may cross multiple lines.
1.16    ! jmc       871: .Nm
1.1       deraadt   872: knows about C strings and comments and won't be fooled by braces found
                    873: within them, but also allows actions to begin with
1.16    ! jmc       874: .Sq %{
1.1       deraadt   875: and will consider the action to be all the text up to the next
1.16    ! jmc       876: .Sq %}
        !           877: .Pq regardless of ordinary braces inside the action .
        !           878: .Pp
        !           879: An action consisting solely of a vertical bar
        !           880: .Pq Sq |\&
        !           881: means
        !           882: .Qq same as the action for the next rule .
        !           883: See below for an illustration.
        !           884: .Pp
        !           885: Actions can include arbitrary C code,
        !           886: including return statements to return a value to whatever routine called
        !           887: .Fn yylex .
1.1       deraadt   888: Each time
1.16    ! jmc       889: .Fn yylex
        !           890: is called, it continues processing tokens from where it last left off
        !           891: until it either reaches the end of the file or executes a return.
        !           892: .Pp
1.1       deraadt   893: Actions are free to modify
1.16    ! jmc       894: .Fa yytext
        !           895: except for lengthening it
        !           896: (adding characters to its end \- these will overwrite later characters in the
        !           897: input stream).
        !           898: This, however, does not apply when using
        !           899: .Dq %array
        !           900: .Pq see above ;
        !           901: in that case,
        !           902: .Fa yytext
1.1       deraadt   903: may be freely modified in any way.
1.16    ! jmc       904: .Pp
1.1       deraadt   905: Actions are free to modify
1.16    ! jmc       906: .Fa yyleng
1.1       deraadt   907: except they should not do so if the action also includes use of
1.16    ! jmc       908: .Fn yymore
        !           909: .Pq see below .
        !           910: .Pp
1.1       deraadt   911: There are a number of special directives which can be included within
                    912: an action:
1.16    ! jmc       913: .Bl -tag -width Ds
        !           914: .It ECHO
        !           915: Copies
        !           916: .Fa yytext
        !           917: to the scanner's output.
        !           918: .It BEGIN
        !           919: Followed by the name of a start condition, places the scanner in the
        !           920: corresponding start condition
        !           921: .Pq see below .
        !           922: .It REJECT
        !           923: Directs the scanner to proceed on to the
        !           924: .Qq second best
        !           925: rule which matched the input
        !           926: .Pq or a prefix of the input .
        !           927: The rule is chosen as described above in
        !           928: .Sx HOW THE INPUT IS MATCHED ,
        !           929: and
        !           930: .Fa yytext
1.1       deraadt   931: and
1.16    ! jmc       932: .Fa yyleng
1.1       deraadt   933: set up appropriately.
                    934: It may either be one which matched as much text
                    935: as the originally chosen rule but came later in the
1.16    ! jmc       936: .Nm
1.1       deraadt   937: input file, or one which matched less text.
                    938: For example, the following will both count the
1.16    ! jmc       939: words in the input and call the routine
        !           940: .Fn special
        !           941: whenever
        !           942: .Qq frob
        !           943: is seen:
        !           944: .Bd -literal -offset indent
        !           945: int word_count = 0;
        !           946: %%
        !           947:
        !           948: frob        special(); REJECT;
        !           949: [^ \et\en]+   ++word_count;
        !           950: .Ed
        !           951: .Pp
1.1       deraadt   952: Without the
1.16    ! jmc       953: .Em REJECT ,
        !           954: any occurrences of "frob" in the input would not be counted as words,
        !           955: since the scanner normally executes only one action per token.
1.1       deraadt   956: Multiple
1.16    ! jmc       957: .Em REJECT Ns 's
        !           958: are allowed,
        !           959: each one finding the next best choice to the currently active rule.
        !           960: For example, when the following scanner scans the token
        !           961: .Qq abcd ,
        !           962: it will write
        !           963: .Qq abcdabcaba
        !           964: to the output:
        !           965: .Bd -literal -offset indent
        !           966: %%
        !           967: a        |
        !           968: ab       |
        !           969: abc      |
        !           970: abcd     ECHO; REJECT;
        !           971: \&.|\en     /* eat up any unmatched character */
        !           972: .Ed
        !           973: .Pp
1.1       deraadt   974: (The first three rules share the fourth's action since they use
1.16    ! jmc       975: the special
        !           976: .Sq |\&
        !           977: action.)
        !           978: .Em REJECT
1.1       deraadt   979: is a particularly expensive feature in terms of scanner performance;
1.16    ! jmc       980: if it is used in any of the scanner's actions it will slow down
        !           981: all of the scanner's matching.
        !           982: Furthermore,
        !           983: .Em REJECT
1.1       deraadt   984: cannot be used with the
1.16    ! jmc       985: .Fl Cf
1.1       deraadt   986: or
1.16    ! jmc       987: .Fl CF
        !           988: options
        !           989: .Pq see below .
        !           990: .Pp
1.1       deraadt   991: Note also that unlike the other special actions,
1.16    ! jmc       992: .Em REJECT
1.1       deraadt   993: is a
1.16    ! jmc       994: .Em branch ;
        !           995: code immediately following it in the action will not be executed.
        !           996: .It yymore()
        !           997: Tells the scanner that the next time it matches a rule, the corresponding
        !           998: token should be appended onto the current value of
        !           999: .Fa yytext
        !          1000: rather than replacing it.
        !          1001: For example, given the input
        !          1002: .Qq mega-kludge
        !          1003: the following will write
        !          1004: .Qq mega-mega-kludge
        !          1005: to the output:
        !          1006: .Bd -literal -offset indent
        !          1007: %%
        !          1008: mega-    ECHO; yymore();
        !          1009: kludge   ECHO;
        !          1010: .Ed
        !          1011: .Pp
        !          1012: First
        !          1013: .Qq mega-
        !          1014: is matched and echoed to the output.
        !          1015: Then
        !          1016: .Qq kludge
        !          1017: is matched, but the previous
        !          1018: .Qq mega-
        !          1019: is still hanging around at the beginning of
        !          1020: .Fa yytext
1.1       deraadt  1021: so the
1.16    ! jmc      1022: .Em ECHO
        !          1023: for the
        !          1024: .Qq kludge
        !          1025: rule will actually write
        !          1026: .Qq mega-kludge .
        !          1027: .Pp
1.1       deraadt  1028: Two notes regarding use of
1.16    ! jmc      1029: .Fn yymore :
1.1       deraadt  1030: First,
1.16    ! jmc      1031: .Fn yymore
1.1       deraadt  1032: depends on the value of
1.16    ! jmc      1033: .Fa yyleng
        !          1034: correctly reflecting the size of the current token, so
        !          1035: .Fa yyleng
        !          1036: must not be modified when using
        !          1037: .Fn yymore .
1.1       deraadt  1038: Second, the presence of
1.16    ! jmc      1039: .Fn yymore
1.1       deraadt  1040: in the scanner's action entails a minor performance penalty in the
                   1041: scanner's matching speed.
1.16    ! jmc      1042: .It yyless(n)
        !          1043: Returns all but the first
        !          1044: .Ar n
1.1       deraadt  1045: characters of the current token back to the input stream, where they
                   1046: will be rescanned when the scanner looks for the next match.
1.16    ! jmc      1047: .Fa yytext
1.1       deraadt  1048: and
1.16    ! jmc      1049: .Fa yyleng
1.1       deraadt  1050: are adjusted appropriately (e.g.,
1.16    ! jmc      1051: .Fa yyleng
1.1       deraadt  1052: will now be equal to
1.16    ! jmc      1053: .Ar n ) .
        !          1054: For example, on the input
        !          1055: .Qq foobar
        !          1056: the following will write out
        !          1057: .Qq foobarbar :
        !          1058: .Bd -literal -offset indent
        !          1059: %%
        !          1060: foobar    ECHO; yyless(3);
        !          1061: [a-z]+    ECHO;
        !          1062: .Ed
        !          1063: .Pp
1.1       deraadt  1064: An argument of 0 to
1.16    ! jmc      1065: .Fn yyless
        !          1066: will cause the entire current input string to be scanned again.
        !          1067: Unless how the scanner will subsequently process its input has been changed
        !          1068: (using
        !          1069: .Em BEGIN ,
        !          1070: for example),
        !          1071: this will result in an endless loop.
        !          1072: .Pp
1.1       deraadt  1073: Note that
1.16    ! jmc      1074: .Fn yyless
        !          1075: is a macro and can only be used in the
        !          1076: .Nm
        !          1077: input file, not from other source files.
        !          1078: .It unput(c)
        !          1079: Puts the character
        !          1080: .Ar c
        !          1081: back into the input stream.
        !          1082: It will be the next character scanned.
1.1       deraadt  1083: The following action will take the current token and cause it
                   1084: to be rescanned enclosed in parentheses.
1.16    ! jmc      1085: .Bd -literal -offset indent
        !          1086: {
        !          1087:         int i;
        !          1088:         char *yycopy;
        !          1089:
        !          1090:         /* Copy yytext because unput() trashes yytext */
        !          1091:         if ((yycopy = strdup(yytext)) == NULL)
        !          1092:                 err(1, NULL);
        !          1093:         unput(')');
        !          1094:         for (i = yyleng - 1; i >= 0; --i)
        !          1095:                 unput(yycopy[i]);
        !          1096:         unput('(');
        !          1097:         free(yycopy);
        !          1098: }
        !          1099: .Ed
        !          1100: .Pp
1.1       deraadt  1101: Note that since each
1.16    ! jmc      1102: .Fn unput
        !          1103: puts the given character back at the beginning of the input stream,
        !          1104: pushing back strings must be done back-to-front.
        !          1105: .Pp
1.1       deraadt  1106: An important potential problem when using
1.16    ! jmc      1107: .Fn unput
        !          1108: is that if using
        !          1109: .Dq %pointer
        !          1110: .Pq the default ,
        !          1111: a call to
        !          1112: .Fn unput
        !          1113: destroys the contents of
        !          1114: .Fa yytext ,
1.1       deraadt  1115: starting with its rightmost character and devouring one character to
1.16    ! jmc      1116: the left with each call.
        !          1117: If the value of
        !          1118: .Fa yytext
        !          1119: should be preserved after a call to
        !          1120: .Fn unput
        !          1121: .Pq as in the above example ,
        !          1122: it must either first be copied elsewhere, or the scanner must be built using
        !          1123: .Dq %array
        !          1124: instead (see
        !          1125: .Sx HOW THE INPUT IS MATCHED ) .
        !          1126: .Pp
        !          1127: Finally, note that EOF cannot be put back
1.1       deraadt  1128: in an attempt to mark the input stream with an end-of-file.
1.16    ! jmc      1129: .It input()
        !          1130: Reads the next character from the input stream.
        !          1131: For example, the following is one way to eat up C comments:
        !          1132: .Bd -literal -offset indent
        !          1133: %%
        !          1134: "/*" {
        !          1135:         int c;
        !          1136:
        !          1137:         for (;;) {
        !          1138:                 while ((c = input()) != '*' && c != EOF)
        !          1139:                         ; /* eat up text of comment */
        !          1140:
        !          1141:                 if (c == '*') {
        !          1142:                         while ((c = input()) == '*')
        !          1143:                                 ;
        !          1144:                         if (c == '/')
        !          1145:                                 break; /* found the end */
        !          1146:                 }
        !          1147:
        !          1148:                 if (c == EOF) {
        !          1149:                         errx(1, "EOF in comment");
1.1       deraadt  1150:                         break;
                   1151:                 }
1.16    ! jmc      1152:         }
        !          1153: }
        !          1154: .Ed
        !          1155: .Pp
        !          1156: (Note that if the scanner is compiled using C++, then
        !          1157: .Fn input
1.1       deraadt  1158: is instead referred to as
1.16    ! jmc      1159: .Fn yyinput ,
        !          1160: in order to avoid a name clash with the C++ stream by the name of input.)
        !          1161: .It YY_FLUSH_BUFFER
        !          1162: Flushes the scanner's internal buffer
        !          1163: so that the next time the scanner attempts to match a token,
        !          1164: it will first refill the buffer using
        !          1165: .Dv YY_INPUT
        !          1166: (see
        !          1167: .Sx THE GENERATED SCANNER ,
        !          1168: below).
        !          1169: This action is a special case of the more general
        !          1170: .Fn yy_flush_buffer
        !          1171: function, described below in the section
        !          1172: .Sx MULTIPLE INPUT BUFFERS .
        !          1173: .It yyterminate()
        !          1174: Can be used in lieu of a return statement in an action.
        !          1175: It terminates the scanner and returns a 0 to the scanner's caller, indicating
        !          1176: .Qq all done .
1.1       deraadt  1177: By default,
1.16    ! jmc      1178: .Fn yyterminate
        !          1179: is also called when an end-of-file is encountered.
        !          1180: It is a macro and may be redefined.
        !          1181: .El
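The requirement that strings be pushed back with unput() back-to-front follows from the pushed-back text behaving like a stack. The sketch below models this in plain C, independent of flex: struct pushback, pb_unput(), pb_input() and pb_unput_string() are hypothetical names, and the fixed capacity is an arbitrary choice for the illustration.

```c
#include <stddef.h>
#include <string.h>

#define PB_MAX 64       /* arbitrary capacity for this sketch */

/* Hypothetical model of pushed-back input: a simple stack. */
struct pushback {
        char buf[PB_MAX];
        size_t n;       /* number of characters currently pushed back */
};

/* Push one character back; it becomes the next one read. */
static int
pb_unput(struct pushback *pb, int c)
{
        if (pb->n == PB_MAX)
                return -1;      /* too much text pushed back */
        pb->buf[pb->n++] = (char)c;
        return 0;
}

/* Read the next pushed-back character, or -1 if none remain. */
static int
pb_input(struct pushback *pb)
{
        if (pb->n == 0)
                return -1;
        return (unsigned char)pb->buf[--pb->n];
}

/* Push back a whole string, last character first. */
static int
pb_unput_string(struct pushback *pb, const char *s)
{
        size_t i = strlen(s);

        while (i > 0)
                if (pb_unput(pb, s[--i]) == -1)
                        return -1;
        return 0;
}
```

Pushing the string in forward order instead would cause it to be read back reversed, which is why the loop in the unput() example counts down from yyleng - 1.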
        !          1182: .Sh THE GENERATED SCANNER
1.1       deraadt  1183: The output of
1.16    ! jmc      1184: .Nm
1.1       deraadt  1185: is the file
1.16    ! jmc      1186: .Pa lex.yy.c ,
1.1       deraadt  1187: which contains the scanning routine
1.16    ! jmc      1188: .Fn yylex ,
        !          1189: a number of tables used by it for matching tokens,
        !          1190: and a number of auxiliary routines and macros.
        !          1191: By default,
        !          1192: .Fn yylex
1.1       deraadt  1193: is declared as follows:
1.16    ! jmc      1194: .Bd -unfilled -offset indent
        !          1195: int yylex()
        !          1196: {
        !          1197:     ... various definitions and the actions in here ...
        !          1198: }
        !          1199: .Ed
        !          1200: .Pp
        !          1201: (If the environment supports function prototypes, then it will
        !          1202: be "int yylex(void)".)
        !          1203: This definition may be changed by defining the
        !          1204: .Dv YY_DECL
        !          1205: macro.
        !          1206: For example:
        !          1207: .Bd -literal -offset indent
        !          1208: #define YY_DECL float lexscan(a, b) float a, b;
        !          1209: .Ed
        !          1210: .Pp
        !          1211: would give the scanning routine the name
        !          1212: .Em lexscan ,
        !          1213: returning a float, and taking two floats as arguments.
        !          1214: Note that if arguments are given to the scanning routine using a
        !          1215: K&R-style/non-prototyped function declaration,
        !          1216: the definition must be terminated with a semicolon
        !          1217: .Pq Sq ;\& .
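Where the compiler supports function prototypes, the declaration can instead be written in prototype form, in which case no trailing semicolon is needed. The sketch below only demonstrates how the macro expands. The function body is a stand-in written for this illustration; in a real scanner the body is supplied by the generated code.

```c
/* Prototype-style counterpart of the K&R example; no trailing ';'. */
#define YY_DECL float lexscan(float a, float b)

/*
 * Stand-in body, for illustration only.  The generated scanner
 * normally emits "YY_DECL { ... }" itself.
 */
YY_DECL
{
        return a + b;
}
```

Any caller of the scanning routine must then see a matching declaration, for example "float lexscan(float, float);".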
        !          1218: .Pp
1.1       deraadt  1219: Whenever
1.16    ! jmc      1220: .Fn yylex
1.1       deraadt  1221: is called, it scans tokens from the global input file
1.16    ! jmc      1222: .Pa yyin
        !          1223: .Pq which defaults to stdin .
        !          1224: It continues until it either reaches an end-of-file
        !          1225: .Pq at which point it returns the value 0
        !          1226: or one of its actions executes a
        !          1227: .Em return
1.1       deraadt  1228: statement.
1.16    ! jmc      1229: .Pp
1.1       deraadt  1230: If the scanner reaches an end-of-file, subsequent calls are undefined
                   1231: unless either
1.16    ! jmc      1232: .Em yyin
        !          1233: is pointed at a new input file
        !          1234: .Pq in which case scanning continues from that file ,
        !          1235: or
        !          1236: .Fn yyrestart
1.1       deraadt  1237: is called.
1.16    ! jmc      1238: .Fn yyrestart
1.1       deraadt  1239: takes one argument, a
1.16    ! jmc      1240: .Fa FILE *
        !          1241: pointer (which can be NULL, if
        !          1242: .Dv YY_INPUT
        !          1243: has been set up to scan from a source other than
        !          1244: .Em yyin ) ,
1.1       deraadt  1245: and initializes
1.16    ! jmc      1246: .Em yyin
        !          1247: for scanning from that file.
        !          1248: Essentially there is no difference between just assigning
        !          1249: .Em yyin
1.1       deraadt  1250: to a new input file and using
1.16    ! jmc      1251: .Fn yyrestart
        !          1252: to do so; the latter is available for compatibility with previous versions of
        !          1253: .Nm ,
1.1       deraadt  1254: and because it can be used to switch input files in the middle of scanning.
1.16    ! jmc      1255: It can also be used to throw away the current input buffer,
        !          1256: by calling it with an argument of
        !          1257: .Em yyin ;
1.1       deraadt  1258: but it is better to use
1.16    ! jmc      1259: .Dv YY_FLUSH_BUFFER
        !          1260: .Pq see above .
1.1       deraadt  1261: Note that
1.16    ! jmc      1262: .Fn yyrestart
        !          1263: does not reset the start condition to
        !          1264: .Em INITIAL
        !          1265: (see
        !          1266: .Sx START CONDITIONS ,
        !          1267: below).
        !          1268: .Pp
1.1       deraadt  1269: If
1.16    ! jmc      1270: .Fn yylex
1.1       deraadt  1271: stops scanning due to executing a
1.16    ! jmc      1272: .Em return
1.1       deraadt  1273: statement in one of the actions, the scanner may then be called again and it
                   1274: will resume scanning where it left off.
1.16    ! jmc      1275: .Pp
        !          1276: By default
        !          1277: .Pq and for purposes of efficiency ,
        !          1278: the scanner uses block-reads rather than simple
        !          1279: .Xr getc 3
1.1       deraadt  1280: calls to read characters from
1.16    ! jmc      1281: .Em yyin .
1.1       deraadt  1282: The nature of how it gets its input can be controlled by defining the
1.16    ! jmc      1283: .Dv YY_INPUT
1.1       deraadt  1284: macro.
1.16    ! jmc      1285: .Dv YY_INPUT Ns 's
        !          1286: calling sequence is
        !          1287: .Qq YY_INPUT(buf,result,max_size) .
        !          1288: Its action is to place up to
        !          1289: .Dv max_size
1.1       deraadt  1290: characters in the character array
1.16    ! jmc      1291: .Em buf
1.1       deraadt  1292: and return in the integer variable
1.16    ! jmc      1293: .Em result
        !          1294: either the number of characters read or the constant
        !          1295: .Dv YY_NULL
        !          1296: (0 on
        !          1297: .Ux
        !          1298: systems)
        !          1299: to indicate
        !          1300: .Dv EOF .
        !          1301: The default
        !          1302: .Dv YY_INPUT
        !          1303: reads from the global file-pointer
        !          1304: .Qq yyin .
        !          1305: .Pp
        !          1306: A sample definition of
        !          1307: .Dv YY_INPUT
        !          1308: .Pq in the definitions section of the input file :
        !          1309: .Bd -unfilled -offset indent
        !          1310: %{
        !          1311: #define YY_INPUT(buf,result,max_size) \e
        !          1312: { \e
        !          1313:         int c = getchar(); \e
        !          1314:         result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \e
        !          1315: }
        !          1316: %}
        !          1317: .Ed
        !          1318: .Pp
1.1       deraadt  1319: This definition will change the input processing to occur
                   1320: one character at a time.
1.16    ! jmc      1321: .Pp
        !          1322: When the scanner receives an end-of-file indication from
        !          1323: .Dv YY_INPUT ,
1.1       deraadt  1324: it then checks the
1.16    ! jmc      1325: .Fn yywrap
        !          1326: function.
        !          1327: If
        !          1328: .Fn yywrap
        !          1329: returns false
        !          1330: .Pq zero ,
        !          1331: then it is assumed that the function has gone ahead and set up
        !          1332: .Em yyin
        !          1333: to point to another input file, and scanning continues.
        !          1334: If it returns true
        !          1335: .Pq non-zero ,
        !          1336: then the scanner terminates, returning 0 to its caller.
        !          1337: Note that in either case, the start condition remains unchanged;
        !          1338: it does not revert to
        !          1339: .Em INITIAL .
        !          1340: .Pp
1.1       deraadt  1341: If you do not supply your own version of
1.16    ! jmc      1342: .Fn yywrap ,
1.1       deraadt  1343: then you must either use
1.16    ! jmc      1344: .Dq %option noyywrap
1.1       deraadt  1345: (in which case the scanner behaves as though
1.16    ! jmc      1346: .Fn yywrap
1.1       deraadt  1347: returned 1), or you must link with
1.16    ! jmc      1348: .Fl lfl
1.1       deraadt  1349: to obtain the default version of the routine, which always returns 1.
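                         .Pp
                         As a rough sketch of a user-supplied
                         .Fn yywrap ,
                         the following scanner chains together the files named on the command line.
                         The names
                         .Va file_list ,
                         .Va file_count ,
                         and
                         .Va file_index
                         are hypothetical and are not part of
                         .Nm ;
                         the rules section is left empty so that the default rule echoes the input:
                         .Bd -literal -offset indent
                         %{
                         /* Hypothetical bookkeeping, filled in by main(). */
                         static char **file_list;
                         static int file_count, file_index;
                         %}
                         %%
                         %%
                         int
                         yywrap(void)
                         {
                                 if (yyin != NULL && yyin != stdin)
                                         fclose(yyin);
                                 /* Files that cannot be opened are silently skipped. */
                                 while (file_index < file_count) {
                                         yyin = fopen(file_list[file_index++], "r");
                                         if (yyin != NULL)
                                                 return 0;  /* more input: keep scanning */
                                 }
                                 return 1;                  /* no more files: terminate */
                         }

                         int
                         main(int argc, char *argv[])
                         {
                                 file_list = argv + 1;
                                 file_count = argc - 1;
                                 if (yywrap() == 0)         /* open the first file */
                                         yylex();
                                 return 0;
                         }
                         .Ed
                         .Pp
                         Because this scanner defines its own
                         .Fn yywrap ,
                         it needs neither
                         .Dq %option noyywrap
                         nor the default routine from
                         .Fl lfl .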
1.16    ! jmc      1350: .Pp
1.1       deraadt  1351: Three routines are available for scanning from in-memory buffers rather
                   1352: than files:
1.16    ! jmc      1353: .Fn yy_scan_string ,
        !          1354: .Fn yy_scan_bytes ,
1.1       deraadt  1355: and
1.16    ! jmc      1356: .Fn yy_scan_buffer .
        !          1357: See the discussion of them below in the section
        !          1358: .Sx MULTIPLE INPUT BUFFERS .
        !          1359: .Pp
1.1       deraadt  1360: The scanner writes its
1.16    ! jmc      1361: .Em ECHO
1.1       deraadt  1362: output to the
1.16    ! jmc      1363: .Em yyout
        !          1364: global
        !          1365: .Pq default, stdout ,
        !          1366: which may be redefined by the user simply by assigning it to some other
        !          1367: .Va FILE
1.1       deraadt  1368: pointer.
1.16    ! jmc      1369: .Sh START CONDITIONS
        !          1370: .Nm
        !          1371: provides a mechanism for conditionally activating rules.
        !          1372: Any rule whose pattern is prefixed with
        !          1373: .Qq Aq sc
        !          1374: will only be active when the scanner is in the start condition named
        !          1375: .Qq sc .
        !          1376: For example,
        !          1377: .Bd -literal -offset indent
        !          1378: <STRING>[^"]* { /* eat up the string body ... */
        !          1379:         ...
        !          1380: }
        !          1381: .Ed
        !          1382: .Pp
        !          1383: will be active only when the scanner is in the
        !          1384: .Qq STRING
        !          1385: start condition, and
        !          1386: .Bd -literal -offset indent
        !          1387: <INITIAL,STRING,QUOTE>\e. { /* handle an escape ... */
        !          1388:         ...
        !          1389: }
        !          1390: .Ed
        !          1391: .Pp
        !          1392: will be active only when the current start condition is either
        !          1393: .Qq INITIAL ,
        !          1394: .Qq STRING ,
        !          1395: or
        !          1396: .Qq QUOTE .
        !          1397: .Pp
        !          1398: Start conditions are declared in the definitions
        !          1399: .Pq first
        !          1400: section of the input using unindented lines beginning with either
        !          1401: .Sq %s
1.1       deraadt  1402: or
1.16    ! jmc      1403: .Sq %x
1.1       deraadt  1404: followed by a list of names.
                   1405: The former declares
1.16    ! jmc      1406: .Em inclusive
1.1       deraadt  1407: start conditions, the latter
1.16    ! jmc      1408: .Em exclusive
        !          1409: start conditions.
        !          1410: A start condition is activated using the
        !          1411: .Em BEGIN
        !          1412: action.
        !          1413: Until the next
        !          1414: .Em BEGIN
        !          1415: action is executed, rules with the given start condition will be active and
1.1       deraadt  1416: rules with other start conditions will be inactive.
1.16    ! jmc      1417: If the start condition is inclusive,
1.1       deraadt  1418: then rules with no start conditions at all will also be active.
1.16    ! jmc      1419: If it is exclusive,
        !          1420: then only rules qualified with the start condition will be active.
1.1       deraadt  1421: A set of rules contingent on the same exclusive start condition
                   1422: describes a scanner which is independent of any of the other rules in the
1.16    ! jmc      1423: .Nm
        !          1424: input.
        !          1425: Because of this, exclusive start conditions make it easy to specify
        !          1426: .Qq mini-scanners
1.1       deraadt  1427: which scan portions of the input that are syntactically different
1.16    ! jmc      1428: from the rest
        !          1429: .Pq e.g., comments .
        !          1430: .Pp
1.1       deraadt  1431: If the distinction between inclusive and exclusive start conditions
                   1432: is still a little vague, here's a simple example illustrating the
1.16    ! jmc      1433: connection between the two.
        !          1434: The set of rules:
        !          1435: .Bd -literal -offset indent
        !          1436: %s example
        !          1437: %%
        !          1438:
        !          1439: <example>foo   do_something();
        !          1440:
        !          1441: bar            something_else();
        !          1442: .Ed
        !          1443: .Pp
1.1       deraadt  1444: is equivalent to
1.16    ! jmc      1445: .Bd -literal -offset indent
        !          1446: %x example
        !          1447: %%
        !          1448:
        !          1449: <example>foo   do_something();
        !          1450:
        !          1451: <INITIAL,example>bar    something_else();
        !          1452: .Ed
        !          1453: .Pp
1.1       deraadt  1454: Without the
1.16    ! jmc      1455: .Aq INITIAL,example
1.1       deraadt  1456: qualifier, the
1.16    ! jmc      1457: .Dq bar
        !          1458: pattern in the second example wouldn't be active
        !          1459: .Pq i.e., couldn't match
1.1       deraadt  1460: when in start condition
1.16    ! jmc      1461: .Dq example .
1.1       deraadt  1462: If we just used
1.16    ! jmc      1463: .Aq example
1.1       deraadt  1464: to qualify
1.16    ! jmc      1465: .Dq bar ,
1.1       deraadt  1466: though, then it would only be active in
1.16    ! jmc      1467: .Dq example
1.1       deraadt  1468: and not in
1.16    ! jmc      1469: .Em INITIAL ,
        !          1470: while in the first example it's active in both,
        !          1471: because in the first example the
        !          1472: .Dq example
        !          1473: start condition is an inclusive
        !          1474: .Pq Sq %s
1.1       deraadt  1475: start condition.
1.16    ! jmc      1476: .Pp
1.1       deraadt  1477: Also note that the special start-condition specifier
1.16    ! jmc      1478: .Sq Aq *
        !          1479: matches every start condition.
        !          1480: Thus, the above example could also have been written:
        !          1481: .Bd -literal -offset indent
        !          1482: %x example
        !          1483: %%
        !          1484:
        !          1485: <example>foo   do_something();
        !          1486:
        !          1487: <*>bar         something_else();
        !          1488: .Ed
        !          1489: .Pp
1.1       deraadt  1490: The default rule (to
1.16    ! jmc      1491: .Em ECHO
        !          1492: any unmatched character) remains active in start conditions.
        !          1493: It is equivalent to:
        !          1494: .Bd -literal -offset indent
        !          1495: <*>.|\en     ECHO;
        !          1496: .Ed
        !          1497: .Pp
        !          1498: .Dq BEGIN(0)
1.1       deraadt  1499: returns to the original state where only the rules with
1.16    ! jmc      1500: no start conditions are active.
        !          1501: This state can also be referred to as the start-condition
        !          1502: .Em INITIAL ,
        !          1503: so
        !          1504: .Dq BEGIN(INITIAL)
1.1       deraadt  1505: is equivalent to
1.16    ! jmc      1506: .Dq BEGIN(0) .
1.1       deraadt  1507: (The parentheses around the start condition name are not required but
                   1508: are considered good style.)
1.16    ! jmc      1509: .Pp
        !          1510: .Em BEGIN
1.1       deraadt  1511: actions can also be given as indented code at the beginning
1.16    ! jmc      1512: of the rules section.
        !          1513: For example, the following will cause the scanner to enter the
        !          1514: .Qq SPECIAL
        !          1515: start condition whenever
        !          1516: .Fn yylex
1.1       deraadt  1517: is called and the global variable
1.16    ! jmc      1518: .Fa enter_special
1.1       deraadt  1519: is true:
1.16    ! jmc      1520: .Bd -literal -offset indent
        !          1521:         int enter_special;
1.1       deraadt  1522:
1.16    ! jmc      1523: %x SPECIAL
        !          1524: %%
        !          1525:         if (enter_special)
1.1       deraadt  1526:                 BEGIN(SPECIAL);
                   1527:
1.16    ! jmc      1528: <SPECIAL>blahblahblah
        !          1529: \&...more rules follow...
        !          1530: .Ed
        !          1531: .Pp
1.1       deraadt  1532: To illustrate the uses of start conditions,
                   1533: here is a scanner which provides two different interpretations
1.16    ! jmc      1534: of a string like
        !          1535: .Qq 123.456 .
        !          1536: By default it will treat it as three tokens: the integer
        !          1537: .Qq 123 ,
        !          1538: a dot
        !          1539: .Pq Sq .\& ,
        !          1540: and the integer
        !          1541: .Qq 456 .
1.1       deraadt  1542: But if the string is preceded earlier in the line by the string
1.16    ! jmc      1543: .Qq expect-floats
        !          1544: it will treat it as a single token, the floating-point number 123.456:
        !          1545: .Bd -literal -offset indent
        !          1546: %{
        !          1547: #include <stdlib.h>
        !          1548: %}
        !          1549: %s expect
        !          1550:
        !          1551: %%
        !          1552: expect-floats        BEGIN(expect);
        !          1553:
        !          1554: <expect>[0-9]+"."[0-9]+ {
        !          1555:         printf("found a float, = %f\en",
        !          1556:             atof(yytext));
        !          1557: }
        !          1558: <expect>\en {
        !          1559:         /*
        !          1560:          * That's the end of the line, so
        !          1561:          * we need another "expect-floats"
        !          1562:          * before we'll recognize any more
        !          1563:          * numbers.
        !          1564:          */
        !          1565:         BEGIN(INITIAL);
        !          1566: }
        !          1567:
        !          1568: [0-9]+ {
        !          1569:         printf("found an integer, = %d\en",
        !          1570:             atoi(yytext));
        !          1571: }
        !          1572:
        !          1573: "."     printf("found a dot\en");
        !          1574: .Ed
        !          1575: .Pp
        !          1576: Here is a scanner which recognizes
        !          1577: .Pq and discards
        !          1578: C comments while maintaining a count of the current input line:
        !          1579: .Bd -literal -offset indent
        !          1580: %x comment
        !          1581: %%
        !          1582:         int line_num = 1;
        !          1583:
        !          1584: "/*"                    BEGIN(comment);
        !          1585:
        !          1586: <comment>[^*\en]*        /* eat anything that's not a '*' */
        !          1587: <comment>"*"+[^*/\en]*   /* eat up '*'s not followed by '/'s */
        !          1588: <comment>\en             ++line_num;
        !          1589: <comment>"*"+"/"        BEGIN(INITIAL);
        !          1590: .Ed
        !          1591: .Pp
1.1       deraadt  1592: This scanner goes to a bit of trouble to match as much
1.16    ! jmc      1593: text as possible with each rule.
        !          1594: In general, when attempting to write a high-speed scanner
        !          1595: try to match as much as possible in each rule, as it's a big win.
        !          1596: .Pp
1.10      deraadt  1597: Note that start-condition names are really integer values and
1.16    ! jmc      1598: can be stored as such.
        !          1599: Thus, the above could be extended in the following fashion:
        !          1600: .Bd -literal -offset indent
        !          1601: %x comment foo
        !          1602: %%
        !          1603:         int line_num = 1;
        !          1604:         int comment_caller;
        !          1605:
        !          1606: "/*" {
        !          1607:         comment_caller = INITIAL;
        !          1608:         BEGIN(comment);
        !          1609: }
        !          1610:
        !          1611: \&...
        !          1612:
        !          1613: <foo>"/*" {
        !          1614:         comment_caller = foo;
        !          1615:         BEGIN(comment);
        !          1616: }
        !          1617:
        !          1618: <comment>[^*\en]*        /* eat anything that's not a '*' */
        !          1619: <comment>"*"+[^*/\en]*   /* eat up '*'s not followed by '/'s */
        !          1620: <comment>\en             ++line_num;
        !          1621: <comment>"*"+"/"        BEGIN(comment_caller);
        !          1622: .Ed
        !          1623: .Pp
        !          1624: Furthermore, the current start condition can be accessed by using
1.1       deraadt  1625: the integer-valued
1.16    ! jmc      1626: .Dv YY_START
        !          1627: macro.
        !          1628: For example, the above assignments to
        !          1629: .Em comment_caller
1.1       deraadt  1630: could instead be written
1.16    ! jmc      1631: .Pp
        !          1632: .Dl comment_caller = YY_START;
        !          1633: .Pp
1.1       deraadt  1634: Flex provides
1.16    ! jmc      1635: .Dv YYSTATE
1.1       deraadt  1636: as an alias for
1.16    ! jmc      1637: .Dv YY_START
1.1       deraadt  1638: (since that is what's used by AT&T
1.16    ! jmc      1639: .Nm lex ) .
        !          1640: .Pp
        !          1641: Note that start conditions do not have their own name-space;
        !          1642: %s's and %x's declare names in the same fashion as #define's.
        !          1643: .Pp
1.1       deraadt  1644: Finally, here's an example of how to match C-style quoted strings using
1.16    ! jmc      1645: exclusive start conditions, including expanded escape sequences
        !          1646: (but not including checking for a string that's too long):
        !          1647: .Bd -literal -offset indent
        !          1648: %x str
        !          1649:
        !          1650: %%
        !          1651:         #define MAX_STR_CONST 1024
        !          1652:         char string_buf[MAX_STR_CONST];
        !          1653:         char *string_buf_ptr;
        !          1654:
        !          1655: \e"      string_buf_ptr = string_buf; BEGIN(str);
        !          1656:
        !          1657: <str>\e" { /* saw closing quote - all done */
        !          1658:         BEGIN(INITIAL);
        !          1659:         *string_buf_ptr = '\e0';
        !          1660:         /*
        !          1661:          * return string constant token type and
        !          1662:          * value to parser
        !          1663:          */
        !          1664: }
        !          1665:
        !          1666: <str>\en {
        !          1667:         /* error - unterminated string constant */
        !          1668:         /* generate error message */
        !          1669: }
        !          1670:
        !          1671: <str>\e\e[0-7]{1,3} {
        !          1672:         /* octal escape sequence */
        !          1673:         unsigned int result;
        !          1674:
        !          1675:         (void) sscanf(yytext + 1, "%o", &result);
        !          1676:
        !          1677:         if (result > 0xff) {
        !          1678:                 /* error, constant is out-of-bounds */
        !          1679:         } else
        !          1680:                 *string_buf_ptr++ = result;
        !          1681: }
        !          1682:
        !          1683: <str>\e\e[0-9]+ {
        !          1684:         /*
        !          1685:          * generate error - bad escape sequence; something
        !          1686:          * like '\e48' or '\e0777777'
        !          1687:          */
        !          1688: }
        !          1689:
        !          1690: <str>\e\en  *string_buf_ptr++ = '\en';
        !          1691: <str>\e\et  *string_buf_ptr++ = '\et';
        !          1692: <str>\e\er  *string_buf_ptr++ = '\er';
        !          1693: <str>\e\eb  *string_buf_ptr++ = '\eb';
        !          1694: <str>\e\ef  *string_buf_ptr++ = '\ef';
        !          1695:
        !          1696: <str>\e\e(.|\en)  *string_buf_ptr++ = yytext[1];
        !          1697:
        !          1698: <str>[^\e\e\en\e"]+ {
        !          1699:         char *yptr = yytext;
        !          1700:
        !          1701:         while (*yptr)
        !          1702:                 *string_buf_ptr++ = *yptr++;
        !          1703: }
        !          1704: .Ed
        !          1705: .Pp
        !          1706: Often, such as in some of the examples above,
        !          1707: a whole bunch of rules are all preceded by the same start condition(s).
        !          1708: .Nm
1.1       deraadt  1709: makes this a little easier and cleaner by introducing a notion of
                   1710: start condition
1.16    ! jmc      1711: .Em scope .
1.1       deraadt  1712: A start condition scope is begun with:
1.16    ! jmc      1713: .Pp
        !          1714: .Dl <SCs>{
        !          1715: .Pp
1.1       deraadt  1716: where
1.16    ! jmc      1717: .Dq SCs
        !          1718: is a list of one or more start conditions.
        !          1719: Inside the start condition scope, every rule automatically has the prefix
        !          1720: .Aq SCs
1.1       deraadt  1721: applied to it, until a
1.16    ! jmc      1722: .Sq }
1.1       deraadt  1723: which matches the initial
1.16    ! jmc      1724: .Sq { .
1.1       deraadt  1725: So, for example,
1.16    ! jmc      1726: .Bd -literal -offset indent
        !          1727: <ESC>{
        !          1728:     "\e\en"   return '\en';
        !          1729:     "\e\er"   return '\er';
        !          1730:     "\e\ef"   return '\ef';
        !          1731:     "\e\e0"   return '\e0';
        !          1732: }
        !          1733: .Ed
        !          1734: .Pp
1.1       deraadt  1735: is equivalent to:
1.16    ! jmc      1736: .Bd -literal -offset indent
        !          1737: <ESC>"\e\en"  return '\en';
        !          1738: <ESC>"\e\er"  return '\er';
        !          1739: <ESC>"\e\ef"  return '\ef';
        !          1740: <ESC>"\e\e0"  return '\e0';
        !          1741: .Ed
        !          1742: .Pp
1.1       deraadt  1743: Start condition scopes may be nested.
1.16    ! jmc      1744: .Pp
1.1       deraadt  1745: Three routines are available for manipulating stacks of start conditions:
1.16    ! jmc      1746: .Bl -tag -width Ds
        !          1747: .It void yy_push_state(int new_state)
        !          1748: Pushes the current start condition onto the top of the start condition
1.1       deraadt  1749: stack and switches to
1.16    ! jmc      1750: .Fa new_state
        !          1751: as though
        !          1752: .Dq BEGIN new_state
        !          1753: had been used
        !          1754: .Pq recall that start condition names are also integers .
        !          1755: .It void yy_pop_state()
        !          1756: Pops the top of the stack and switches to it via
        !          1757: .Em BEGIN .
        !          1758: .It int yy_top_state()
        !          1759: Returns the top of the stack without altering the stack's contents.
        !          1760: .El
        !          1761: .Pp
1.1       deraadt  1762: The start condition stack grows dynamically and so has no built-in
1.16    ! jmc      1763: size limitation.
        !          1764: If memory is exhausted, program execution aborts.
        !          1765: .Pp
        !          1766: To use start condition stacks, scanners must include a
        !          1767: .Dq %option stack
        !          1768: directive (see
        !          1769: .Sx OPTIONS
        !          1770: below).
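                         .Pp
                         As a hedged sketch, the comment scanner shown earlier could use the
                         start condition stack instead of the
                         .Em comment_caller
                         variable to remember where a comment began.
                         The state name
                         .Dq foo
                         is carried over from that example and is purely illustrative:
                         .Bd -literal -offset indent
                         %option stack
                         %x comment foo
                         %%
                         <INITIAL,foo>"/*"       yy_push_state(comment);

                         \&...

                         <comment>[^*\en]*        /* eat anything that's not a '*' */
                         <comment>"*"+[^*/\en]*   /* eat up '*'s not followed by '/'s */
                         <comment>\en             /* newlines inside the comment */
                         <comment>"*"+"/"        yy_pop_state();
                         .Ed
                         .Pp
                         Here
                         .Fn yy_pop_state
                         returns to whichever start condition was active when the comment opened,
                         so no separate rule is needed for each calling condition.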
        !          1771: .Sh MULTIPLE INPUT BUFFERS
        !          1772: Some scanners
        !          1773: (such as those which support
        !          1774: .Qq include
        !          1775: files)
        !          1776: require reading from several input streams.
        !          1777: As
        !          1778: .Nm
1.1       deraadt  1779: scanners do a large amount of buffering, one cannot control
                   1780: where the next input will be read from by simply writing a
1.16    ! jmc      1781: .Dv YY_INPUT
1.1       deraadt  1782: which is sensitive to the scanning context.
1.16    ! jmc      1783: .Dv YY_INPUT
1.1       deraadt  1784: is only called when the scanner reaches the end of its buffer, which
1.16    ! jmc      1785: may be a long time after scanning a statement such as an
        !          1786: .Qq include
1.1       deraadt  1787: which requires switching the input source.
1.16    ! jmc      1788: .Pp
1.1       deraadt  1789: To negotiate these sorts of problems,
1.16    ! jmc      1790: .Nm
1.1       deraadt  1791: provides a mechanism for creating and switching between multiple
1.16    ! jmc      1792: input buffers.
        !          1793: An input buffer is created by using:
        !          1794: .Pp
        !          1795: .D1 YY_BUFFER_STATE yy_create_buffer(FILE *file, int size)
        !          1796: .Pp
1.1       deraadt  1797: which takes a
1.16    ! jmc      1798: .Fa FILE
        !          1799: pointer and a
        !          1800: .Fa size
        !          1801: and creates a buffer associated with the given file and large enough to hold
        !          1802: .Fa size
1.1       deraadt  1803: characters (when in doubt, use
1.16    ! jmc      1804: .Dv YY_BUF_SIZE
        !          1805: for the size).
        !          1806: It returns a
        !          1807: .Dv YY_BUFFER_STATE
        !          1808: handle, which may then be passed to other routines
        !          1809: .Pq see below .
        !          1810: The
        !          1811: .Dv YY_BUFFER_STATE
1.1       deraadt  1812: type is a pointer to an opaque
1.16    ! jmc      1813: .Dq struct yy_buffer_state
        !          1814: structure, so
        !          1815: .Dv YY_BUFFER_STATE
        !          1816: variables may be safely initialized to
        !          1817: .Dq ((YY_BUFFER_STATE) 0)
        !          1818: if desired, and the opaque structure can also be referred to in order to
        !          1819: correctly declare input buffers in source files other than that of scanners.
        !          1820: Note that the
        !          1821: .Fa FILE
1.1       deraadt  1822: pointer in the call to
1.16    ! jmc      1823: .Fn yy_create_buffer
1.1       deraadt  1824: is only used as the value of
1.16    ! jmc      1825: .Fa yyin
1.1       deraadt  1826: seen by
1.16    ! jmc      1827: .Dv YY_INPUT ;
        !          1828: if
        !          1829: .Dv YY_INPUT
        !          1830: is redefined so that it no longer uses
        !          1831: .Fa yyin ,
        !          1832: then a nil
        !          1833: .Fa FILE
        !          1834: pointer can safely be passed to
        !          1835: .Fn yy_create_buffer .
        !          1836: To select a particular buffer to scan:
        !          1837: .Pp
        !          1838: .D1 void yy_switch_to_buffer(YY_BUFFER_STATE new_buffer)
        !          1839: .Pp
        !          1840: It switches the scanner's input buffer so subsequent tokens will
1.1       deraadt  1841: come from
1.16    ! jmc      1842: .Fa new_buffer .
1.1       deraadt  1843: Note that
1.16    ! jmc      1844: .Fn yy_switch_to_buffer
        !          1845: may be used by
        !          1846: .Fn yywrap
        !          1847: to set things up for continued scanning,
        !          1848: instead of opening a new file and pointing
        !          1849: .Fa yyin
        !          1850: at it.
        !          1851: Note also that switching input sources via either
        !          1852: .Fn yy_switch_to_buffer
        !          1853: or
        !          1854: .Fn yywrap
        !          1855: does not change the start condition.
        !          1856: .Pp
        !          1857: .D1 void yy_delete_buffer(YY_BUFFER_STATE buffer)
        !          1858: .Pp
        !          1859: is used to reclaim the storage associated with a buffer.
        !          1860: .Pf ( Fa buffer
1.1       deraadt  1861: can be nil, in which case the routine does nothing.)
1.16    ! jmc      1862: To clear the current contents of a buffer:
        !          1863: .Pp
        !          1864: .D1 void yy_flush_buffer(YY_BUFFER_STATE buffer)
        !          1865: .Pp
1.1       deraadt  1866: This function discards the buffer's contents,
1.16    ! jmc      1867: so the next time the scanner attempts to match a token from the buffer,
        !          1868: it will first fill the buffer anew using
        !          1869: .Dv YY_INPUT .
        !          1870: .Pp
        !          1871: .Fn yy_new_buffer
1.1       deraadt  1872: is an alias for
1.16    ! jmc      1873: .Fn yy_create_buffer ,
1.1       deraadt  1874: provided for compatibility with the C++ use of
1.16    ! jmc      1875: .Em new
1.1       deraadt  1876: and
1.16    ! jmc      1877: .Em delete
1.1       deraadt  1878: for creating and destroying dynamic objects.
1.16    ! jmc      1879: .Pp
1.1       deraadt  1880: Finally, the
1.16    ! jmc      1881: .Dv YY_CURRENT_BUFFER
1.1       deraadt  1882: macro returns a
1.16    ! jmc      1883: .Dv YY_BUFFER_STATE
1.1       deraadt  1884: handle to the current buffer.
1.16    ! jmc      1885: .Pp
1.1       deraadt  1886: Here is an example of using these features for writing a scanner
                   1887: which expands include files (the
1.16    ! jmc      1888: .Aq Aq EOF
1.1       deraadt  1889: feature is discussed below):
1.16    ! jmc      1890: .Bd -literal -offset indent
        !          1891: /*
        !          1892:  * the "incl" state is used for picking up the name
        !          1893:  * of an include file
        !          1894:  */
        !          1895: %x incl
        !          1896:
        !          1897: %{
                         #include <err.h>
        !          1898: #define MAX_INCLUDE_DEPTH 10
        !          1899: YY_BUFFER_STATE include_stack[MAX_INCLUDE_DEPTH];
        !          1900: int include_stack_ptr = 0;
        !          1901: %}
        !          1902:
        !          1903: %%
        !          1904: include             BEGIN(incl);
        !          1905:
        !          1906: [a-z]+              ECHO;
        !          1907: [^a-z\en]*\en?        ECHO;
        !          1908:
        !          1909: <incl>[ \et]*        /* eat the whitespace */
        !          1910: <incl>[^ \et\en]+ {   /* got the include file name */
        !          1911:         if (include_stack_ptr >= MAX_INCLUDE_DEPTH)
        !          1912:                 errx(1, "Includes nested too deeply");
        !          1913:
        !          1914:         include_stack[include_stack_ptr++] =
        !          1915:             YY_CURRENT_BUFFER;
        !          1916:
        !          1917:         yyin = fopen(yytext, "r");
        !          1918:
        !          1919:         if (yyin == NULL)
        !          1920:                 err(1, NULL);
1.1       deraadt  1921:
1.16    ! jmc      1922:         yy_switch_to_buffer(
        !          1923:             yy_create_buffer(yyin, YY_BUF_SIZE));
1.1       deraadt  1924:
1.16    ! jmc      1925:         BEGIN(INITIAL);
        !          1926: }
1.1       deraadt  1927:
1.16    ! jmc      1928: <<EOF>> {
        !          1929:         if (--include_stack_ptr < 0)
1.1       deraadt  1930:                 yyterminate();
1.16    ! jmc      1931:         else {
        !          1932:                 yy_delete_buffer(YY_CURRENT_BUFFER);
1.1       deraadt  1933:                 yy_switch_to_buffer(
1.16    ! jmc      1934:                     include_stack[include_stack_ptr]);
        !          1935:         }
        !          1936: }
        !          1937: .Ed
        !          1938: .Pp
1.1       deraadt  1939: Three routines are available for setting up input buffers for
1.16    ! jmc      1940: scanning in-memory strings instead of files.
        !          1941: All of them create a new input buffer for scanning the string,
        !          1942: and return a corresponding
        !          1943: .Dv YY_BUFFER_STATE
        !          1944: handle (which should be deleted afterwards using
        !          1945: .Fn yy_delete_buffer ) .
        !          1946: They also switch to the new buffer using
        !          1947: .Fn yy_switch_to_buffer ,
1.1       deraadt  1948: so the next call to
1.16    ! jmc      1949: .Fn yylex
1.1       deraadt  1950: will start scanning the string.
1.16    ! jmc      1951: .Bl -tag -width Ds
        !          1952: .It yy_scan_string(const char *str)
        !          1953: Scans a NUL-terminated string.
        !          1954: .It yy_scan_bytes(const char *bytes, int len)
        !          1955: Scans
        !          1956: .Fa len
        !          1957: bytes
        !          1958: .Pq including possibly NUL's
1.1       deraadt  1959: starting at location
1.16    ! jmc      1960: .Fa bytes .
        !          1961: .El
        !          1962: .Pp
        !          1963: Note that both of these functions create and scan a copy
        !          1964: of the string or bytes.
        !          1965: (This may be desirable, since
        !          1966: .Fn yylex
        !          1967: modifies the contents of the buffer it is scanning.)
        !          1968: The copy can be avoided by using:
        !          1969: .Bl -tag -width Ds
        !          1970: .It yy_scan_buffer(char *base, yy_size_t size)
        !          1971: Scans, in place, the buffer starting at
        !          1972: .Fa base ,
1.1       deraadt  1973: consisting of
1.16    ! jmc      1974: .Fa size
        !          1975: bytes, the last two bytes of which must be
        !          1976: .Dv YY_END_OF_BUFFER_CHAR
        !          1977: .Pq ASCII NUL .
        !          1978: These last two bytes are not scanned; thus, scanning consists of
        !          1979: base[0] through base[size-3], inclusive.
        !          1980: .Pp
        !          1981: If
        !          1982: .Fa base
        !          1983: is not set up in this manner
        !          1984: (i.e., it lacks the final two
        !          1985: .Dv YY_END_OF_BUFFER_CHAR
1.1       deraadt  1986: bytes), then
1.16    ! jmc      1987: .Fn yy_scan_buffer
1.1       deraadt  1988: returns a nil pointer instead of creating a new input buffer.
1.16    ! jmc      1989: .Pp
1.1       deraadt  1990: The type
1.16    ! jmc      1991: .Fa yy_size_t
        !          1992: is an integral type to which one can cast an integer expression
1.1       deraadt  1993: reflecting the size of the buffer.
1.16    ! jmc      1994: .El
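.Pp
As a minimal sketch of the buffer layout that
.Fn yy_scan_buffer
expects, the helper below copies some bytes and appends the two
terminating NUL bytes.
The name
.Fn make_scan_buffer
and its interface are assumptions made for illustration;
they are not part of
.Nm .

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of flex): copy len bytes from src into a
 * freshly allocated buffer of len + 2 bytes whose last two bytes are NUL,
 * which is the layout yy_scan_buffer() requires.  The total size is
 * stored in *size_out.  Returns NULL if the buffer cannot be allocated.
 */
char *
make_scan_buffer(const char *src, size_t len, size_t *size_out)
{
	char *base;

	assert(src != NULL && size_out != NULL);

	if (len > (size_t)-1 - 2)
		return NULL;		/* len + 2 would overflow */
	base = malloc(len + 2);
	if (base == NULL)
		return NULL;
	memcpy(base, src, len);
	base[len] = '\0';		/* first YY_END_OF_BUFFER_CHAR */
	base[len + 1] = '\0';		/* second YY_END_OF_BUFFER_CHAR */
	*size_out = len + 2;
	return base;
}
```

Inside a scanner the result would be passed on as
yy_scan_buffer(base, size).
Since no copy is made, the memory has to stay valid for as long as
the scanner uses the buffer.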
        !          1995: .Sh END-OF-FILE RULES
        !          1996: The special rule
        !          1997: .Qq Aq Aq EOF
        !          1998: indicates actions which are to be taken when an end-of-file is encountered and
        !          1999: .Fn yywrap
        !          2000: returns non-zero
        !          2001: .Pq i.e., indicates no further files to process .
        !          2002: The action must finish by doing one of four things:
        !          2003: .Bl -dash
        !          2004: .It
        !          2005: Assigning
        !          2006: .Em yyin
        !          2007: to a new input file
        !          2008: (in previous versions of
        !          2009: .Nm ,
        !          2010: after doing the assignment, it was necessary to call the special action
        !          2011: .Dv YY_NEW_FILE ;
        !          2012: this is no longer necessary).
        !          2013: .It
        !          2014: Executing a
        !          2015: .Em return
        !          2016: statement.
        !          2017: .It
        !          2018: Executing the special
        !          2019: .Fn yyterminate
        !          2020: action.
        !          2021: .It
        !          2022: Switching to a new buffer using
        !          2023: .Fn yy_switch_to_buffer
1.1       deraadt  2024: as shown in the example above.
1.16    ! jmc      2025: .El
        !          2026: .Pp
        !          2027: .Aq Aq EOF
        !          2028: rules may not be used with other patterns;
        !          2029: they may only be qualified with a list of start conditions.
        !          2030: If an unqualified
        !          2031: .Aq Aq EOF
        !          2032: rule is given, it applies to all start conditions which do not already have
        !          2033: .Aq Aq EOF
        !          2034: actions.
        !          2035: To specify an
        !          2036: .Aq Aq EOF
        !          2037: rule for only the initial start condition, use
        !          2038: .Pp
        !          2039: .Dl <INITIAL><<EOF>>
        !          2040: .Pp
1.1       deraadt  2041: These rules are useful for catching things like unclosed comments.
                   2042: An example:
1.16    ! jmc      2043: .Bd -literal -offset indent
        !          2044: %x quote
        !          2045: %%
        !          2046:
        !          2047: \&...other rules for dealing with quotes...
        !          2048:
        !          2049: <quote><<EOF>> {
        !          2050:          error("unterminated quote");
        !          2051:          yyterminate();
        !          2052: }
        !          2053: <<EOF>> {
        !          2054:          if (*++filelist)
        !          2055:                  yyin = fopen(*filelist, "r");
        !          2056:          else
        !          2057:                  yyterminate();
        !          2058: }
        !          2059: .Ed
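.Pp
The example above assigns the result of
.Xr fopen 3
to
.Fa yyin
without checking it.
One possible way to skip files that cannot be opened is sketched below;
.Fn open_next_input
is a hypothetical helper, not something
.Nm
provides.

```c
#include <stdio.h>

/*
 * Hypothetical helper: advance *listp through a NULL-terminated array of
 * file names and return the first one that can be opened for reading, or
 * NULL once the list is exhausted.  Like the rule above, it increments
 * the list pointer before looking at a name, so *listp should start at
 * the entry before the first name to try (e.g. argv[0]).
 */
FILE *
open_next_input(char ***listp)
{
	FILE *fp;

	/* Never step past the terminating NULL entry. */
	while (**listp != NULL && *++*listp != NULL) {
		fp = fopen(**listp, "r");
		if (fp != NULL)
			return fp;
		perror(**listp);	/* report the failure, try the next name */
	}
	return NULL;
}
```

An end-of-file action could then assign the result to
.Fa yyin
when it is not NULL and call
.Fn yyterminate
otherwise.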
        !          2060: .Sh MISCELLANEOUS MACROS
1.1       deraadt  2061: The macro
1.16    ! jmc      2062: .Dv YY_USER_ACTION
1.1       deraadt  2063: can be defined to provide an action
1.16    ! jmc      2064: which is always executed prior to the matched rule's action.
        !          2065: For example,
1.1       deraadt  2066: it could be #define'd to call a routine to convert yytext to lower-case.
                   2067: When
1.16    ! jmc      2068: .Dv YY_USER_ACTION
1.1       deraadt  2069: is invoked, the variable
1.16    ! jmc      2070: .Fa yy_act
        !          2071: gives the number of the matched rule
        !          2072: .Pq rules are numbered starting with 1 .
        !          2073: For example, to profile how often each rule is matched,
        !          2074: the following would do the trick:
        !          2075: .Pp
        !          2076: .Dl #define YY_USER_ACTION ++ctr[yy_act]
        !          2077: .Pp
1.1       deraadt  2078: where
1.16    ! jmc      2079: .Fa ctr
        !          2080: is an array to hold the counts for the different rules.
        !          2081: Note that the macro
        !          2082: .Dv YY_NUM_RULES
        !          2083: gives the total number of rules
        !          2084: (including the default rule, even if
        !          2085: .Fl s
        !          2086: is used),
1.1       deraadt  2087: so a correct declaration for
1.16    ! jmc      2088: .Fa ctr
1.1       deraadt  2089: is:
1.16    ! jmc      2090: .Pp
        !          2091: .Dl int ctr[YY_NUM_RULES];
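.Pp
The lower-case conversion mentioned earlier as a use for
.Dv YY_USER_ACTION
might look like the following sketch.
The routine name
.Fn lowercase_token
is an assumption made for illustration.

```c
#include <ctype.h>

/*
 * Hypothetical helper: fold the first len bytes of text to lower case in
 * place.  The cast to unsigned char keeps tolower() well defined for
 * bytes with the high bit set.
 */
void
lowercase_token(char *text, int len)
{
	int i;

	for (i = 0; i < len; i++)
		text[i] = (char)tolower((unsigned char)text[i]);
}

/*
 * In the first section of a scanner description this could be hooked up
 * with:
 *
 *	#define YY_USER_ACTION lowercase_token(yytext, yyleng);
 */
```

The conversion only overwrites existing characters,
so it never lengthens the token.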
        !          2092: .Pp
1.1       deraadt  2093: The macro
1.16    ! jmc      2094: .Dv YY_USER_INIT
1.1       deraadt  2095: may be defined to provide an action which is always executed before
1.16    ! jmc      2096: the first scan
        !          2097: .Pq and before the scanner's internal initializations are done .
1.1       deraadt  2098: For example, it could be used to call a routine to read
                   2099: in a data table or open a logging file.
1.16    ! jmc      2100: .Pp
1.1       deraadt  2101: The macro
1.16    ! jmc      2102: .Dv yy_set_interactive(is_interactive)
1.1       deraadt  2103: can be used to control whether the current buffer is considered
1.16    ! jmc      2104: .Em interactive .
1.1       deraadt  2105: An interactive buffer is processed more slowly,
                   2106: but must be used when the scanner's input source is indeed
                   2107: interactive to avoid problems due to waiting to fill buffers
                   2108: (see the discussion of the
1.16    ! jmc      2109: .Fl I
        !          2110: flag below).
        !          2111: A non-zero value in the macro invocation marks the buffer as interactive,
        !          2112: a zero value as non-interactive.
        !          2113: Note that use of this macro overrides
        !          2114: .Dq %option always-interactive
        !          2115: or
        !          2116: .Dq %option never-interactive
        !          2117: (see
        !          2118: .Sx OPTIONS
        !          2119: below).
        !          2120: .Fn yy_set_interactive
1.1       deraadt  2121: must be invoked prior to beginning to scan the buffer that is
1.16    ! jmc      2122: .Pq or is not
        !          2123: to be considered interactive.
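.Pp
One plausible way to choose the argument for
.Fn yy_set_interactive
is to test whether the input is a terminal with
.Xr isatty 3 .
The helper name
.Fn fd_is_interactive
below is hypothetical.

```c
#include <unistd.h>

/*
 * Hypothetical helper: return 1 if the file descriptor fd refers to a
 * terminal and 0 otherwise (including when fd is not a valid descriptor).
 */
int
fd_is_interactive(int fd)
{
	return isatty(fd) == 1;
}

/*
 * Inside a scanner, before scanning of the buffer begins:
 *
 *	yy_set_interactive(fd_is_interactive(fileno(yyin)));
 */
```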
        !          2124: .Pp
1.1       deraadt  2125: The macro
1.16    ! jmc      2126: .Dv yy_set_bol(at_bol)
1.1       deraadt  2127: can be used to control whether the current buffer's scanning
                   2128: context for the next token match is done as though at the
1.16    ! jmc      2129: beginning of a line.
        !          2130: A non-zero macro argument makes rules anchored with
        !          2131: .Sq ^
        !          2132: active, while a zero argument makes
        !          2133: .Sq ^
        !          2134: rules inactive.
        !          2135: .Pp
1.1       deraadt  2136: The macro
1.16    ! jmc      2137: .Dv YY_AT_BOL
        !          2138: returns true if the next token scanned from the current buffer will have
        !          2139: .Sq ^
        !          2140: rules active, false otherwise.
        !          2141: .Pp
1.1       deraadt  2142: In the generated scanner, the actions are all gathered in one large
                   2143: switch statement and separated using
1.16    ! jmc      2144: .Dv YY_BREAK ,
        !          2145: which may be redefined.
        !          2146: By default, it is simply a
        !          2147: .Qq break ,
        !          2148: to separate each rule's action from the following rules.
1.1       deraadt  2149: Redefining
1.16    ! jmc      2150: .Dv YY_BREAK
1.1       deraadt  2151: allows, for example, C++ users to
1.16    ! jmc      2152: .Dq #define YY_BREAK
        !          2153: to do nothing
        !          2154: (while being very careful that every rule ends with a
        !          2155: .Qq break
        !          2156: or a
        !          2157: .Qq return ! )
        !          2158: to avoid unreachable statement warnings where, because a rule's
        !          2159: action ends with
        !          2160: .Dq return ,
        !          2161: the
        !          2162: .Dv YY_BREAK
1.1       deraadt  2163: is inaccessible.
1.16    ! jmc      2164: .Sh VALUES AVAILABLE TO THE USER
1.1       deraadt  2165: This section summarizes the various values available to the user
                   2166: in the rule actions.
1.16    ! jmc      2167: .Bl -tag -width Ds
        !          2168: .It char *yytext
        !          2169: Holds the text of the current token.
        !          2170: It may be modified but not lengthened
        !          2171: .Pq characters cannot be appended to the end .
        !          2172: .Pp
1.1       deraadt  2173: If the special directive
1.16    ! jmc      2174: .Dq %array
1.1       deraadt  2175: appears in the first section of the scanner description, then
1.16    ! jmc      2176: .Fa yytext
1.1       deraadt  2177: is instead declared
1.16    ! jmc      2178: .Dq char yytext[YYLMAX] ,
1.1       deraadt  2179: where
1.16    ! jmc      2180: .Dv YYLMAX
        !          2181: is a macro definition that can be redefined in the first section
        !          2182: to change the default value
        !          2183: .Pq generally 8KB .
        !          2184: Using
        !          2185: .Dq %array
1.1       deraadt  2186: results in somewhat slower scanners, but the value of
1.16    ! jmc      2187: .Fa yytext
1.1       deraadt  2188: becomes immune to calls to
1.16    ! jmc      2189: .Fn input
1.1       deraadt  2190: and
1.16    ! jmc      2191: .Fn unput ,
1.1       deraadt  2192: which potentially destroy its value when
1.16    ! jmc      2193: .Fa yytext
        !          2194: is a character pointer.
        !          2195: The opposite of
        !          2196: .Dq %array
1.1       deraadt  2197: is
1.16    ! jmc      2198: .Dq %pointer ,
1.1       deraadt  2199: which is the default.
1.16    ! jmc      2200: .Pp
        !          2201: .Dq %array
        !          2202: cannot be used when generating C++ scanner classes
1.1       deraadt  2203: (the
1.16    ! jmc      2204: .Fl +
1.1       deraadt  2205: flag).
1.16    ! jmc      2206: .It int yyleng
        !          2207: Holds the length of the current token.
        !          2208: .It FILE *yyin
        !          2209: Is the file which by default
        !          2210: .Nm
        !          2211: reads from.
        !          2212: It may be redefined, but doing so only makes sense before
        !          2213: scanning begins or after an
        !          2214: .Dv EOF
        !          2215: has been encountered.
        !          2216: Changing it in the midst of scanning will have unexpected results since
        !          2217: .Nm
1.1       deraadt  2218: buffers its input; use
1.16    ! jmc      2219: .Fn yyrestart
1.1       deraadt  2220: instead.
                   2221: Once scanning terminates because an end-of-file
1.16    ! jmc      2222: has been seen,
        !          2223: .Fa yyin
        !          2224: can be assigned as the new input file
        !          2225: and the scanner can be called again to continue scanning.
        !          2226: .It void yyrestart(FILE *new_file)
        !          2227: May be called to point
        !          2228: .Fa yyin
        !          2229: at the new input file.
        !          2230: The switch-over to the new file is immediate
        !          2231: .Pq any previously buffered-up input is lost .
        !          2232: Note that calling
        !          2233: .Fn yyrestart
1.1       deraadt  2234: with
1.16    ! jmc      2235: .Fa yyin
1.1       deraadt  2236: as an argument thus throws away the current input buffer and continues
                   2237: scanning the same input file.
1.16    ! jmc      2238: .It FILE *yyout
        !          2239: Is the file to which
        !          2240: .Em ECHO
        !          2241: actions are done.
        !          2242: It can be reassigned by the user.
        !          2243: .It YY_CURRENT_BUFFER
        !          2244: Returns a
        !          2245: .Dv YY_BUFFER_STATE
1.1       deraadt  2246: handle to the current buffer.
1.16    ! jmc      2247: .It YY_START
        !          2248: Returns an integer value corresponding to the current start condition.
        !          2249: This value can subsequently be used with
        !          2250: .Em BEGIN
1.1       deraadt  2251: to return to that start condition.
1.16    ! jmc      2252: .El
        !          2253: .Sh INTERFACING WITH YACC
1.1       deraadt  2254: One of the main uses of
1.16    ! jmc      2255: .Nm
1.1       deraadt  2256: is as a companion to the
1.16    ! jmc      2257: .Xr yacc 1
1.1       deraadt  2258: parser-generator.
1.16    ! jmc      2259: yacc parsers expect to call a routine named
        !          2260: .Fn yylex
        !          2261: to find the next input token.
        !          2262: The routine is supposed to return the type of the next token
        !          2263: as well as putting any associated value in the global
        !          2264: .Fa yylval .
1.1       deraadt  2265: To use
1.16    ! jmc      2266: .Nm
        !          2267: with yacc, one specifies the
        !          2268: .Fl d
        !          2269: option to yacc to instruct it to generate the file
        !          2270: .Pa y.tab.h
1.1       deraadt  2271: containing definitions of all the
1.16    ! jmc      2272: .Dq %tokens
        !          2273: appearing in the yacc input.
        !          2274: This file is then included in the
        !          2275: .Nm
        !          2276: scanner.
        !          2277: For example, if one of the tokens is
        !          2278: .Qq TOK_NUMBER ,
1.1       deraadt  2279: part of the scanner might look like:
1.16    ! jmc      2280: .Bd -literal -offset indent
        !          2281: %{
        !          2282: #include "y.tab.h"
        !          2283: %}
        !          2284:
        !          2285: %%
        !          2286:
        !          2287: [0-9]+        yylval = atoi(yytext); return TOK_NUMBER;
        !          2288: .Ed
        !          2289: .Sh OPTIONS
        !          2290: .Nm
1.1       deraadt  2291: has the following options:
1.16    ! jmc      2292: .Bl -tag -width Ds
        !          2293: .It Fl 7
        !          2294: Instructs
        !          2295: .Nm
        !          2296: to generate a 7-bit scanner, i.e., one which can only recognize 7-bit
        !          2297: characters in its input.
        !          2298: The advantage of using
        !          2299: .Fl 7
1.1       deraadt  2300: is that the scanner's tables can be up to half the size of those generated
                   2301: using the
1.16    ! jmc      2302: .Fl 8
        !          2303: option
        !          2304: .Pq see below .
        !          2305: The disadvantage is that such scanners often hang
1.1       deraadt  2306: or crash if their input contains an 8-bit character.
1.16    ! jmc      2307: .Pp
        !          2308: Note, however, that unless generating a scanner using the
        !          2309: .Fl Cf
1.1       deraadt  2310: or
1.16    ! jmc      2311: .Fl CF
1.1       deraadt  2312: table compression options, use of
1.16    ! jmc      2313: .Fl 7
        !          2314: will save only a small amount of table space,
        !          2315: and make the scanner considerably less portable.
        !          2316: .Nm flex Ns 's
        !          2317: default behavior is to generate an 8-bit scanner unless
        !          2318: .Fl Cf
        !          2319: or
        !          2320: .Fl CF
        !          2321: is specified, in which case
        !          2322: .Nm
        !          2323: defaults to generating 7-bit scanners unless it was
        !          2324: configured to generate 8-bit scanners
        !          2325: (as will often be the case with non-USA sites).
        !          2326: It is possible to tell whether
        !          2327: .Nm
        !          2328: generated a 7-bit or an 8-bit scanner by inspecting the flag summary in the
        !          2329: .Fl v
        !          2330: output as described below.
        !          2331: .Pp
        !          2332: Note that if
        !          2333: .Fl Cfe
        !          2334: or
        !          2335: .Fl CFe
        !          2336: are used
        !          2337: (the table compression options, but also using equivalence classes as
        !          2338: discussed below),
        !          2339: .Nm
        !          2340: still defaults to generating an 8-bit scanner,
        !          2341: since usually with these compression options full 8-bit tables
1.1       deraadt  2342: are not much more expensive than 7-bit tables.
1.16    ! jmc      2343: .It Fl 8
        !          2344: Instructs
        !          2345: .Nm
1.1       deraadt  2346: to generate an 8-bit scanner, i.e., one which can recognize 8-bit
1.16    ! jmc      2347: characters.
        !          2348: This flag is only needed for scanners generated using
        !          2349: .Fl Cf
1.1       deraadt  2350: or
1.16    ! jmc      2351: .Fl CF ,
        !          2352: as otherwise
        !          2353: .Nm
        !          2354: defaults to generating an 8-bit scanner anyway.
        !          2355: .Pp
1.1       deraadt  2356: See the discussion of
1.16    ! jmc      2357: .Fl 7
        !          2358: above for
        !          2359: .Nm flex Ns 's
        !          2360: default behavior and the tradeoffs between 7-bit and 8-bit scanners.
        !          2361: .It Fl B
        !          2362: Instructs
        !          2363: .Nm
        !          2364: to generate a
        !          2365: .Em batch
        !          2366: scanner, the opposite of
        !          2367: .Em interactive
        !          2368: scanners generated by
        !          2369: .Fl I
        !          2370: .Pq see below .
        !          2371: In general,
        !          2372: .Fl B
        !          2373: is used when the scanner will never be used interactively,
        !          2374: and you want to squeeze a little more performance out of it.
        !          2375: If the aim is instead to squeeze out a lot more performance,
        !          2376: use the
        !          2377: .Fl Cf
        !          2378: or
        !          2379: .Fl CF
        !          2380: options
        !          2381: .Pq discussed below ,
        !          2382: which turn on
        !          2383: .Fl B
        !          2384: automatically anyway.
        !          2385: .It Fl b
        !          2386: Generate backing-up information to
        !          2387: .Pa lex.backup .
        !          2388: This is a list of scanner states which require backing up
        !          2389: and the input characters on which they do so.
        !          2390: By adding rules one can remove backing-up states.
        !          2391: If all backing-up states are eliminated and
        !          2392: .Fl Cf
        !          2393: or
        !          2394: .Fl CF
        !          2395: is used, the generated scanner will run faster (see the
        !          2396: .Fl p
        !          2397: flag).
        !          2398: Only users who wish to squeeze every last cycle out of their
        !          2399: scanners need worry about this option.
        !          2400: (See the section on
        !          2401: .Sx PERFORMANCE CONSIDERATIONS
        !          2402: below.)
        !          2403: .It Fl C Ns Op Cm aeFfmr
        !          2404: Controls the degree of table compression and, more generally, trade-offs
1.1       deraadt  2405: between small scanners and fast scanners.
1.16    ! jmc      2406: .Bl -tag -width Ds
        !          2407: .It Fl Ca
        !          2408: Instructs
        !          2409: .Nm
        !          2410: to trade off larger tables in the generated scanner for faster performance
        !          2411: because the elements of the tables are better aligned for memory access
        !          2412: and computation.
        !          2413: On some
        !          2414: .Tn RISC
        !          2415: architectures, fetching and manipulating longwords is more efficient
        !          2416: than with smaller-sized units such as shortwords.
        !          2417: This option can double the size of the tables used by the scanner.
        !          2418: .It Fl Ce
        !          2419: Directs
        !          2420: .Nm
1.1       deraadt  2421: to construct
1.16    ! jmc      2422: .Em equivalence classes ,
        !          2423: i.e., sets of characters which have identical lexical properties
        !          2424: (for example, if the only appearance of digits in the
        !          2425: .Nm
1.1       deraadt  2426: input is in the character class
1.16    ! jmc      2427: .Qq [0-9]
        !          2428: then the digits
        !          2429: .Sq 0 ,
        !          2430: .Sq 1 ,
        !          2431: .Sq ... ,
        !          2432: .Sq 9
        !          2433: will all be put in the same equivalence class).
        !          2434: Equivalence classes usually give dramatic reductions in the final
        !          2435: table/object file sizes
        !          2436: .Pq typically a factor of 2\-5
        !          2437: and are pretty cheap performance-wise
        !          2438: .Pq one array look-up per character scanned .
        !          2439: .It Fl CF
        !          2440: Specifies that the alternate fast scanner representation
        !          2441: (described below under the
        !          2442: .Fl F
        !          2443: option)
        !          2444: should be used.
        !          2445: This option cannot be used with
        !          2446: .Fl + .
        !          2447: .It Fl Cf
        !          2448: Specifies that the
        !          2449: .Em full
        !          2450: scanner tables should be generated \-
        !          2451: .Nm
        !          2452: should not compress the tables by taking advantage of
        !          2453: similar transition functions for different states.
        !          2454: .It Fl \&Cm
        !          2455: Directs
        !          2456: .Nm
1.1       deraadt  2457: to construct
1.16    ! jmc      2458: .Em meta-equivalence classes ,
        !          2459: which are sets of equivalence classes
        !          2460: (or characters, if equivalence classes are not being used)
        !          2461: that are commonly used together.
        !          2462: Meta-equivalence classes are often a big win when using compressed tables,
        !          2463: but they have a moderate performance impact
        !          2464: (one or two
        !          2465: .Qq if
        !          2466: tests and one array look-up per character scanned).
        !          2467: .It Fl Cr
        !          2468: Causes the generated scanner to
        !          2469: .Em bypass
        !          2470: use of the standard I/O library
        !          2471: .Pq stdio
        !          2472: for input.
        !          2473: Instead of calling
        !          2474: .Xr fread 3
1.1       deraadt  2475: or
1.16    ! jmc      2476: .Xr getc 3 ,
1.1       deraadt  2477: the scanner will use the
1.16    ! jmc      2478: .Xr read 2
        !          2479: system call,
        !          2480: resulting in a performance gain which varies from system to system,
        !          2481: but in general is probably negligible unless
        !          2482: .Fl Cf
1.1       deraadt  2483: or
1.16    ! jmc      2484: .Fl CF
        !          2485: are being used.
1.1       deraadt  2486: Using
1.16    ! jmc      2487: .Fl Cr
        !          2488: can cause strange behavior if, for example, reading from
        !          2489: .Fa yyin
        !          2490: using stdio prior to calling the scanner
        !          2491: (because the scanner will miss whatever text previous reads left
        !          2492: in the stdio input buffer).
        !          2493: .Pp
        !          2494: .Fl Cr
        !          2495: has no effect if
        !          2496: .Dv YY_INPUT
        !          2497: is defined
        !          2498: (see
        !          2499: .Sx THE GENERATED SCANNER
        !          2500: above).
        !          2501: .El
        !          2502: .Pp
1.1       deraadt  2503: A lone
1.16    ! jmc      2504: .Fl C
1.1       deraadt  2505: specifies that the scanner tables should be compressed but neither
                   2506: equivalence classes nor meta-equivalence classes should be used.
1.16    ! jmc      2507: .Pp
1.1       deraadt  2508: The options
1.16    ! jmc      2509: .Fl Cf
1.1       deraadt  2510: or
1.16    ! jmc      2511: .Fl CF
1.1       deraadt  2512: and
1.16    ! jmc      2513: .Fl \&Cm
        !          2514: do not make sense together \- there is no opportunity for meta-equivalence
        !          2515: classes if the table is not being compressed.
        !          2516: Otherwise the options may be freely mixed, and are cumulative.
        !          2517: .Pp
1.1       deraadt  2518: The default setting is
1.16    ! jmc      2519: .Fl Cem
1.1       deraadt  2520: which specifies that
1.16    ! jmc      2521: .Nm
        !          2522: should generate equivalence classes and meta-equivalence classes.
        !          2523: This setting provides the highest degree of table compression.
        !          2524: It is possible to trade off faster-executing scanners at the cost of
        !          2525: larger tables with the following generally being true:
        !          2526: .Bd -unfilled -offset indent
        !          2527: slowest & smallest
        !          2528:       -Cem
        !          2529:       -Cm
        !          2530:       -Ce
        !          2531:       -C
        !          2532:       -C{f,F}e
        !          2533:       -C{f,F}
        !          2534:       -C{f,F}a
        !          2535: fastest & largest
        !          2536: .Ed
        !          2537: .Pp
1.1       deraadt  2538: Note that scanners with the smallest tables are usually generated and
1.16    ! jmc      2539: compiled the quickest,
        !          2540: so during development the default,
        !          2541: maximal compression, is usually best.
        !          2542: .Pp
        !          2543: .Fl Cfe
        !          2544: is often a good compromise between speed and size for production scanners.
        !          2545: .It Fl c
        !          2546: A do-nothing, deprecated option included for
        !          2547: .Tn POSIX
        !          2548: compliance.
        !          2549: .It Fl d
        !          2550: Makes the generated scanner run in debug mode.
        !          2551: Whenever a pattern is recognized and the global
        !          2552: .Fa yy_flex_debug
        !          2553: is non-zero
        !          2554: .Pq which is the default ,
        !          2555: the scanner will write to stderr a line of the form:
        !          2556: .Pp
        !          2557: .D1 --accepting rule at line 53 ("the matched text")
        !          2558: .Pp
        !          2559: The line number refers to the location of the rule in the file
        !          2560: defining the scanner
        !          2561: (i.e., the file that was fed to
        !          2562: .Nm ) .
        !          2563: Messages are also generated when the scanner backs up,
        !          2564: accepts the default rule,
        !          2565: reaches the end of its input buffer
        !          2566: (or encounters a NUL;
        !          2567: at this point, the two look the same as far as the scanner's concerned),
        !          2568: or reaches an end-of-file.
        !          2569: .It Fl F
        !          2570: Specifies that the fast scanner table representation should be used
        !          2571: .Pq and stdio bypassed .
        !          2572: This representation is about as fast as the full table representation
        !          2573: .Pq Fl f ,
        !          2574: and for some sets of patterns will be considerably smaller
        !          2575: .Pq and for others, larger .
        !          2576: In general, if the pattern set contains both
        !          2577: .Qq keywords
        !          2578: and a catch-all,
        !          2579: .Qq identifier
        !          2580: rule, such as in the set:
        !          2581: .Bd -unfilled -offset indent
        !          2582: "case"    return TOK_CASE;
        !          2583: "switch"  return TOK_SWITCH;
        !          2584: \&...
        !          2585: "default" return TOK_DEFAULT;
        !          2586: [a-z]+    return TOK_ID;
        !          2587: .Ed
        !          2588: .Pp
        !          2589: then it's better to use the full table representation.
        !          2590: If only the
        !          2591: .Qq identifier
        !          2592: rule is present and a hash table or some such is used to detect the keywords,
        !          2593: it's better to use
        !          2594: .Fl F .
        !          2595: .Pp
        !          2596: This option is equivalent to
        !          2597: .Fl CFr
        !          2598: .Pq see above .
        !          2599: It cannot be used with
        !          2600: .Fl + .
        !          2601: .It Fl f
        !          2602: Specifies
        !          2603: .Em fast scanner .
        !          2604: No table compression is done and stdio is bypassed.
        !          2605: The result is large but fast.
        !          2606: This option is equivalent to
        !          2607: .Fl Cfr
        !          2608: .Pq see above .
        !          2609: .It Fl h
        !          2610: Generates a help summary of
        !          2611: .Nm flex Ns 's
        !          2612: options to stdout and then exits.
        !          2613: .Fl ?\&
        !          2614: and
        !          2615: .Fl Fl help
        !          2616: are synonyms for
        !          2617: .Fl h .
        !          2618: .It Fl I
        !          2619: Instructs
        !          2620: .Nm
        !          2621: to generate an
        !          2622: .Em interactive
        !          2623: scanner.
        !          2624: An interactive scanner is one that only looks ahead to decide
        !          2625: what token has been matched if it absolutely must.
        !          2626: It turns out that always looking one extra character ahead,
        !          2627: even if the scanner has already seen enough text
        !          2628: to disambiguate the current token, is a bit faster than
        !          2629: only looking ahead when necessary.
        !          2630: But scanners that always look ahead give dreadful interactive performance;
        !          2631: for example, when a user types a newline,
        !          2632: it is not recognized as a newline token until they enter
        !          2633: .Em another
        !          2634: token, which often means typing in another whole line.
        !          2635: .Pp
        !          2636: .Nm
        !          2637: scanners default to
        !          2638: .Em interactive
        !          2639: unless
        !          2640: .Fl Cf
        !          2641: or
        !          2642: .Fl CF
        !          2643: table-compression options are specified
        !          2644: .Pq see above .
        !          2645: That's because if high performance is most important,
        !          2646: one of these options should be used;
        !          2647: if neither was,
        !          2648: .Nm
        !          2649: assumes it is preferable to trade off a bit of run-time performance for
        !          2650: intuitive interactive behavior.
        !          2651: Note also that
        !          2652: .Fl I
        !          2653: cannot be used in conjunction with
        !          2654: .Fl Cf
        !          2655: or
        !          2656: .Fl CF .
        !          2657: Thus, this option is not really needed; it is on by default for all those
        !          2658: cases in which it is allowed.
        !          2659: .Pp
        !          2660: A scanner can be forced to not be interactive by using
        !          2661: .Fl B
        !          2662: .Pq see above .
        !          2663: .It Fl i
        !          2664: Instructs
        !          2665: .Nm
        !          2666: to generate a case-insensitive scanner.
        !          2667: The case of letters given in the
        !          2668: .Nm
        !          2669: input patterns will be ignored,
        !          2670: and tokens in the input will be matched regardless of case.
        !          2671: The matched text given in
        !          2672: .Fa yytext
        !          2673: will have its case preserved
        !          2674: .Pq i.e., it will not be folded .
        !          2675: .It Fl L
        !          2676: Instructs
        !          2677: .Nm
        !          2678: not to generate
        !          2679: .Dq #line
        !          2680: directives.
        !          2681: Without this option,
        !          2682: .Nm
        !          2683: peppers the generated scanner with #line directives so error messages
        !          2684: in the actions will be correctly located with respect to either the original
        !          2685: .Nm
        !          2686: input file
        !          2687: (if the errors are due to code in the input file),
        !          2688: or
        !          2689: .Pa lex.yy.c
        !          2690: (if the errors are
        !          2691: .Nm flex Ns 's
        !          2692: fault \- these sorts of errors should be reported to the email address
        !          2693: given below).
        !          2694: .It Fl l
        !          2695: Turns on maximum compatibility with the original AT&T
        !          2696: .Nm lex
        !          2697: implementation.
        !          2698: Note that this does not mean full compatibility.
        !          2699: Use of this option costs a considerable amount of performance,
        !          2700: and it cannot be used with the
        !          2701: .Fl + , f , F , Cf ,
        !          2702: or
        !          2703: .Fl CF
        !          2704: options.
        !          2705: For details on the compatibilities it provides, see the section
        !          2706: .Sx INCOMPATIBILITIES WITH LEX AND POSIX
        !          2707: below.
        !          2708: This option also results in the name
        !          2709: .Dv YY_FLEX_LEX_COMPAT
        !          2710: being #define'd in the generated scanner.
        !          2711: .It Fl n
        !          2712: Another do-nothing, deprecated option included only for
        !          2713: .Tn POSIX
        !          2714: compliance.
        !          2715: .It Fl o Ns Ar output
        !          2716: Directs
        !          2717: .Nm
        !          2718: to write the scanner to the file
        !          2719: .Ar output
1.1       deraadt  2720: instead of
1.16    ! jmc      2721: .Pa lex.yy.c .
        !          2722: If
        !          2723: .Fl o
        !          2724: is combined with the
        !          2725: .Fl t
        !          2726: option, then the scanner is written to stdout but its
        !          2727: .Dq #line
        !          2728: directives
        !          2729: (see the
        !          2730: .Fl L
        !          2731: option above)
        !          2732: refer to the file
        !          2733: .Ar output .
        !          2734: .It Fl P Ns Ar prefix
        !          2735: Changes the default
        !          2736: .Qq yy
1.1       deraadt  2737: prefix used by
1.16    ! jmc      2738: .Nm
1.6       aaron    2739: for all globally visible variable and function names to instead be
1.16    ! jmc      2740: .Ar prefix .
1.1       deraadt  2741: For example,
1.16    ! jmc      2742: .Fl P Ns Ar foo
1.1       deraadt  2743: changes the name of
1.16    ! jmc      2744: .Fa yytext
1.1       deraadt  2745: to
1.16    ! jmc      2746: .Fa footext .
1.1       deraadt  2747: It also changes the name of the default output file from
1.16    ! jmc      2748: .Pa lex.yy.c
1.1       deraadt  2749: to
1.16    ! jmc      2750: .Pa lex.foo.c .
1.1       deraadt  2751: Here are all of the names affected:
1.16    ! jmc      2752: .Bd -unfilled -offset indent
        !          2753: yy_create_buffer
        !          2754: yy_delete_buffer
        !          2755: yy_flex_debug
        !          2756: yy_init_buffer
        !          2757: yy_flush_buffer
        !          2758: yy_load_buffer_state
        !          2759: yy_switch_to_buffer
        !          2760: yyin
        !          2761: yyleng
        !          2762: yylex
        !          2763: yylineno
        !          2764: yyout
        !          2765: yyrestart
        !          2766: yytext
        !          2767: yywrap
        !          2768: .Ed
        !          2769: .Pp
        !          2770: (If using a C++ scanner, then only
        !          2771: .Fa yywrap
1.1       deraadt  2772: and
1.16    ! jmc      2773: .Fa yyFlexLexer
1.1       deraadt  2774: are affected.)
1.16    ! jmc      2775: Within the scanner itself, it is still possible to refer to the global variables
1.1       deraadt  2776: and functions using either version of their name; but externally, they
                   2777: have the modified name.
1.16    ! jmc      2778: .Pp
        !          2779: This option allows multiple
        !          2780: .Nm
        !          2781: programs to be easily linked together into the same executable.
        !          2782: Note, though, that using this option also renames
        !          2783: .Fn yywrap ,
        !          2784: so now either an
        !          2785: .Pq appropriately named
        !          2786: version of the routine for the scanner must be supplied, or
        !          2787: .Dq %option noyywrap
        !          2788: must be used, as linking with
        !          2789: .Fl lfl
        !          2790: no longer provides one by default.
        !          2791: .It Fl p
        !          2792: Generates a performance report to stderr.
        !          2793: The report consists of comments regarding features of the
        !          2794: .Nm
        !          2795: input file which will cause a serious loss of performance in the resulting
        !          2796: scanner.
        !          2797: If the flag is specified twice,
        !          2798: comments regarding features that lead to minor performance losses
        !          2799: will also be reported.
        !          2800: .Pp
        !          2801: Note that the use of
        !          2802: .Em REJECT ,
        !          2803: .Dq %option yylineno ,
        !          2804: and variable trailing context
        !          2805: (see the
        !          2806: .Sx BUGS
        !          2807: section below)
        !          2808: entails a substantial performance penalty; use of
        !          2809: .Fn yymore ,
        !          2810: the
        !          2811: .Sq ^
        !          2812: operator, and the
        !          2813: .Fl I
        !          2814: flag entail minor performance penalties.
        !          2815: .It Fl S Ns Ar skeleton
        !          2816: Overrides the default skeleton file from which
        !          2817: .Nm
        !          2818: constructs its scanners.
        !          2819: This option is needed only for
        !          2820: .Nm
1.1       deraadt  2821: maintenance or development.
1.16    ! jmc      2822: .It Fl s
        !          2823: Causes the default rule
        !          2824: .Pq that unmatched scanner input is echoed to stdout
        !          2825: to be suppressed.
        !          2826: If the scanner encounters input that does not
        !          2827: match any of its rules, it aborts with an error.
        !          2828: This option is useful for finding holes in a scanner's rule set.
        !          2829: .It Fl T
        !          2830: Makes
        !          2831: .Nm
        !          2832: run in
        !          2833: .Em trace
        !          2834: mode.
        !          2835: It will generate a lot of messages to stderr concerning
        !          2836: the form of the input and the resultant non-deterministic and deterministic
        !          2837: finite automata.
        !          2838: This option is mostly for use in maintaining
        !          2839: .Nm .
        !          2840: .It Fl t
        !          2841: Instructs
        !          2842: .Nm
        !          2843: to write the scanner it generates to standard output instead of
        !          2844: .Pa lex.yy.c .
        !          2845: .It Fl V
        !          2846: Prints the version number to stdout and exits.
        !          2847: .Fl Fl version
        !          2848: is a synonym for
        !          2849: .Fl V .
        !          2850: .It Fl v
        !          2851: Specifies that
        !          2852: .Nm
        !          2853: should write to stderr
        !          2854: a summary of statistics regarding the scanner it generates.
        !          2855: Most of the statistics are meaningless to the casual
        !          2856: .Nm
        !          2857: user, but the first line identifies the version of
        !          2858: .Nm
        !          2859: (same as reported by
        !          2860: .Fl V ) ,
        !          2861: and the next line the flags used when generating the scanner,
        !          2862: including those that are on by default.
        !          2863: .It Fl w
        !          2864: Suppresses warning messages.
        !          2865: .It Fl +
        !          2866: Specifies that
        !          2867: .Nm
        !          2868: should generate a C++ scanner class.
        !          2869: See the section on
        !          2870: .Sx GENERATING C++ SCANNERS
        !          2871: below for details.
        !          2872: .El
        !          2873: .Pp
        !          2874: .Nm
1.1       deraadt  2875: also provides a mechanism for controlling options within the
1.16    ! jmc      2876: scanner specification itself, rather than from the
        !          2877: .Nm
        !          2878: command line.
1.1       deraadt  2879: This is done by including
1.16    ! jmc      2880: .Dq %option
1.1       deraadt  2881: directives in the first section of the scanner specification.
1.16    ! jmc      2882: Multiple options can be specified with a single
        !          2883: .Dq %option
        !          2884: directive, and multiple directives may appear in the first section of the
        !          2885: .Nm
        !          2886: input file.
        !          2887: .Pp
        !          2888: Most options are given simply as names, optionally preceded by the word
        !          2889: .Qq no
        !          2890: .Pq with no intervening whitespace
        !          2891: to negate their meaning.
        !          2892: A number are equivalent to
        !          2893: .Nm
        !          2894: flags or their negation:
        !          2895: .Bd -unfilled -offset indent
        !          2896: 7bit            -7 option
        !          2897: 8bit            -8 option
        !          2898: align           -Ca option
        !          2899: backup          -b option
        !          2900: batch           -B option
        !          2901: c++             -+ option
        !          2902:
        !          2903: caseful or
        !          2904: case-sensitive  opposite of -i (default)
        !          2905:
        !          2906: case-insensitive or
        !          2907: caseless        -i option
        !          2908:
        !          2909: debug           -d option
        !          2910: default         opposite of -s option
        !          2911: ecs             -Ce option
        !          2912: fast            -F option
        !          2913: full            -f option
        !          2914: interactive     -I option
        !          2915: lex-compat      -l option
        !          2916: meta-ecs        -Cm option
        !          2917: perf-report     -p option
        !          2918: read            -Cr option
        !          2919: stdout          -t option
        !          2920: verbose         -v option
        !          2921: warn            opposite of -w option
        !          2922:                 (use "%option nowarn" for -w)
        !          2923:
        !          2924: array           equivalent to "%array"
        !          2925: pointer         equivalent to "%pointer" (default)
        !          2926: .Ed
        !          2927: .Pp
        !          2928: Some %option's provide features otherwise not available:
        !          2929: .Bl -tag -width Ds
        !          2930: .It always-interactive
        !          2931: Instructs
        !          2932: .Nm
        !          2933: to generate a scanner which always considers its input
        !          2934: .Qq interactive .
        !          2935: Normally, on each new input file the scanner calls
        !          2936: .Fn isatty
        !          2937: in an attempt to determine whether the scanner's input source is interactive
        !          2938: and thus should be read a character at a time.
        !          2939: When this option is used, however, no such call is made.
        !          2940: .It main
        !          2941: Directs
        !          2942: .Nm
        !          2943: to provide a default
        !          2944: .Fn main
1.1       deraadt  2945: program for the scanner, which simply calls
1.16    ! jmc      2946: .Fn yylex .
1.1       deraadt  2947: This option implies
1.16    ! jmc      2948: .Dq noyywrap
        !          2949: .Pq see below .
        !          2950: .It never-interactive
        !          2951: Instructs
        !          2952: .Nm
        !          2953: to generate a scanner which never considers its input
        !          2954: .Qq interactive
        !          2955: (again, no call made to
        !          2956: .Fn isatty ) .
1.1       deraadt  2957: This is the opposite of
1.16    ! jmc      2958: .Dq always-interactive .
        !          2959: .It stack
        !          2960: Enables the use of start condition stacks
        !          2961: (see
        !          2962: .Sx START CONDITIONS
        !          2963: above).
        !          2964: .It stdinit
        !          2965: If set (i.e.,
        !          2966: .Dq %option stdinit ) ,
1.1       deraadt  2967: initializes
1.16    ! jmc      2968: .Fa yyin
1.1       deraadt  2969: and
1.16    ! jmc      2970: .Fa yyout
        !          2971: to stdin and stdout, instead of the default of
        !          2972: .Dq nil .
1.1       deraadt  2973: Some existing
1.16    ! jmc      2974: .Nm lex
        !          2975: programs depend on this behavior, even though it is not compliant with ANSI C,
        !          2976: which does not require stdin and stdout to be compile-time constant.
        !          2977: .It yylineno
        !          2978: Directs
        !          2979: .Nm
1.1       deraadt  2980: to generate a scanner that maintains the number of the current line
                   2981: read from its input in the global variable
1.16    ! jmc      2982: .Fa yylineno .
1.1       deraadt  2983: This option is implied by
1.16    ! jmc      2984: .Dq %option lex-compat .
        !          2985: .It yywrap
        !          2986: If unset (i.e.,
        !          2987: .Dq %option noyywrap ) ,
1.1       deraadt  2988: makes the scanner not call
1.16    ! jmc      2989: .Fn yywrap
        !          2990: upon an end-of-file, but simply assume that there are no more files to scan
        !          2991: (until the user points
        !          2992: .Fa yyin
1.1       deraadt  2993: at a new file and calls
1.16    ! jmc      2994: .Fn yylex
1.1       deraadt  2995: again).
1.16    ! jmc      2996: .El
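.Pp
As a minimal sketch of how these directives combine
(the rules and the message text are illustrative assumptions),
the following specification should build into a complete program without
.Fl lfl ,
since
.Dq main
supplies
.Fn main
and implies
.Dq noyywrap :
.Bd -literal -offset indent
%option case-insensitive nodefault
%option main
%%
[a-z]+    printf("word: %s\en", yytext);
\&.|\en      /* discard everything else */
.Ed
.Pp
Because the second rule matches any character the first does not,
suppressing the default rule with
.Dq nodefault
leaves no input unmatched.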
        !          2997: .Pp
        !          2998: .Nm
        !          2999: scans rule actions to determine whether the
        !          3000: .Em REJECT
        !          3001: or
        !          3002: .Fn yymore
        !          3003: features are being used.
        !          3004: The
        !          3005: .Dq reject
1.1       deraadt  3006: and
1.16    ! jmc      3007: .Dq yymore
        !          3008: options are available to override its decision as to whether the
1.1       deraadt  3009: features are in use, either by setting them (e.g.,
1.16    ! jmc      3010: .Dq %option reject )
        !          3011: to indicate the feature is indeed used,
        !          3012: or unsetting them to indicate it actually is not used
1.1       deraadt  3013: (e.g.,
1.16    ! jmc      3014: .Dq %option noyymore ) .
        !          3015: .Pp
        !          3016: Three options take string-delimited values, offset with
        !          3017: .Sq = :
        !          3018: .Pp
        !          3019: .D1 %option outfile="ABC"
        !          3020: .Pp
1.1       deraadt  3021: is equivalent to
1.16    ! jmc      3022: .Fl o Ns Ar ABC ,
1.1       deraadt  3023: and
1.16    ! jmc      3024: .Pp
        !          3025: .D1 %option prefix="XYZ"
        !          3026: .Pp
1.1       deraadt  3027: is equivalent to
1.16    ! jmc      3028: .Fl P Ns Ar XYZ .
1.1       deraadt  3029: Finally,
1.16    ! jmc      3030: .Pp
        !          3031: .D1 %option yyclass="foo"
        !          3032: .Pp
        !          3033: only applies when generating a C++ scanner
        !          3034: .Pf ( Fl +
        !          3035: option).
        !          3036: It informs
        !          3037: .Nm
        !          3038: that
        !          3039: .Dq foo
        !          3040: has been derived as a subclass of yyFlexLexer, so
        !          3041: .Nm
        !          3042: will place actions in the member function
        !          3043: .Dq foo::yylex()
1.1       deraadt  3044: instead of
1.16    ! jmc      3045: .Dq yyFlexLexer::yylex() .
1.1       deraadt  3046: It also generates a
1.16    ! jmc      3047: .Dq yyFlexLexer::yylex()
1.1       deraadt  3048: member function that emits a run-time error (by invoking
1.16    ! jmc      3049: .Dq yyFlexLexer::LexerError() )
1.1       deraadt  3050: if called.
1.16    ! jmc      3051: See
        !          3052: .Sx GENERATING C++ SCANNERS ,
        !          3053: below, for additional information.
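.Pp
For illustration only, a scanner meant to be linked alongside another
.Nm
scanner might begin as follows; the file name
.Pa foo_scan.c
and the prefix
.Qq foo
are hypothetical:
.Bd -literal -offset indent
%option outfile="foo_scan.c" prefix="foo"
%option noyywrap
%%
[0-9]+    return 1;
\&.|\en      /* ignore */
.Ed
.Pp
Under the renaming described for
.Fl P
above, the rest of the program would then call
.Fn foolex
rather than
.Fn yylex .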
        !          3054: .Pp
        !          3055: A number of options are available for
        !          3056: .Xr lint 1
        !          3057: purists who want to suppress the appearance of unneeded routines
        !          3058: in the generated scanner.
        !          3059: Each of the following, if unset
1.1       deraadt  3060: (e.g.,
1.16    ! jmc      3061: .Dq %option nounput ) ,
        !          3062: results in the corresponding routine not appearing in the generated scanner:
        !          3063: .Bd -unfilled -offset indent
        !          3064: input, unput
        !          3065: yy_push_state, yy_pop_state, yy_top_state
        !          3066: yy_scan_buffer, yy_scan_bytes, yy_scan_string
        !          3067: .Ed
        !          3068: .Pp
1.1       deraadt  3069: (though
1.16    ! jmc      3070: .Fn yy_push_state
        !          3071: and friends won't appear anyway unless
        !          3072: .Dq %option stack
        !          3073: is being used).
        !          3074: .Sh PERFORMANCE CONSIDERATIONS
1.1       deraadt  3075: The main design goal of
1.16    ! jmc      3076: .Nm
        !          3077: is that it generate high-performance scanners.
        !          3078: It has been optimized for dealing well with large sets of rules.
        !          3079: Aside from the effects on scanner speed of the table compression
        !          3080: .Fl C
1.1       deraadt  3081: options outlined above,
1.16    ! jmc      3082: there are a number of options/actions which degrade performance.
        !          3083: These are, from most expensive to least:
        !          3084: .Bd -unfilled -offset indent
        !          3085: REJECT
        !          3086: %option yylineno
        !          3087: arbitrary trailing context
        !          3088:
        !          3089: pattern sets that require backing up
        !          3090: %array
        !          3091: %option interactive
        !          3092: %option always-interactive
        !          3093:
        !          3094: \&'^' beginning-of-line operator
        !          3095: yymore()
        !          3096: .Ed
        !          3097: .Pp
        !          3098: with the first three all being quite expensive
        !          3099: and the last two being quite cheap.
        !          3100: Note also that
        !          3101: .Fn unput
        !          3102: is implemented as a routine call that potentially does quite a bit of work,
        !          3103: while
        !          3104: .Fn yyless
        !          3105: is a quite-cheap macro; so if just putting back some excess text,
        !          3106: use
        !          3107: .Fn yyless .
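.Pp
As a small sketch (the patterns are chosen only for illustration),
an action can hand the tail of a match back to the scanner with
.Fn yyless :
.Bd -literal -offset indent
%%
foobar    ECHO; yyless(3);
[a-z]+    ECHO;
.Ed
.Pp
Here
.Fn yyless
keeps the first three characters of the match and returns the remaining
.Qq bar
to the input, where the second rule then matches it.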
        !          3108: .Pp
        !          3109: .Em REJECT
1.1       deraadt  3110: should be avoided at all costs when performance is important.
                   3111: It is a particularly expensive option.
1.16    ! jmc      3112: .Pp
1.1       deraadt  3113: Getting rid of backing up is messy and often may be an enormous
1.16    ! jmc      3114: amount of work for a complicated scanner.
        !          3115: In principal, one begins by using the
        !          3116: .Fl b
1.1       deraadt  3117: flag to generate a
1.16    ! jmc      3118: .Pa lex.backup
        !          3119: file.
        !          3120: For example, on the input
        !          3121: .Bd -literal -offset indent
        !          3122: %%
        !          3123: foo        return TOK_KEYWORD;
        !          3124: foobar     return TOK_KEYWORD;
        !          3125: .Ed
        !          3126: .Pp
1.1       deraadt  3127: the file looks like:
1.16    ! jmc      3128: .Bd -literal -offset indent
        !          3129: State #6 is non-accepting -
        !          3130:  associated rule line numbers:
        !          3131:        2       3
        !          3132:  out-transitions: [ o ]
        !          3133:  jam-transitions: EOF [ \e001-n  p-\e177 ]
        !          3134:
        !          3135: State #8 is non-accepting -
        !          3136:  associated rule line numbers:
        !          3137:        3
        !          3138:  out-transitions: [ a ]
        !          3139:  jam-transitions: EOF [ \e001-`  b-\e177 ]
        !          3140:
        !          3141: State #9 is non-accepting -
        !          3142:  associated rule line numbers:
        !          3143:        3
        !          3144:  out-transitions: [ r ]
        !          3145:  jam-transitions: EOF [ \e001-q  s-\e177 ]
        !          3146:
        !          3147: Compressed tables always back up.
        !          3148: .Ed
        !          3149: .Pp
1.1       deraadt  3150: The first few lines tell us that there's a scanner state in
1.16    ! jmc      3151: which it can make a transition on an
        !          3152: .Sq o
        !          3153: but not on any other character,
        !          3154: and that in that state the currently scanned text does not match any rule.
        !          3155: The state occurs when trying to match the rules found
1.1       deraadt  3156: at lines 2 and 3 in the input file.
1.16    ! jmc      3157: If the scanner is in that state and then reads something other than an
        !          3158: .Sq o ,
        !          3159: it will have to back up to find a rule which is matched.
        !          3160: With a bit of head-scratching one can see that this must be the
        !          3161: state it's in when it has seen
        !          3162: .Sq fo .
        !          3163: When this has happened, if anything other than another
        !          3164: .Sq o
        !          3165: is seen, the scanner will have to back up to simply match the
        !          3166: .Sq f
        !          3167: .Pq by the default rule .
        !          3168: .Pp
        !          3169: The comment regarding State #8 indicates there's a problem when
        !          3170: .Qq foob
        !          3171: has been scanned.
        !          3172: Indeed, on any character other than an
        !          3173: .Sq a ,
        !          3174: the scanner will have to back up to accept
        !          3175: .Qq foo .
        !          3176: Similarly, the comment for State #9 concerns when
        !          3177: .Qq fooba
        !          3178: has been scanned and an
        !          3179: .Sq r
        !          3180: does not follow.
        !          3181: .Pp
1.1       deraadt  3182: The final comment reminds us that there's no point going to
1.16    ! jmc      3183: all the trouble of removing backing up from the rules unless we're using
        !          3184: .Fl Cf
1.1       deraadt  3185: or
1.16    ! jmc      3186: .Fl CF ,
1.1       deraadt  3187: since there's no performance gain doing so with compressed scanners.
1.16    ! jmc      3188: .Pp
        !          3189: The way to remove the backing up is to add
        !          3190: .Qq error
        !          3191: rules:
        !          3192: .Bd -literal -offset indent
        !          3193: %%
        !          3194: foo    return TOK_KEYWORD;
        !          3195: foobar return TOK_KEYWORD;
        !          3196:
        !          3197: fooba  |
        !          3198: foob   |
        !          3199: fo {
        !          3200:         /* false alarm, not really a keyword */
        !          3201:         return TOK_ID;
        !          3202: }
        !          3203: .Ed
        !          3204: .Pp
        !          3205: Eliminating backing up among a list of keywords can also be done using a
        !          3206: .Qq catch-all
        !          3207: rule:
        !          3208: .Bd -literal -offset indent
        !          3209: %%
        !          3210: foo    return TOK_KEYWORD;
        !          3211: foobar return TOK_KEYWORD;
        !          3212:
        !          3213: [a-z]+ return TOK_ID;
        !          3214: .Ed
        !          3215: .Pp
1.1       deraadt  3216: This is usually the best solution when appropriate.
1.16    ! jmc      3217: .Pp
1.1       deraadt  3218: Backing-up messages tend to cascade.
1.16    ! jmc      3219: With a complicated set of rules it's not uncommon to get hundreds of messages.
        !          3220: If one can decipher them, though,
        !          3221: it often only takes a dozen or so rules to eliminate the backing up
        !          3222: (though it's easy to make a mistake and have an error rule accidentally match
        !          3223: a valid token; a possible future
        !          3224: .Nm
1.1       deraadt  3225: feature will be to automatically add rules to eliminate backing up).
1.16    ! jmc      3226: .Pp
        !          3227: It's important to keep in mind that the benefits of eliminating
        !          3228: backing up are gained only if
        !          3229: .Em every
        !          3230: instance of backing up is eliminated.
        !          3231: Leaving just one gains nothing.
        !          3232: .Pp
        !          3233: .Em Variable
        !          3234: trailing context
        !          3235: (where neither the leading nor the trailing part has a fixed length)
        !          3236: entails almost the same performance loss as
        !          3237: .Em REJECT
        !          3238: .Pq i.e., substantial .
        !          3239: So when possible a rule like:
        !          3240: .Bd -literal -offset indent
        !          3241: %%
        !          3242: mouse|rat/(cat|dog)   run();
        !          3243: .Ed
        !          3244: .Pp
1.1       deraadt  3245: is better written:
1.16    ! jmc      3246: .Bd -literal -offset indent
        !          3247: %%
        !          3248: mouse/cat|dog         run();
        !          3249: rat/cat|dog           run();
        !          3250: .Ed
        !          3251: .Pp
1.1       deraadt  3252: or as
1.16    ! jmc      3253: .Bd -literal -offset indent
        !          3254: %%
        !          3255: mouse|rat/cat         run();
        !          3256: mouse|rat/dog         run();
        !          3257: .Ed
        !          3258: .Pp
        !          3259: Note that here the special
        !          3260: .Sq |\&
        !          3261: action does not provide any savings, and can even make things worse (see
        !          3262: .Sx BUGS
        !          3263: below).
        !          3264: .Pp
1.1       deraadt  3265: Another area where the user can increase a scanner's performance
1.16    ! jmc      3266: .Pq and one that's easier to implement
        !          3267: arises from the fact that the longer the tokens matched,
        !          3268: the faster the scanner will run.
1.1       deraadt  3269: This is because with long tokens the processing of most input
1.16    ! jmc      3270: characters takes place in the
        !          3271: .Pq short
        !          3272: inner scanning loop, and does not often have to go through the additional work
        !          3273: of setting up the scanning environment (e.g.,
        !          3274: .Fa yytext )
        !          3275: for the action.
        !          3276: Recall the scanner for C comments:
        !          3277: .Bd -literal -offset indent
        !          3278: %x comment
        !          3279: %%
        !          3280: int line_num = 1;
        !          3281:
        !          3282: "/*"                    BEGIN(comment);
        !          3283:
        !          3284: <comment>[^*\en]*
        !          3285: <comment>"*"+[^*/\en]*
        !          3286: <comment>\en             ++line_num;
        !          3287: <comment>"*"+"/"        BEGIN(INITIAL);
        !          3288: .Ed
        !          3289: .Pp
1.1       deraadt  3290: This could be sped up by writing it as:
1.16    ! jmc      3291: .Bd -literal -offset indent
        !          3292: %x comment
        !          3293: %%
        !          3294: int line_num = 1;
        !          3295:
        !          3296: "/*"                    BEGIN(comment);
        !          3297:
        !          3298: <comment>[^*\en]*
        !          3299: <comment>[^*\en]*\en      ++line_num;
        !          3300: <comment>"*"+[^*/\en]*
        !          3301: <comment>"*"+[^*/\en]*\en ++line_num;
        !          3302: <comment>"*"+"/"        BEGIN(INITIAL);
        !          3303: .Ed
        !          3304: .Pp
        !          3305: Now instead of each newline requiring the processing of another action,
        !          3306: recognizing the newlines is
        !          3307: .Qq distributed
        !          3308: over the other rules to keep the matched text as long as possible.
        !          3309: Note that adding rules does
        !          3310: .Em not
        !          3311: slow down the scanner!
        !          3312: The speed of the scanner is independent of the number of rules or
        !          3313: (modulo the considerations given at the beginning of this section)
        !          3314: how complicated the rules are with regard to operators such as
        !          3315: .Sq *
        !          3316: and
        !          3317: .Sq |\& .
        !          3318: .Pp
        !          3319: A final example in speeding up a scanner:
        !          3320: scan through a file containing identifiers and keywords, one per line
        !          3321: and with no other extraneous characters, and recognize all the keywords.
        !          3322: A natural first approach is:
        !          3323: .Bd -literal -offset indent
        !          3324: %%
        !          3325: asm      |
        !          3326: auto     |
        !          3327: break    |
        !          3328: \&... etc ...
        !          3329: volatile |
        !          3330: while    /* it's a keyword */
        !          3331:
        !          3332: \&.|\en     /* it's not a keyword */
        !          3333: .Ed
        !          3334: .Pp
1.1       deraadt  3335: To eliminate the backing up, introduce a catch-all rule:
1.16    ! jmc      3336: .Bd -literal -offset indent
        !          3337: %%
        !          3338: asm      |
        !          3339: auto     |
        !          3340: break    |
        !          3341: \&... etc ...
        !          3342: volatile |
        !          3343: while    /* it's a keyword */
        !          3344:
        !          3345: [a-z]+   |
        !          3346: \&.|\en     /* it's not a keyword */
        !          3347: .Ed
        !          3348: .Pp
1.1       deraadt  3349: Now, if it's guaranteed that there's exactly one word per line,
                   3350: then we can reduce the total number of matches by a half by
1.16    ! jmc      3351: merging in the recognition of newlines with that of the other tokens:
        !          3352: .Bd -literal -offset indent
        !          3353: %%
        !          3354: asm\en      |
        !          3355: auto\en     |
        !          3356: break\en    |
        !          3357: \&... etc ...
        !          3358: volatile\en |
        !          3359: while\en    /* it's a keyword */
        !          3360:
        !          3361: [a-z]+\en   |
        !          3362: \&.|\en       /* it's not a keyword */
        !          3363: .Ed
        !          3364: .Pp
        !          3365: One has to be careful here,
        !          3366: as we have now reintroduced backing up into the scanner.
        !          3367: In particular, while we know that there will never be any characters
        !          3368: in the input stream other than letters or newlines,
        !          3369: .Nm
1.1       deraadt  3370: can't figure this out, and it will plan for possibly needing to back up
1.16    ! jmc      3371: when it has scanned a token like
        !          3372: .Qq auto
        !          3373: and then the next character is something other than a newline or a letter.
        !          3374: Previously it would then just match the
        !          3375: .Qq auto
        !          3376: rule and be done, but now it has no
        !          3377: .Qq auto
        !          3378: rule, only an
        !          3379: .Qq auto\en
        !          3380: rule.
        !          3381: To eliminate the possibility of backing up,
1.1       deraadt  3382: we could either duplicate all rules but without final newlines, or,
                   3383: since we never expect to encounter such an input and therefore don't
1.16    ! jmc      3384: care how it's classified, we can introduce one more catch-all rule,
        !          3385: this one without a newline:
        !          3386: .Bd -literal -offset indent
        !          3387: %%
        !          3388: asm\en      |
        !          3389: auto\en     |
        !          3390: break\en    |
        !          3391: \&... etc ...
        !          3392: volatile\en |
        !          3393: while\en    /* it's a keyword */
        !          3394:
        !          3395: [a-z]+\en   |
        !          3396: [a-z]+     |
        !          3397: \&.|\en       /* it's not a keyword */
        !          3398: .Ed
        !          3399: .Pp
1.1       deraadt  3400: Compiled with
1.16    ! jmc      3401: .Fl Cf ,
1.1       deraadt  3402: this is about as fast as one can get a
1.16    ! jmc      3403: .Nm
1.1       deraadt  3404: scanner to go for this particular problem.
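.Pp
The halving of the number of matches described above can be checked without
running a generated scanner.
The following C++ sketch is not scanner code produced by
.Nm ;
it hand-simulates longest-match scanning for the two rule sets,
and the function names are illustrative assumptions.
It relies on every keyword also matching [a-z]+,
so keywords and other words cost one match each.

```cpp
#include <cstddef>
#include <string>

// True for the characters matched by the class [a-z].
static bool is_lower(char c) { return c >= 'a' && c <= 'z'; }

// Matches needed when words and newlines are separate tokens,
// i.e. rules of the form: keyword, [a-z]+, and .|\n
std::size_t count_matches_separate(const std::string& in)
{
    std::size_t matches = 0;
    std::size_t i = 0;
    while (i < in.size()) {
        if (is_lower(in[i])) {
            while (i < in.size() && is_lower(in[i]))
                ++i;                    // longest run for [a-z]+
        } else {
            ++i;                        // single character for .|\n
        }
        ++matches;
    }
    return matches;
}

// Matches needed when a trailing newline is folded into the word,
// i.e. rules of the form: keyword\n, [a-z]+\n, [a-z]+, and .|\n
std::size_t count_matches_merged(const std::string& in)
{
    std::size_t matches = 0;
    std::size_t i = 0;
    while (i < in.size()) {
        if (is_lower(in[i])) {
            while (i < in.size() && is_lower(in[i]))
                ++i;
            if (i < in.size() && in[i] == '\n')
                ++i;                    // newline belongs to the same match
        } else {
            ++i;
        }
        ++matches;
    }
    return matches;
}
```

For the three-line input
.Qq auto\enfoo\enwhile\en
the first function counts six matches and the second counts three.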
1.16    ! jmc      3405: .Pp
1.1       deraadt  3406: A final note:
1.16    ! jmc      3407: .Nm
        !          3408: is slow when matching NUL's,
        !          3409: particularly when a token contains multiple NUL's.
        !          3410: It's best to write rules which match short
1.1       deraadt  3411: amounts of text if it's anticipated that the text will often include NUL's.
1.16    ! jmc      3412: .Pp
1.1       deraadt  3413: Another final note regarding performance: as mentioned above in the section
1.16    ! jmc      3414: .Sx HOW THE INPUT IS MATCHED ,
        !          3415: dynamically resizing
        !          3416: .Fa yytext
1.1       deraadt  3417: to accommodate huge tokens is a slow process because it presently requires that
1.16    ! jmc      3418: the
        !          3419: .Pq huge
        !          3420: token be rescanned from the beginning.
        !          3421: Thus if performance is vital, it is better to attempt to match
        !          3422: .Qq large
        !          3423: quantities of text but not
        !          3424: .Qq huge
        !          3425: quantities, where the cutoff between the two is at about 8K characters/token.
        !          3426: .Sh GENERATING C++ SCANNERS
        !          3427: .Nm
        !          3428: provides two different ways to generate scanners for use with C++.
        !          3429: The first way is to simply compile a scanner generated by
        !          3430: .Nm
        !          3431: using a C++ compiler instead of a C compiler.
        !          3432: This should not generate any compilation errors
        !          3433: (please report any found to the email address given in the
        !          3434: .Sx AUTHORS
        !          3435: section below).
        !          3436: C++ code can then be used in rule actions instead of C code.
        !          3437: Note that the default input source for scanners remains
        !          3438: .Fa yyin ,
1.1       deraadt  3439: and default echoing is still done to
1.16    ! jmc      3440: .Fa yyout .
1.1       deraadt  3441: Both of these remain
1.16    ! jmc      3442: .Fa FILE *
        !          3443: variables and not C++ streams.
        !          3444: .Pp
        !          3445: .Nm
        !          3446: can also be used to generate a C++ scanner class, using the
        !          3447: .Fl +
1.1       deraadt  3448: option (or, equivalently,
1.16    ! jmc      3449: .Dq %option c++ ) ,
        !          3450: which is automatically specified if the name of the flex executable ends in a
        !          3451: .Sq + ,
        !          3452: such as
        !          3453: .Nm flex++ .
        !          3454: When using this option,
        !          3455: .Nm
        !          3456: defaults to generating the scanner to the file
        !          3457: .Pa lex.yy.cc
1.1       deraadt  3458: instead of
1.16    ! jmc      3459: .Pa lex.yy.c .
1.1       deraadt  3460: The generated scanner includes the header file
1.16    ! jmc      3461: .Aq Pa g++/FlexLexer.h ,
1.1       deraadt  3462: which defines the interface to two C++ classes.
1.16    ! jmc      3463: .Pp
1.1       deraadt  3464: The first class,
1.16    ! jmc      3465: .Em FlexLexer ,
        !          3466: provides an abstract base class defining the general scanner class interface.
        !          3467: It provides the following member functions:
        !          3468: .Bl -tag -width Ds
        !          3469: .It const char* YYText()
        !          3470: Returns the text of the most recently matched token, the equivalent of
        !          3471: .Fa yytext .
        !          3472: .It int YYLeng()
        !          3473: Returns the length of the most recently matched token, the equivalent of
        !          3474: .Fa yyleng .
        !          3475: .It int lineno() const
        !          3476: Returns the current input line number
1.1       deraadt  3477: (see
1.16    ! jmc      3478: .Dq %option yylineno ) ,
        !          3479: or 1 if
        !          3480: .Dq %option yylineno
1.1       deraadt  3481: was not used.
1.16    ! jmc      3482: .It void set_debug(int flag)
        !          3483: Sets the debugging flag for the scanner, equivalent to assigning to
        !          3484: .Fa yy_flex_debug
        !          3485: (see the
        !          3486: .Sx OPTIONS
        !          3487: section above).
        !          3488: Note that the scanner must be built using
        !          3489: .Dq %option debug
1.1       deraadt  3490: to include debugging information in it.
1.16    ! jmc      3491: .It int debug() const
        !          3492: Returns the current setting of the debugging flag.
        !          3493: .El
        !          3494: .Pp
1.1       deraadt  3495: Also provided are member functions equivalent to
1.16    ! jmc      3496: .Fn yy_switch_to_buffer ,
        !          3497: .Fn yy_create_buffer
1.1       deraadt  3498: (though the first argument is an
1.16    ! jmc      3499: .Fa istream*
1.1       deraadt  3500: object pointer and not a
1.16    ! jmc      3501: .Fa FILE* ) ,
        !          3502: .Fn yy_flush_buffer ,
        !          3503: .Fn yy_delete_buffer ,
1.1       deraadt  3504: and
1.16    ! jmc      3505: .Fn yyrestart
1.10      deraadt  3506: (again, the first argument is an
1.16    ! jmc      3507: .Fa istream*
1.1       deraadt  3508: object pointer).
1.16    ! jmc      3509: .Pp
1.1       deraadt  3510: The second class defined in
1.16    ! jmc      3511: .Aq Pa g++/FlexLexer.h
1.1       deraadt  3512: is
1.16    ! jmc      3513: .Fa yyFlexLexer ,
1.1       deraadt  3514: which is derived from
1.16    ! jmc      3515: .Fa FlexLexer .
1.1       deraadt  3516: It defines the following additional member functions:
1.16    ! jmc      3517: .Bl -tag -width Ds
        !          3518: .It "yyFlexLexer(istream* arg_yyin = 0, ostream* arg_yyout = 0)"
        !          3519: Constructs a
        !          3520: .Fa yyFlexLexer
        !          3521: object using the given streams for input and output.
        !          3522: If not specified, the streams default to
        !          3523: .Fa cin
1.1       deraadt  3524: and
1.16    ! jmc      3525: .Fa cout ,
1.1       deraadt  3526: respectively.
1.16    ! jmc      3527: .It virtual int yylex()
        !          3528: Performs the same role as
        !          3529: .Fn yylex
1.1       deraadt  3530: does for ordinary flex scanners: it scans the input stream, consuming
1.16    ! jmc      3531: tokens, until a rule's action returns a value.
        !          3532: If subclass
        !          3533: .Sq S
        !          3534: is derived from
        !          3535: .Fa yyFlexLexer ,
        !          3536: in order to access the member functions and variables of
        !          3537: .Sq S
1.1       deraadt  3538: inside
1.16    ! jmc      3539: .Fn yylex ,
        !          3540: use
        !          3541: .Dq %option yyclass="S"
1.1       deraadt  3542: to inform
1.16    ! jmc      3543: .Nm
        !          3544: that the
        !          3545: .Sq S
        !          3546: subclass will be used instead of
        !          3547: .Fa yyFlexLexer .
1.1       deraadt  3548: In this case, rather than generating
1.16    ! jmc      3549: .Dq yyFlexLexer::yylex() ,
        !          3550: .Nm
1.1       deraadt  3551: generates
1.16    ! jmc      3552: .Dq S::yylex()
1.1       deraadt  3553: (and also generates a dummy
1.16    ! jmc      3554: .Dq yyFlexLexer::yylex()
1.1       deraadt  3555: that calls
1.16    ! jmc      3556: .Dq yyFlexLexer::LexerError()
1.1       deraadt  3557: if called).
1.16    ! jmc      3558: .It "virtual void switch_streams(istream* new_in = 0, ostream* new_out = 0)"
        !          3559: Reassigns
        !          3560: .Fa yyin
1.1       deraadt  3561: to
1.16    ! jmc      3562: .Fa new_in
        !          3563: .Pq if non-nil
1.1       deraadt  3564: and
1.16    ! jmc      3565: .Fa yyout
1.1       deraadt  3566: to
1.16    ! jmc      3567: .Fa new_out
        !          3568: .Pq ditto ,
        !          3569: deleting the previous input buffer if
        !          3570: .Fa yyin
1.1       deraadt  3571: is reassigned.
1.16    ! jmc      3572: .It int yylex(istream* new_in, ostream* new_out = 0)
        !          3573: First switches the input streams via
        !          3574: .Dq switch_streams(new_in, new_out)
1.1       deraadt  3575: and then returns the value of
1.16    ! jmc      3576: .Fn yylex .
        !          3577: .El
        !          3578: .Pp
1.1       deraadt  3579: In addition,
1.16    ! jmc      3580: .Fa yyFlexLexer
        !          3581: defines the following protected virtual functions which can be redefined
1.1       deraadt  3582: in derived classes to tailor the scanner:
1.16    ! jmc      3583: .Bl -tag -width Ds
        !          3584: .It virtual int LexerInput(char* buf, int max_size)
        !          3585: Reads up to
        !          3586: .Fa max_size
1.1       deraadt  3587: characters into
1.16    ! jmc      3588: .Fa buf
        !          3589: and returns the number of characters read.
        !          3590: To indicate end-of-input, return 0 characters.
        !          3591: Note that
        !          3592: .Qq interactive
        !          3593: scanners (see the
        !          3594: .Fl B
1.1       deraadt  3595: and
1.16    ! jmc      3596: .Fl I
1.1       deraadt  3597: flags) define the macro
1.16    ! jmc      3598: .Dv YY_INTERACTIVE .
        !          3599: If
        !          3600: .Fn LexerInput
        !          3601: has been redefined, and it's necessary to take different actions depending on
        !          3602: whether or not the scanner might be scanning an interactive input source,
        !          3603: it's possible to test for the presence of this name via
        !          3604: .Dq #ifdef .
        !          3605: .It virtual void LexerOutput(const char* buf, int size)
        !          3606: Writes out
        !          3607: .Fa size
1.1       deraadt  3608: characters from the buffer
1.16    ! jmc      3609: .Fa buf ,
        !          3610: which, while NUL-terminated, may also contain
        !          3611: .Qq internal
        !          3612: NUL's if the scanner's rules can match text with NUL's in them.
        !          3613: .It virtual void LexerError(const char* msg)
        !          3614: Reports a fatal error message.
        !          3615: The default version of this function writes the message to the stream
        !          3616: .Fa cerr
1.1       deraadt  3617: and exits.
1.16    ! jmc      3618: .El
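.Pp
The contract described for
.Fn LexerInput
(store at most
.Fa max_size
characters in
.Fa buf ,
return the number stored, and return 0 at end of input)
can be sketched outside of a scanner class.
The functions read_chunk() and drain_text() below are hypothetical names
used only for illustration and are not part of the header;
a real override would be a member of a class derived from
.Fa yyFlexLexer .

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical free function following the LexerInput() contract:
// copy up to max_size characters from `in` into buf and return the
// number copied.  A return value of 0 signals end of input.
int read_chunk(std::istream& in, char* buf, int max_size)
{
    if (max_size <= 0)
        return 0;
    in.read(buf, max_size);
    // gcount() reports how many characters the last read() stored,
    // and is 0 once the stream is exhausted.
    return static_cast<int>(in.gcount());
}

// Hypothetical driver: pull all of `text` through read_chunk() in
// pieces of at most `chunk` characters, the way a scanner refills
// its buffer, and return what was read.
std::string drain_text(const std::string& text, int chunk)
{
    std::istringstream in(text);
    char buf[64];
    if (chunk > 64)
        chunk = 64;
    std::string out;
    int n;
    while ((n = read_chunk(in, buf, chunk)) > 0)
        out.append(buf, static_cast<std::string::size_type>(n));
    return out;
}
```

Reading in small pieces does not change the text that is delivered;
only the final call, which returns 0, marks the end of input.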
        !          3619: .Pp
1.1       deraadt  3620: Note that a
1.16    ! jmc      3621: .Fa yyFlexLexer
        !          3622: object contains its entire scanning state.
        !          3623: Thus such objects can be used to create reentrant scanners.
        !          3624: Multiple instances of the same
        !          3625: .Fa yyFlexLexer
        !          3626: class can be instantiated, and multiple C++ scanner classes can be combined
1.1       deraadt  3627: in the same program using the
1.16    ! jmc      3628: .Fl P
1.1       deraadt  3629: option discussed above.
1.16    ! jmc      3630: .Pp
1.1       deraadt  3631: Finally, note that the
1.16    ! jmc      3632: .Dq %array
        !          3633: feature is not available to C++ scanner classes;
        !          3634: .Dq %pointer
        !          3635: must be used
        !          3636: .Pq the default .
        !          3637: .Pp
1.1       deraadt  3638: Here is an example of a simple C++ scanner:
1.16    ! jmc      3639: .Bd -literal -offset indent
        !          3640:     // An example of using the flex C++ scanner class.
1.1       deraadt  3641:
1.16    ! jmc      3642: %{
        !          3643: #include <errno.h>
        !          3644: int mylineno = 0;
        !          3645: %}
1.1       deraadt  3646:
1.16    ! jmc      3647: string  \e"[^\en"]+\e"
1.1       deraadt  3648:
1.16    ! jmc      3649: ws      [ \et]+
1.1       deraadt  3650:
1.16    ! jmc      3651: alpha   [A-Za-z]
        !          3652: dig     [0-9]
        !          3653: name    ({alpha}|{dig}|\e$)({alpha}|{dig}|[_.\e-/$])*
        !          3654: num1    [-+]?{dig}+\e.?([eE][-+]?{dig}+)?
        !          3655: num2    [-+]?{dig}*\e.{dig}+([eE][-+]?{dig}+)?
        !          3656: number  {num1}|{num2}
1.1       deraadt  3657:
1.16    ! jmc      3658: %%
1.1       deraadt  3659:
1.16    ! jmc      3660: {ws}    /* skip blanks and tabs */
1.1       deraadt  3661:
1.16    ! jmc      3662: "/*" {
        !          3663:         int c;
1.1       deraadt  3664:
1.16    ! jmc      3665:         while ((c = yyinput()) != 0) {
        !          3666:                 if(c == '\en')
1.1       deraadt  3667:                     ++mylineno;
1.16    ! jmc      3668:                 else if(c == '*') {
        !          3669:                     if ((c = yyinput()) == '/')
1.1       deraadt  3670:                         break;
                   3671:                     else
                   3672:                         unput(c);
                   3673:                 }
1.16    ! jmc      3674:         }
        !          3675: }
1.1       deraadt  3676:
1.16    ! jmc      3677: {number}  cout << "number " << YYText() << '\en';
1.1       deraadt  3678:
1.16    ! jmc      3679: \en        mylineno++;
1.1       deraadt  3680:
1.16    ! jmc      3681: {name}    cout << "name " << YYText() << '\en';
1.1       deraadt  3682:
1.16    ! jmc      3683: {string}  cout << "string " << YYText() << '\en';
        !          3684:
        !          3685: %%
        !          3686:
        !          3687: int main(int /* argc */, char** /* argv */)
        !          3688: {
        !          3689:        FlexLexer* lexer = new yyFlexLexer;
        !          3690:        while(lexer->yylex() != 0)
        !          3691:            ;
        !          3692:        return 0;
        !          3693: }
        !          3694: .Ed
        !          3695: .Pp
        !          3696: To create multiple
        !          3697: .Pq different
        !          3698: lexer classes, use the
        !          3699: .Fl P
        !          3700: flag
        !          3701: (or the
        !          3702: .Dq prefix=
        !          3703: option)
        !          3704: to rename each
        !          3705: .Fa yyFlexLexer
1.1       deraadt  3706: to some other
1.16    ! jmc      3707: .Fa xxFlexLexer .
        !          3708: .Aq Pa g++/FlexLexer.h
        !          3709: can then be included in other sources once per lexer class, first renaming
        !          3710: .Fa yyFlexLexer
1.1       deraadt  3711: as follows:
1.16    ! jmc      3712: .Bd -literal -offset indent
        !          3713: #undef yyFlexLexer
        !          3714: #define yyFlexLexer xxFlexLexer
        !          3715: #include <g++/FlexLexer.h>
        !          3716:
        !          3717: #undef yyFlexLexer
        !          3718: #define yyFlexLexer zzFlexLexer
        !          3719: #include <g++/FlexLexer.h>
        !          3720: .Ed
        !          3721: .Pp
        !          3722: This assumes that, for example,
        !          3723: .Dq %option prefix="xx"
        !          3724: is used for one scanner and
        !          3725: .Dq %option prefix="zz"
        !          3726: is used for the other.
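.Pp
The renaming works because the preprocessor substitutes whatever
.Fa yyFlexLexer
is currently defined to each time the class declaration is processed.
A minimal self-contained sketch of that mechanism follows.
The macro DECLARE_LEXER_CLASS and the member lexer_id() are hypothetical
stand-ins for the class declaration in the header; they are not provided by
.Nm .

```cpp
// Hypothetical stand-in for the class declaration in the header.
// Each expansion declares a class under whatever name yyFlexLexer
// is defined to at that point.
#define DECLARE_LEXER_CLASS(id)                 \
    class yyFlexLexer {                         \
    public:                                     \
        int lexer_id() const { return id; }     \
    };

#undef yyFlexLexer
#define yyFlexLexer xxFlexLexer
DECLARE_LEXER_CLASS(1)          // declares class xxFlexLexer

#undef yyFlexLexer
#define yyFlexLexer zzFlexLexer
DECLARE_LEXER_CLASS(2)          // declares class zzFlexLexer

#undef yyFlexLexer              // leave no mapping behind
```

After this, xxFlexLexer and zzFlexLexer are two distinct classes
declared from the same text.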
        !          3727: .Pp
        !          3728: .Sy IMPORTANT :
        !          3729: the present form of the scanning class is experimental
1.7       aaron    3730: and may change considerably between major releases.
1.16    ! jmc      3731: .Sh INCOMPATIBILITIES WITH LEX AND POSIX
        !          3732: .Nm
1.1       deraadt  3733: is a rewrite of the AT&T Unix
1.16    ! jmc      3734: .Nm lex
        !          3735: tool
        !          3736: (the two implementations do not share any code, though),
        !          3737: with some extensions and incompatibilities, both of which are of concern
        !          3738: to those who wish to write scanners acceptable to either implementation.
        !          3739: .Nm
        !          3740: is fully compliant with the
        !          3741: .Tn POSIX
        !          3742: .Nm lex
1.1       deraadt  3743: specification, except that when using
1.16    ! jmc      3744: .Dq %pointer
        !          3745: .Pq the default ,
        !          3746: a call to
        !          3747: .Fn unput
1.1       deraadt  3748: destroys the contents of
1.16    ! jmc      3749: .Fa yytext ,
        !          3750: which is counter to the
        !          3751: .Tn POSIX
        !          3752: specification.
        !          3753: .Pp
        !          3754: In this section we discuss all of the known areas of incompatibility between
        !          3755: .Nm ,
        !          3756: AT&T
        !          3757: .Nm lex ,
        !          3758: and the
        !          3759: .Tn POSIX
        !          3760: specification.
        !          3761: .Pp
        !          3762: .Nm flex Ns 's
        !          3763: .Fl l
1.1       deraadt  3764: option turns on maximum compatibility with the original AT&T
1.16    ! jmc      3765: .Nm lex
1.1       deraadt  3766: implementation, at the cost of a major loss in the generated scanner's
1.16    ! jmc      3767: performance.
        !          3768: We note below which incompatibilities can be overcome using the
        !          3769: .Fl l
1.1       deraadt  3770: option.
1.16    ! jmc      3771: .Pp
        !          3772: .Nm
1.1       deraadt  3773: is fully compatible with
1.16    ! jmc      3774: .Nm lex
1.1       deraadt  3775: with the following exceptions:
1.16    ! jmc      3776: .Bl -dash
        !          3777: .It
1.1       deraadt  3778: The undocumented
1.16    ! jmc      3779: .Nm lex
1.1       deraadt  3780: scanner internal variable
1.16    ! jmc      3781: .Fa yylineno
1.1       deraadt  3782: is not supported unless
1.16    ! jmc      3783: .Fl l
1.1       deraadt  3784: or
1.16    ! jmc      3785: .Dq %option yylineno
1.1       deraadt  3786: is used.
1.16    ! jmc      3787: .Pp
        !          3788: .Fa yylineno
1.1       deraadt  3789: should be maintained on a per-buffer basis, rather than a per-scanner
1.16    ! jmc      3790: .Pq single global variable
        !          3791: basis.
        !          3792: .Pp
        !          3793: .Fa yylineno
        !          3794: is not part of the
        !          3795: .Tn POSIX
        !          3796: specification.
        !          3797: .It
1.1       deraadt  3798: The
1.16    ! jmc      3799: .Fn input
1.1       deraadt  3800: routine is not redefinable, though it may be called to read characters
1.16    ! jmc      3801: following whatever has been matched by a rule.
        !          3802: If
        !          3803: .Fn input
        !          3804: encounters an end-of-file, the normal
        !          3805: .Fn yywrap
        !          3806: processing is done.
        !          3807: A
        !          3808: .Dq real
        !          3809: end-of-file is returned by
        !          3810: .Fn input
1.1       deraadt  3811: as
1.16    ! jmc      3812: .Dv EOF .
        !          3813: .Pp
1.1       deraadt  3814: Input is instead controlled by defining the
1.16    ! jmc      3815: .Dv YY_INPUT
1.1       deraadt  3816: macro.
1.16    ! jmc      3817: .Pp
1.1       deraadt  3818: The
1.16    ! jmc      3819: .Nm
1.1       deraadt  3820: restriction that
1.16    ! jmc      3821: .Fn input
        !          3822: cannot be redefined is in accordance with the
        !          3823: .Tn POSIX
        !          3824: specification, which simply does not specify any way of controlling the
1.1       deraadt  3825: scanner's input other than by making an initial assignment to
1.16    ! jmc      3826: .Fa yyin .
        !          3827: .It
1.1       deraadt  3828: The
1.16    ! jmc      3829: .Fn unput
        !          3830: routine is not redefinable.
        !          3831: This restriction is in accordance with
        !          3832: .Tn POSIX .
        !          3833: .It
        !          3834: .Nm
1.1       deraadt  3835: scanners are not as reentrant as
1.16    ! jmc      3836: .Nm lex
        !          3837: scanners.
        !          3838: In particular, if a scanner is interactive and
        !          3839: an interrupt handler long-jumps out of the scanner,
        !          3840: and the scanner is subsequently called again,
        !          3841: the following error message may be displayed:
        !          3842: .Pp
        !          3843: .D1 fatal flex scanner internal error--end of buffer missed
        !          3844: .Pp
1.1       deraadt  3845: To reenter the scanner, first use
1.16    ! jmc      3846: .Pp
        !          3847: .Dl yyrestart(yyin);
        !          3848: .Pp
        !          3849: Note that this call will throw away any buffered input;
        !          3850: usually this isn't a problem with an interactive scanner.
        !          3851: .Pp
        !          3852: Also note that flex C++ scanner classes are reentrant,
        !          3853: so if using C++ is an option, they should be used instead.
        !          3854: See
        !          3855: .Sx GENERATING C++ SCANNERS
        !          3856: above for details.
        !          3857: .It
        !          3858: .Fn output
1.1       deraadt  3859: is not supported.
                   3860: Output from the
1.16    ! jmc      3861: .Em ECHO
1.1       deraadt  3862: macro is done to the file-pointer
1.16    ! jmc      3863: .Fa yyout
        !          3864: .Pq default stdout .
        !          3865: .Pp
        !          3866: .Fn output
        !          3867: is not part of the
        !          3868: .Tn POSIX
        !          3869: specification.
        !          3870: .It
        !          3871: .Nm lex
        !          3872: does not support exclusive start conditions
        !          3873: .Pq %x ,
        !          3874: though they are in the
        !          3875: .Tn POSIX
        !          3876: specification.
        !          3877: .It
1.1       deraadt  3878: When definitions are expanded,
1.16    ! jmc      3879: .Nm
1.1       deraadt  3880: encloses them in parentheses.
1.16    ! jmc      3881: With
        !          3882: .Nm lex ,
        !          3883: the following:
        !          3884: .Bd -literal -offset indent
        !          3885: NAME    [A-Z][A-Z0-9]*
        !          3886: %%
        !          3887: foo{NAME}?      printf("Found it\en");
        !          3888: %%
        !          3889: .Ed
        !          3890: .Pp
        !          3891: will not match the string
        !          3892: .Qq foo
        !          3893: because when the macro is expanded the rule is equivalent to
        !          3894: .Qq foo[A-Z][A-Z0-9]*?
        !          3895: and the precedence is such that the
        !          3896: .Sq ?\&
        !          3897: is associated with
        !          3898: .Qq [A-Z0-9]* .
        !          3899: With
        !          3900: .Nm ,
1.1       deraadt  3901: the rule will be expanded to
1.16    ! jmc      3902: .Qq foo([A-Z][A-Z0-9]*)?
        !          3903: and so the string
        !          3904: .Qq foo
        !          3905: will match.
        !          3906: .Pp
1.1       deraadt  3907: Note that if the definition begins with
1.16    ! jmc      3908: .Sq ^
1.1       deraadt  3909: or ends with
1.16    ! jmc      3910: .Sq $
        !          3911: then it is not expanded with parentheses, to allow these operators to appear in
        !          3912: definitions without losing their special meanings.
        !          3913: But the
        !          3914: .Sq Aq s ,
        !          3915: .Sq / ,
1.1       deraadt  3916: and
1.16    ! jmc      3917: .Aq Aq EOF
1.1       deraadt  3918: operators cannot be used in a
1.16    ! jmc      3919: .Nm
1.1       deraadt  3920: definition.
1.16    ! jmc      3921: .Pp
1.1       deraadt  3922: Using
1.16    ! jmc      3923: .Fl l
1.1       deraadt  3924: results in the
1.16    ! jmc      3925: .Nm lex
1.1       deraadt  3926: behavior of no parentheses around the definition.
1.16    ! jmc      3927: .Pp
        !          3928: The
        !          3929: .Tn POSIX
        !          3930: specification is that the definition be enclosed in parentheses.
        !          3931: .It
1.1       deraadt  3932: Some implementations of
1.16    ! jmc      3933: .Nm lex
        !          3934: allow a rule's action to begin on a separate line,
        !          3935: if the rule's pattern has trailing whitespace:
        !          3936: .Bd -literal -offset indent
        !          3937: %%
        !          3938: foo|bar<space here>
        !          3939:   { foobar_action(); }
        !          3940: .Ed
        !          3941: .Pp
        !          3942: .Nm
1.1       deraadt  3943: does not support this feature.
1.16    ! jmc      3944: .It
1.1       deraadt  3945: The
1.16    ! jmc      3946: .Nm lex
        !          3947: .Sq %r
        !          3948: .Pq generate a Ratfor scanner
        !          3949: option is not supported.
        !          3950: It is not part of the
        !          3951: .Tn POSIX
        !          3952: specification.
        !          3953: .It
1.1       deraadt  3954: After a call to
1.16    ! jmc      3955: .Fn unput ,
        !          3956: .Fa yytext
        !          3957: is undefined until the next token is matched,
        !          3958: unless the scanner was built using
        !          3959: .Dq %array .
1.1       deraadt  3960: This is not the case with
1.16    ! jmc      3961: .Nm lex
        !          3962: or the
        !          3963: .Tn POSIX
        !          3964: specification.
        !          3965: The
        !          3966: .Fl l
1.1       deraadt  3967: option does away with this incompatibility.
1.16    ! jmc      3968: .It
1.1       deraadt  3969: The precedence of the
1.16    ! jmc      3970: .Sq {}
        !          3971: .Pq numeric range
        !          3972: operator is different.
        !          3973: .Nm lex
        !          3974: interprets
        !          3975: .Qq abc{1,3}
        !          3976: as match one, two, or three occurrences of
        !          3977: .Sq abc ,
        !          3978: whereas
        !          3979: .Nm
        !          3980: interprets it as match
        !          3981: .Sq ab
        !          3982: followed by one, two, or three occurrences of
        !          3983: .Sq c .
        !          3984: The latter is in agreement with the
        !          3985: .Tn POSIX
        !          3986: specification.
        !          3987: .It
1.1       deraadt  3988: The precedence of the
1.16    ! jmc      3989: .Sq ^
1.1       deraadt  3990: operator is different.
1.16    ! jmc      3991: .Nm lex
        !          3992: interprets
        !          3993: .Qq ^foo|bar
        !          3994: as match either
        !          3995: .Sq foo
        !          3996: at the beginning of a line, or
        !          3997: .Sq bar
        !          3998: anywhere, whereas
        !          3999: .Nm
        !          4000: interprets it as match either
        !          4001: .Sq foo
        !          4002: or
        !          4003: .Sq bar
        !          4004: if they come at the beginning of a line.
        !          4005: The latter is in agreement with the
        !          4006: .Tn POSIX
        !          4007: specification.
        !          4008: .It
1.1       deraadt  4009: The special table-size declarations such as
1.16    ! jmc      4010: .Sq %a
1.1       deraadt  4011: supported by
1.16    ! jmc      4012: .Nm lex
1.1       deraadt  4013: are not required by
1.16    ! jmc      4014: .Nm
1.1       deraadt  4015: scanners;
1.16    ! jmc      4016: .Nm
1.1       deraadt  4017: ignores them.
1.16    ! jmc      4018: .It
1.1       deraadt  4019: The name
1.16    ! jmc      4020: .Dv FLEX_SCANNER
1.1       deraadt  4021: is #define'd so scanners may be written for use with either
1.16    ! jmc      4022: .Nm
1.1       deraadt  4023: or
1.16    ! jmc      4024: .Nm lex .
1.1       deraadt  4025: Scanners also include
1.16    ! jmc      4026: .Dv YY_FLEX_MAJOR_VERSION
1.1       deraadt  4027: and
1.16    ! jmc      4028: .Dv YY_FLEX_MINOR_VERSION
1.1       deraadt  4029: indicating which version of
1.16    ! jmc      4030: .Nm
1.1       deraadt  4031: generated the scanner
1.16    ! jmc      4032: (for example, for the 2.5 release, these defines would be 2 and 5,
1.1       deraadt  4033: respectively).
1.16    ! jmc      4034: .El
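.Pp
As a minimal sketch of the
.Dv FLEX_SCANNER
item above, the following specification avoids
.Nm
extensions and tests the macros with the C preprocessor,
so it should build with either program.
The rules and the message text are illustrative only,
and the scanner library is assumed to supply
.Fn yywrap :
.Bd -literal -offset indent
%{
#include <stdio.h>
%}
%%
[a-z]+	printf("word: %s\en", yytext);
[^a-z]+	;
%%
int
main(void)
{
#ifdef FLEX_SCANNER
	/* Only flex defines these version macros. */
	fprintf(stderr, "generated by flex %d.%d\en",
	    YY_FLEX_MAJOR_VERSION, YY_FLEX_MINOR_VERSION);
#endif
	return (yylex());
}
.Ed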
        !          4035: .Pp
1.1       deraadt  4036: The following
1.16    ! jmc      4037: .Nm
1.1       deraadt  4038: features are not included in
1.16    ! jmc      4039: .Nm lex
        !          4040: or the
        !          4041: .Tn POSIX
        !          4042: specification:
        !          4043: .Bd -unfilled -offset indent
        !          4044: C++ scanners
        !          4045: %option
        !          4046: start condition scopes
        !          4047: start condition stacks
        !          4048: interactive/non-interactive scanners
        !          4049: yy_scan_string() and friends
        !          4050: yyterminate()
        !          4051: yy_set_interactive()
        !          4052: yy_set_bol()
        !          4053: YY_AT_BOL()
        !          4054: <<EOF>>
        !          4055: <*>
        !          4056: YY_DECL
        !          4057: YY_START
        !          4058: YY_USER_ACTION
        !          4059: YY_USER_INIT
        !          4060: #line directives
        !          4061: %{}'s around actions
        !          4062: multiple actions on a line
        !          4063: .Ed
        !          4064: .Pp
        !          4065: plus almost all of the
        !          4066: .Nm
        !          4067: flags.
1.1       deraadt  4068: The last feature in the list refers to the fact that with
1.16    ! jmc      4069: .Nm
        !          4070: multiple actions can be placed on the same line,
        !          4071: separated with semi-colons, while with
        !          4072: .Nm lex ,
1.1       deraadt  4073: the following
1.16    ! jmc      4074: .Pp
        !          4075: .Dl foo    handle_foo(); ++num_foos_seen;
        !          4076: .Pp
        !          4077: is
        !          4078: .Pq rather surprisingly
        !          4079: truncated to
        !          4080: .Pp
        !          4081: .Dl foo    handle_foo();
        !          4082: .Pp
        !          4083: .Nm
        !          4084: does not truncate the action.
        !          4085: Actions that are not enclosed in braces
        !          4086: are simply terminated at the end of the line.
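.Pp
As a hedged sketch of a more portable form,
enclosing the statements in braces makes the extent of the action explicit,
so the result should not depend on how either program treats the rest of
the line:
.Pp
.Dl foo    { handle_foo(); ++num_foos_seen; }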
        !          4087: .Sh FILES
        !          4088: .Bl -tag -width "<g++/FlexLexer.h>"
        !          4089: .It flex.skl
        !          4090: Skeleton scanner.
        !          4091: This file is only used when building flex, not when
        !          4092: .Nm
        !          4093: executes.
        !          4094: .It lex.backup
        !          4095: Backing-up information for the
        !          4096: .Fl b
        !          4097: flag (called
        !          4098: .Pa lex.bck
        !          4099: on some systems).
        !          4100: .It lex.yy.c
        !          4101: Generated scanner
        !          4102: (called
        !          4103: .Pa lexyy.c
        !          4104: on some systems).
        !          4105: .It lex.yy.cc
        !          4106: Generated C++ scanner class, when using
        !          4107: .Fl + .
        !          4108: .It Aq g++/FlexLexer.h
        !          4109: Header file defining the C++ scanner base class,
        !          4110: .Fa FlexLexer ,
        !          4111: and its derived class,
        !          4112: .Fa yyFlexLexer .
        !          4113: .It /usr/lib/libl.*
        !          4114: .Nm
        !          4115: libraries.
        !          4116: The
        !          4117: .Pa /usr/lib/libfl.*\&
        !          4118: libraries are links to these.
        !          4119: Scanners must be linked using either
        !          4120: .Fl \&ll
        !          4121: or
        !          4122: .Fl lfl .
        !          4123: .El
        !          4124: .Sh DIAGNOSTICS
        !          4125: .Bl -diag
        !          4126: .It warning, rule cannot be matched
        !          4127: Indicates that the given rule cannot be matched because it follows other rules
        !          4128: that will always match the same text as it.
        !          4129: For example, in the following
        !          4130: .Dq foo
        !          4131: cannot be matched because it comes after an identifier
        !          4132: .Qq catch-all
        !          4133: rule:
        !          4134: .Bd -literal -offset indent
        !          4135: [a-z]+    got_identifier();
        !          4136: foo       got_foo();
        !          4137: .Ed
        !          4138: .Pp
1.1       deraadt  4139: Using
1.16    ! jmc      4140: .Em REJECT
1.1       deraadt  4141: in a scanner suppresses this warning.
1.16    ! jmc      4142: .It "warning, \-s option given but default rule can be matched"
        !          4143: Means that it is possible
        !          4144: .Pq perhaps only in a particular start condition
        !          4145: that the default rule
        !          4146: .Pq match any single character
        !          4147: is the only one that will match a particular input.
        !          4148: Since
        !          4149: .Fl s
1.1       deraadt  4150: was given, presumably this is not intended.
1.16    ! jmc      4151: .It reject_used_but_not_detected undefined
        !          4152: .It yymore_used_but_not_detected undefined
        !          4153: These errors can occur at compile time.
        !          4154: They indicate that the scanner uses
        !          4155: .Em REJECT
1.1       deraadt  4156: or
1.16    ! jmc      4157: .Fn yymore
1.1       deraadt  4158: but that
1.16    ! jmc      4159: .Nm
1.1       deraadt  4160: failed to notice the fact, meaning that
1.16    ! jmc      4161: .Nm
1.1       deraadt  4162: scanned the first two sections looking for occurrences of these actions
1.16    ! jmc      4163: and failed to find any, but somehow they snuck in
        !          4164: .Pq via an #include file, for example .
        !          4165: Use
        !          4166: .Dq %option reject
        !          4167: or
        !          4168: .Dq %option yymore
        !          4169: to indicate to
        !          4170: .Nm
        !          4171: that these features are really needed.
        !          4172: .It flex scanner jammed
        !          4173: A scanner compiled with
        !          4174: .Fl s
        !          4175: has encountered an input string which wasn't matched by any of its rules.
        !          4176: This error can also occur due to internal problems.
        !          4177: .It token too large, exceeds YYLMAX
        !          4178: The scanner uses
        !          4179: .Dq %array
1.1       deraadt  4180: and one of its rules matched a string longer than the
1.16    ! jmc      4181: .Dv YYLMAX
        !          4182: constant
        !          4183: .Pq 8K bytes by default .
        !          4184: The value can be increased by #define'ing
        !          4185: .Dv YYLMAX
        !          4186: in the definitions section of
        !          4187: .Nm
1.1       deraadt  4188: input.
1.16    ! jmc      4189: .It "scanner requires \-8 flag to use the character 'x'"
        !          4190: The scanner specification includes recognizing the 8-bit character
        !          4191: .Sq x
        !          4192: and the
        !          4193: .Fl 8
        !          4194: flag was not specified, and defaulted to 7-bit because the
        !          4195: .Fl Cf
        !          4196: or
        !          4197: .Fl CF
        !          4198: table compression options were used.
        !          4199: See the discussion of the
        !          4200: .Fl 7
1.1       deraadt  4201: flag for details.
1.16    ! jmc      4202: .It flex scanner push-back overflow
        !          4203: unput() was used to push back so much text that the scanner's buffer
        !          4204: could not hold both the pushed-back text and the current token in
        !          4205: .Fa yytext .
        !          4206: Ideally the scanner should dynamically resize the buffer in this case,
        !          4207: but at present it does not.
        !          4208: .It "input buffer overflow, can't enlarge buffer because scanner uses REJECT"
        !          4209: The scanner was working on matching an extremely large token and needed
        !          4210: to expand the input buffer.
        !          4211: This doesn't work with scanners that use
        !          4212: .Em REJECT .
        !          4213: .It "fatal flex scanner internal error--end of buffer missed"
1.1       deraadt  4214: This can occur in a scanner which is reentered after a long-jump
1.16    ! jmc      4215: has jumped out
        !          4216: .Pq or over
        !          4217: the scanner's activation frame.
        !          4218: Before reentering the scanner, use:
        !          4219: .Pp
        !          4220: .Dl yyrestart(yyin);
        !          4221: .Pp
1.1       deraadt  4222: or, as noted above, switch to using the C++ scanner class.
1.16    ! jmc      4223: .It "too many start conditions in <> construct!"
        !          4224: More start conditions than exist were listed in a <> construct
        !          4225: (so at least one of them must have been listed twice).
        !          4226: .El
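.Pp
The recovery suggested for the
.Qq end of buffer missed
error might look like the following sketch.
The
.Va recover
buffer and the rule that triggers the long-jump are hypothetical,
chosen only to leave
.Fn yylex
abruptly; linking with
.Fl lfl
is assumed to supply
.Fn yywrap .
Input that was already buffered but not yet scanned may be lost when
the scanner is restarted.
.Bd -literal -offset indent
%{
#include <setjmp.h>
#include <stdio.h>
static jmp_buf recover;		/* hypothetical recovery point */
%}
%%
"!"	longjmp(recover, 1);	/* leave yylex() abruptly */
\&.|\en	ECHO;
%%
int
main(void)
{
	/* A non-zero return means we arrived here via longjmp(). */
	if (setjmp(recover) != 0)
		yyrestart(yyin);	/* reset the scanner before reentry */
	while (yylex() != 0)
		;
	return (0);
}
.Ed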
        !          4227: .Sh SEE ALSO
        !          4228: .Xr awk 1 ,
        !          4229: .Xr lex 1 ,
        !          4230: .Xr sed 1 ,
        !          4231: .Xr yacc 1
        !          4232: .Pp
        !          4233: .Rs
        !          4234: .%A John Levine
        !          4235: .%A Tony Mason
        !          4236: .%A Doug Brown
        !          4237: .%B Lex & Yacc
        !          4238: .%I O'Reilly and Associates
        !          4239: .%N 2nd edition
        !          4240: .Re
        !          4241: .Rs
        !          4242: .%A M. E. Lesk
        !          4243: .%A E. Schmidt
        !          4244: .%B LEX \- Lexical Analyzer Generator
        !          4245: .Re
        !          4246: .Rs
        !          4247: .%A Alfred Aho
        !          4248: .%A Ravi Sethi
        !          4249: .%A Jeffrey Ullman
        !          4250: .%B Compilers: Principles, Techniques and Tools
        !          4251: .%I Addison-Wesley
        !          4252: .%D 1986
        !          4253: .%O "Describes the pattern-matching techniques used by flex (deterministic finite automata)"
        !          4254: .Re
        !          4255: .Sh AUTHORS
1.1       deraadt  4256: Vern Paxson, with the help of many ideas and much inspiration from
1.16    ! jmc      4257: Van Jacobson.
        !          4258: Original version by Jef Poskanzer.
        !          4259: The fast table representation is a partial implementation of a design done by
        !          4260: Van Jacobson.
        !          4261: The implementation was done by Kevin Gong and Vern Paxson.
        !          4262: .Pp
1.1       deraadt  4263: Thanks to the many
1.16    ! jmc      4264: .Nm
1.1       deraadt  4265: beta-testers, feedbackers, and contributors, especially Francois Pinard,
                   4266: Casey Leedom,
                   4267: Robert Abramovitz,
                   4268: Stan Adermann, Terry Allen, David Barker-Plummer, John Basrai,
                   4269: Neal Becker, Nelson H.F. Beebe, benson@odi.com,
                   4270: Karl Berry, Peter A. Bigot, Simon Blanchard,
                   4271: Keith Bostic, Frederic Brehm, Ian Brockbank, Kin Cho, Nick Christopher,
                   4272: Brian Clapper, J.T. Conklin,
                   4273: Jason Coughlin, Bill Cox, Nick Cropper, Dave Curtis, Scott David
1.11      deraadt  4274: Daniels, Chris G. Demetriou, Theo de Raadt,
1.1       deraadt  4275: Mike Donahue, Chuck Doucette, Tom Epperly, Leo Eskin,
                   4276: Chris Faylor, Chris Flatters, Jon Forrest, Jeffrey Friedl,
                   4277: Joe Gayda, Kaveh R. Ghazi, Wolfgang Glunz,
                   4278: Eric Goldman, Christopher M. Gould, Ulrich Grepel, Peer Griebel,
                   4279: Jan Hajic, Charles Hemphill, NORO Hideo,
                   4280: Jarkko Hietaniemi, Scott Hofmann,
                   4281: Jeff Honig, Dana Hudes, Eric Hughes, John Interrante,
                   4282: Ceriel Jacobs, Michal Jaegermann, Sakari Jalovaara, Jeffrey R. Jones,
                   4283: Henry Juengst, Klaus Kaempf, Jonathan I. Kamens, Terrence O Kane,
                   4284: Amir Katz, ken@ken.hilco.com, Kevin B. Kenny,
                   4285: Steve Kirsch, Winfried Koenig, Marq Kole, Ronald Lamprecht,
                   4286: Greg Lee, Rohan Lenard, Craig Leres, John Levine, Steve Liddle,
                   4287: David Loffredo, Mike Long,
                   4288: Mohamed el Lozy, Brian Madsen, Malte, Joe Marshall,
                   4289: Bengt Martensson, Chris Metcalf,
                   4290: Luke Mewburn, Jim Meyering, R. Alexander Milowski, Erik Naggum,
                   4291: G.T. Nicol, Landon Noll, James Nordby, Marc Nozell,
                   4292: Richard Ohnemus, Karsten Pahnke,
1.16    ! jmc      4293: Sven Panne, Roland Pesch, Walter Pelissero, Gaumond Pierre,
        !          4294: Esmond Pitt, Jef Poskanzer, Joe Rahmeh, Jarmo Raiha,
1.1       deraadt  4295: Frederic Raimbault, Pat Rankin, Rick Richardson,
                   4296: Kevin Rodgers, Kai Uwe Rommel, Jim Roskind, Alberto Santini,
                   4297: Andreas Scherer, Darrell Schiebel, Raf Schietekat,
                   4298: Doug Schmidt, Philippe Schnoebelen, Andreas Schwab,
                   4299: Larry Schwimmer, Alex Siegel, Eckehard Stolz, Jan-Erik Str\(:omquist,
                   4300: Mike Stump, Paul Stuart, Dave Tallman, Ian Lance Taylor,
                   4301: Chris Thewalt, Richard M. Timoney, Jodi Tsai,
1.16    ! jmc      4302: Paul Tuinenga, Gary Weik, Frank Whaley, Gerhard Wilhelms, Kent Williams,
        !          4303: Ken Yap, Ron Zellar, Nathan Zelle, David Zuhn,
        !          4304: and those whose names have slipped my marginal mail-archiving skills
        !          4305: but whose contributions are appreciated all the
1.1       deraadt  4306: same.
1.16    ! jmc      4307: .Pp
1.1       deraadt  4308: Thanks to Keith Bostic, Jon Forrest, Noah Friedman,
                   4309: John Gilmore, Craig Leres, John Levine, Bob Mulcahy, G.T.
                   4310: Nicol, Francois Pinard, Rich Salz, and Richard Stallman for help with various
                   4311: distribution headaches.
1.16    ! jmc      4312: .Pp
        !          4313: Thanks to Esmond Pitt and Earle Horton for 8-bit character support;
        !          4314: to Benson Margulies and Fred Burke for C++ support;
        !          4315: to Kent Williams and Tom Epperly for C++ class support;
        !          4316: to Ove Ewerlid for support of NUL's;
        !          4317: and to Eric Hughes for support of multiple buffers.
        !          4318: .Pp
1.1       deraadt  4319: This work was primarily done when I was with the Real Time Systems Group
1.16    ! jmc      4320: at the Lawrence Berkeley Laboratory in Berkeley, CA.
        !          4321: Many thanks to all there for the support I received.
        !          4322: .Pp
        !          4323: Send comments to
        !          4324: .Aq vern@ee.lbl.gov .
        !          4325: .Sh BUGS
        !          4326: Some trailing context patterns cannot be properly matched and generate
        !          4327: warning messages
        !          4328: .Pq "dangerous trailing context" .
        !          4329: These are patterns where the ending of the first part of the rule
        !          4330: matches the beginning of the second part, such as
        !          4331: .Qq zx*/xy* ,
        !          4332: where the
        !          4333: .Sq x*
        !          4334: matches the
        !          4335: .Sq x
        !          4336: at the beginning of the trailing context.
        !          4337: (Note that the POSIX draft states that the text matched by such patterns
        !          4338: is undefined.)
        !          4339: .Pp
        !          4340: For some trailing context rules, parts which are actually fixed-length are
        !          4341: not recognized as such, leading to the above-mentioned performance loss.
        !          4342: In particular, parts using
        !          4343: .Sq |\&
        !          4344: or
        !          4345: .Sq {n}
        !          4346: (such as
        !          4347: .Qq foo{3} )
        !          4348: are always considered variable-length.
        !          4349: .Pp
        !          4350: Combining trailing context with the special
        !          4351: .Sq |\&
        !          4352: action can result in fixed trailing context being turned into
        !          4353: the more expensive variable trailing context.
        !          4354: For example:
        !          4355: .Bd -literal -offset indent
        !          4356: %%
        !          4357: abc      |
        !          4358: xyz/def
        !          4359: .Ed
        !          4360: .Pp
        !          4361: Use of
        !          4362: .Fn unput
        !          4363: invalidates yytext and yyleng, unless the
        !          4364: .Dq %array
        !          4365: directive
        !          4366: or the
        !          4367: .Fl l
        !          4368: option has been used.
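.Pp
A sketch of one way to cope, assuming the default pointer mode:
save the token and its length before the first call to
.Fn unput .
The rule below is illustrative only; it pushes an upper-case word back
in lower case so that it is rescanned by the second rule.
The name
.Va copy
is hypothetical.
.Bd -literal -offset indent
%{
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
%}
%%
[A-Z]+	{
		int i, len = yyleng;		/* saved before unput() */
		char *copy = strdup(yytext);	/* saved before unput() */

		if (copy == NULL)
			exit(1);
		/* Push back right to left so the text rescans in order. */
		for (i = len - 1; i >= 0; --i)
			unput(tolower((unsigned char)copy[i]));
		free(copy);
	}
[a-z]+	printf("word: %s\en", yytext);
.Ed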
        !          4369: .Pp
        !          4370: Pattern-matching of NUL's is substantially slower than matching other
        !          4371: characters.
        !          4372: .Pp
        !          4373: Dynamic resizing of the input buffer is slow, as it entails rescanning
        !          4374: all the text matched so far by the current
        !          4375: .Pq generally huge
        !          4376: token.
        !          4377: .Pp
        !          4378: Due to both buffering of input and read-ahead,
        !          4379: it is not possible to intermix calls to
        !          4380: .Aq Pa stdio.h
        !          4381: routines, such as, for example,
        !          4382: .Fn getchar ,
        !          4383: with
        !          4384: .Nm
        !          4385: rules and expect it to work.
        !          4386: Call
        !          4387: .Fn input
        !          4388: instead.
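.Pp
A minimal sketch of reading ahead with
.Fn input
inside an action, here to discard a C comment.
It assumes a C scanner, and the
.Dv AT_END
macro is a hypothetical helper that treats either
.Dv EOF
or 0 as end of input, on the assumption that
.Nm
versions differ in which value
.Fn input
returns there.
.Bd -literal -offset indent
%{
/* Hypothetical helper: end of input as reported by input(). */
#define AT_END(c)	((c) == EOF || (c) == 0)
%}
%%
"/*"	{
		int c;

		for (;;) {
			/* Read ahead with input(), never getchar(). */
			while ((c = input()) != '*' && !AT_END(c))
				;
			if (c == '*') {
				while ((c = input()) == '*')
					;
				if (c == '/')
					break;	/* end of comment */
			}
			if (AT_END(c))
				break;	/* unterminated comment */
		}
	}
.Ed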
        !          4389: .Pp
        !          4390: The total number of table entries listed by the
        !          4391: .Fl v
        !          4392: flag excludes the number of table entries needed to determine
        !          4393: what rule has been matched.
        !          4394: The number of entries is equal to the number of DFA states
        !          4395: if the scanner does not use
        !          4396: .Em REJECT ,
        !          4397: and somewhat greater than the number of states if it does.
        !          4398: .Pp
        !          4399: .Em REJECT
        !          4400: cannot be used with the
        !          4401: .Fl f
        !          4402: or
        !          4403: .Fl F
        !          4404: options.
        !          4405: .Pp
        !          4406: The
        !          4407: .Nm
        !          4408: internal algorithms need documentation.