## [CS143-PA1] Lexer

### Target

Write a Cool lexer using flex.

### 1. Basic Keywords

4 Scanner Results

• Your implementation needs to define Flex rules that match the regular expressions defining each token defined in cool-parse.h and perform the appropriate action for each matched token.

• For example, if you match on a token BOOL_CONST, your lexer has to record whether its value is true or false;

• similarly if you match on a TYPEID token, you need to record the name of the type.

• Note that not every token requires storing additional information; for example, only returning the token type is sufficient for some tokens like keywords.

Find all the token definitions in cool-parse.h; for keywords, return the corresponding token directly.

Flex syntax for case-insensitive patterns:

• (?r-s:pattern)
apply option r and omit option s while interpreting pattern. Options may be zero or more of the characters i, s, or x.
i means case-insensitive. -i means case-sensitive.
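In flex itself a keyword rule would use this option directly, for example `(?i:class)`. The matching behaviour that option gives can be sketched in plain C++ as below. The helper names `matches_keyword` and `matches_bool_const` are made up for illustration, and the special case for `true`/`false` (first letter must be lowercase) is taken from the Cool manual, not from the notes above.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Returns true when `lexeme` equals `keyword` ignoring ASCII case.
// `keyword` is expected in lowercase, e.g. "class".
bool matches_keyword(const std::string& lexeme, const std::string& keyword) {
    if (lexeme.size() != keyword.size()) return false;
    for (std::size_t i = 0; i < lexeme.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(lexeme[i]);
        if (std::tolower(c) != keyword[i]) return false;
    }
    return true;
}

// Assumption from the Cool manual: `true` and `false` must start with a
// lowercase letter; the remaining letters may be in any case.
bool matches_bool_const(const std::string& lexeme, const std::string& keyword) {
    return !lexeme.empty() && lexeme[0] == keyword[0] &&
           matches_keyword(lexeme, keyword);
}
```

With these helpers, `CLASS` and `cLaSs` both count as the keyword `class`, while `True` is not a boolean constant (it would lex as a type identifier instead).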

5 Implementation Notes

• Each call on the scanner returns the next token and lexeme from the input.

• The second component, the semantic value or lexeme, is placed in the global union cool_yylval, which is of type YYSTYPE. The type YYSTYPE is also defined in cool-parse.h.

• For class identifiers, object identifiers, integers, and strings, the semantic value should be a Symbol stored in the field cool_yylval.symbol. For boolean constants, the semantic value is stored in the field cool_yylval.boolean.

3 String Tables

cool-tour.pdf

• An important point about the structure of the Cool compiler is that there are actually three distinct string tables: one for string constants (stringtable), one for integer constants (inttable), and one for identifiers (idtable).
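To make the idea of a string table concrete, here is a minimal sketch of a table that stores each distinct string once and hands back a stable handle. `SimpleTable` is a hypothetical stand-in written for these notes; the real `stringtable`, `inttable`, and `idtable` come with the course support code and have their own interface.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for one Cool string table (not the course class).
class SimpleTable {
public:
    // Returns the index of `s`, adding it only if it is not already present.
    int add_string(const std::string& s) {
        auto it = index_.find(s);
        if (it != index_.end()) return it->second;
        int id = static_cast<int>(entries_.size());
        entries_.push_back(s);
        index_.emplace(s, id);
        return id;
    }

    // Returns the string stored under `id`; throws std::out_of_range if invalid.
    const std::string& lookup(int id) const {
        return entries_.at(static_cast<std::size_t>(id));
    }

    std::size_t size() const { return entries_.size(); }

private:
    std::vector<std::string> entries_;             // index -> string
    std::unordered_map<std::string, int> index_;   // string -> index
};
```

Keeping three separate instances (one each for string constants, integer constants, and identifiers) means the identifier `x` and the string constant `"x"` never share an entry, which matches the three-table structure described above.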

### 2. Comment

2 Introduction to Flex

• When writing rules in Flex, it may be necessary to perform different actions depending on previously encountered tokens.

• For example, when processing a closing comment token, you might be interested in knowing whether an opening comment was previously encountered.

• One obvious way to track state is to declare global variables in your declaration section, which are set to true when certain tokens of interest are encountered.

• Flex also provides syntactic sugar for achieving similar functionality by using state declarations such as:

%Start COMMENT

which can be set to true by writing BEGIN(COMMENT).

• To perform an action only if an opening comment was previously encountered, you can predicate your rule on COMMENT using the syntax `<COMMENT>pattern { action }`.

• There is also a special default state called INITIAL which is active unless you explicitly indicate the beginning of a new state.

cool-manual.pdf

• There are two forms of comments in Cool. Any characters between two dashes “--” and the next newline (or EOF, if there is no next newline) are treated as comments. Comments may also be written by enclosing text in (* ... *).

4.1 Error Handling

• If a comment remains open when EOF is encountered, report this error with the message “EOF in comment”. Do not tokenize the comment’s contents simply because the terminator is missing.
• If you see *) outside a comment, report this error as “Unmatched *)”, rather than tokenizing it as * and ).
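The comment rules and the two error cases can be sketched outside flex as a small state machine. This is only an illustration in plain C++: the `State` enum mirrors the INITIAL and COMMENT start conditions, `strip_comments` and `ScanResult` are hypothetical names, and nesting of `(* ... *)` is assumed from the Cool manual. Unlike a real scanner, it stops at the first error and does not protect `--` or `(*` inside string constants.

```cpp
#include <cassert>
#include <string>

enum class State { INITIAL, COMMENT };  // mirrors the flex start conditions

struct ScanResult {
    std::string kept;   // text found outside comments
    std::string error;  // empty when no error was found
};

ScanResult strip_comments(const std::string& src) {
    ScanResult r;
    State state = State::INITIAL;
    int depth = 0;  // nesting depth of (* ... *), assumed nestable
    std::size_t i = 0;
    while (i < src.size()) {
        bool open = src.compare(i, 2, "(*") == 0;
        bool close = src.compare(i, 2, "*)") == 0;
        if (state == State::COMMENT) {
            if (open) {
                ++depth;
                i += 2;
            } else if (close) {
                if (--depth == 0) state = State::INITIAL;
                i += 2;
            } else {
                ++i;  // ordinary character inside a comment is dropped
            }
        } else if (open) {
            state = State::COMMENT;
            depth = 1;
            i += 2;
        } else if (close) {
            r.error = "Unmatched *)";  // sketch stops here; a scanner would go on
            return r;
        } else if (src.compare(i, 2, "--") == 0) {
            // Line comment: skip to the newline (kept) or to EOF.
            while (i < src.size() && src[i] != '\n') ++i;
        } else {
            r.kept += src[i++];
        }
    }
    if (state == State::COMMENT) r.error = "EOF in comment";
    return r;
}
```

In a flex file the same logic would be split across rules guarded by the COMMENT start condition, with BEGIN(COMMENT) and BEGIN(INITIAL) replacing the assignments to `state`.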

### 3. Strings

4.3 Strings

• Your scanner should convert escape characters in string constants to their correct values.
• Return an error for a string containing the literal null character.

4.1 Error Handling

• String constant too long
• Unterminated string constant
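The string rules can be checked in isolation with a small conversion routine. The sketch below is plain C++, not flex; `convert_string_body` and `StringResult` are hypothetical names. The escape set (`\b`, `\t`, `\n`, `\f`, anything else meaning the character itself) and the 1024-character limit are taken from the Cool manual, and the exact wording of the null-character message is an assumption.

```cpp
#include <cassert>
#include <string>

const std::size_t kMaxStrLen = 1024;  // limit taken from the Cool manual

struct StringResult {
    std::string value;  // converted contents, meaningful when error is empty
    std::string error;  // error message, or empty on success
};

// `body` is the raw text between the opening and closing quotes.
StringResult convert_string_body(const std::string& body) {
    StringResult r;
    for (std::size_t i = 0; i < body.size(); ++i) {
        char c = body[i];
        if (c == '\0') {
            r.error = "String contains null character";  // wording assumed
            return r;
        }
        if (c == '\n') {  // a newline that was not escaped
            r.error = "Unterminated string constant";
            return r;
        }
        if (c == '\\' && i + 1 < body.size()) {
            c = body[++i];
            switch (c) {
                case 'b': c = '\b'; break;
                case 't': c = '\t'; break;
                case 'n': c = '\n'; break;
                case 'f': c = '\f'; break;
                case '\0':
                    r.error = "String contains null character";
                    return r;
                default: break;  // \c is just c, including an escaped newline
            }
        }
        r.value += c;
        if (r.value.size() > kMaxStrLen) {
            r.error = "String constant too long";
            return r;
        }
    }
    return r;
}
```

Note that the length check applies to the converted string, so the two source characters `\n` count as one character toward the limit.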

### 4. Others and Error

5 Implementation Notes

• The tokens for single character symbols (e.g., “;” and “,”) are represented just by the integer (ASCII) value of the character itself.

All of the single character tokens are listed in the grammar for Cool in the Cool manual.
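As a small illustration of "the token is the character's own integer value", the sketch below maps a character to its token code. `single_char_token` is a hypothetical helper, and the symbol list is my reading of the Cool grammar, so it is worth checking against the manual.

```cpp
#include <cassert>
#include <string>

// Returns the token code for a single-character symbol, or -1 if `c`
// is not one of them. The code is simply the character's ASCII value.
int single_char_token(char c) {
    // Assumed list of Cool's single-character symbols.
    static const std::string symbols = "{}():;,.@~+-*/<=";
    if (symbols.find(c) == std::string::npos) return -1;
    return static_cast<unsigned char>(c);
}
```

In a flex action this usually collapses to returning the first character of the matched text, since the matched lexeme is exactly one character long.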

### 5. Problems Encountered

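Line numbers: the line counter (curr_lineno in the skeleton cool.flex) has to be incremented on every newline, including newlines inside comments and strings.

Be careful with the order of the rules: when two patterns match the same longest lexeme, flex picks the rule listed first, so keyword rules must come before the identifier rules.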
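The rule-order pitfall comes from how flex resolves ambiguity: the longest match wins, and on a tie the earliest rule wins. The sketch below imitates that policy in plain C++ with two made-up rules (`match_class_keyword` and `match_object_id`); it is a toy model of the selection step only, not of flex's actual matching engine.

```cpp
#include <cassert>
#include <cctype>
#include <functional>
#include <string>
#include <vector>

struct Rule {
    std::string name;
    std::function<std::size_t(const std::string&)> match_len;  // 0 = no match
};

// Length of the prefix matched by a hypothetical keyword rule for "class".
std::size_t match_class_keyword(const std::string& in) {
    return in.compare(0, 5, "class") == 0 ? 5 : 0;
}

// Length of the longest prefix that looks like a lowercase-initial identifier.
std::size_t match_object_id(const std::string& in) {
    if (in.empty() || !std::islower(static_cast<unsigned char>(in[0]))) return 0;
    std::size_t n = 1;
    while (n < in.size() &&
           (std::isalnum(static_cast<unsigned char>(in[n])) || in[n] == '_')) {
        ++n;
    }
    return n;
}

// Longest match wins; on a tie, the rule listed first wins.
std::string pick_rule(const std::vector<Rule>& rules, const std::string& in) {
    std::string winner = "none";
    std::size_t best = 0;
    for (const Rule& r : rules) {
        std::size_t n = r.match_len(in);
        if (n > best) {  // strict '>' keeps the earlier rule on ties
            best = n;
            winner = r.name;
        }
    }
    return winner;
}
```

With the keyword rule listed first, the input `class` is reported as CLASS; with the identifier rule first, the same input silently becomes OBJECTID. The input `classes` is OBJECTID in both orders because the identifier match is longer.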