January 21, 2005

Lexing and parsing in happy

I've been playing around with happy, the yacc-like parser generator for Haskell. I've learnt something about adding location information to syntactical elements - don't do it in the parser, do it in the lexer.

The reason for this is that it's hard to work out when exactly a production is going to be reduced1inside the parser. If you keep the last position from the lexer inside the parser state then it may well be at a position beyond the syntacic element that you are trying to glean information for.

e.g.

Say I have a rule in my parser:


PosId : id {% do pos <- getPos
return (%1, pos) }

Unfortunately when this rule is reduced the lexer may already have read a little bit beyond the id above. Hence, it is better for the lexer itself to add position information to the tokens. Then it is easy to read the correct position information. e.g. with a rule like:


Id : id { $1 }

If your token for ids was defined as id { TokenId $$ } and data type Token is defined as:

data Token = TokenId (Name, Position)
| ...

then the rule above will return a name/position pair.

[1] I'm using this term in the sense that it is used when describing the LALR parsing algorithm. i.e. reduce, as in, the reduce from "shift/reduce conflict".