Lexer

Simple lexer based on regular expressions.

class reparsec.lexer.Token(kind, value, start=(0, 0, 0), end=(0, 0, 0))

Parameters:

kind (str) – Name of capture group from lexer spec
value (str) – Value of token
start (Loc) – Start location
end (Loc) – End location

exception reparsec.lexer.LexError(loc)

Exception raised when the lexer is unable to process the input.

Parameters: loc (Loc) – Location of error

reparsec.lexer.split_tokens(src, spec)

Splits the input string into a list of tokens.

The lexer specification is a compiled regular expression with named capture groups for individual tokens. For each match, only the last capture group is taken into account. If the match involves no capture group, as with the \s+ alternative in the example below, the matched text is skipped.

>>> from reparsec.lexer import split_tokens
>>> import re

>>> spec = re.compile(r"(?P<num>[0-9]+)|(?P<op>[+])|\s+")

>>> split_tokens("1 + 2 + 3", spec)  
[Token(kind='num', value='1'), Token(kind='op', value='+'),
 Token(kind='num', value='2'), Token(kind='op', value='+'),
 Token(kind='num', value='3')]

Parameters:

src (str) – Input
spec (Pattern[str]) – Compiled regular expression

Return type:

List[Token]
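
The splitting rule described for split_tokens() can be approximated with the standard library alone. The following is a minimal sketch, not reparsec's implementation: the function name split_by_lastgroup is hypothetical, it returns plain (kind, value) pairs instead of Token objects, it raises ValueError where the real lexer would raise LexError, and it assumes that "the last capture group" corresponds to re.Match.lastgroup.

```python
import re
from typing import List, Tuple


def split_by_lastgroup(src: str, spec: "re.Pattern[str]") -> List[Tuple[str, str]]:
    """Hypothetical stand-in for split_tokens(); returns (kind, value) pairs."""
    tokens: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(src):
        match = spec.match(src, pos)
        if match is None or match.end() == pos:
            # Nothing usable matched at this offset. The real lexer reports
            # this with LexError; ValueError is used here as a stand-in.
            raise ValueError(f"cannot tokenize input at offset {pos}")
        # Name of the last matched named group, or None if no group matched.
        kind = match.lastgroup
        if kind is not None:
            tokens.append((kind, match.group(kind)))
        # Matches without a capture group (such as whitespace) are skipped.
        pos = match.end()
    return tokens


spec = re.compile(r"(?P<num>[0-9]+)|(?P<op>[+])|\s+")
pairs = split_by_lastgroup("1 + 2", spec)
print(pairs)  # [('num', '1'), ('op', '+'), ('num', '2')]
```

In this sketch, an input such as "1 * 2" fails at offset 2, because no alternative in the specification matches the asterisk.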

reparsec.lexer.token(kind)

Parses a token of the specified kind and returns it.

>>> from reparsec.lexer import parse, split_tokens, token
>>> import re

>>> spec = re.compile(r"(?P<num>[0-9]+)|(?P<op>[+])")
>>> parser = token("num")

>>> parse(parser, split_tokens("1", spec)).unwrap()
Token(kind='num', value='1')

>>> parse(parser, split_tokens("+", spec)).unwrap()
Traceback (most recent call last):
  ...
reparsec.types.ParseError: at 1:1: expected num

Parameters: kind (str) – Kind of expected token
Return type: TupleParser[Sequence[Token], Token]

reparsec.lexer.token_ins(kind, ins_value)

Parses a token of the specified kind and returns it. When error recovery is enabled and parsing fails, Token(kind=kind, value=ins_value) is inserted instead.

>>> from reparsec.lexer import parse, split_tokens, token_ins
>>> import re

>>> spec = re.compile(r"(?P<num>[0-9]+)|(?P<op>[+])")
>>> parser = token_ins("num", "0")

>>> parse(
...     parser, split_tokens("+", spec), recover=True
... ).unwrap(recover=True)
Token(kind='num', value='0')

Parameters:

kind (str) – Kind of expected token
ins_value (str) – Value for inserted token

Return type:

TupleParser[Sequence[Token], Token]

reparsec.lexer.parse(parser, stream, recover=False)

Wrapper around reparsec.Parser.parse() that enables line and column tracking.

Parameters:

parser (Parser[Sequence[Token], TypeVar(A)]) – Parser to run
stream (Sequence[Token]) – Stream of tokens to parse
recover (bool) – Flag to enable error recovery

Return type:

ParseResult[TypeVar(A), Sequence[Token]]
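
Line and column tracking is what allows errors to be reported in the line:column form seen in the "at 1:1: expected num" message of the token() example. As a rough illustration that is independent of reparsec, a line and column pair can be derived from a character offset as sketched below. The helper name line_and_column is hypothetical, and the 1-based numbering is an assumption inferred from that error message.

```python
from typing import Tuple


def line_and_column(src: str, offset: int) -> Tuple[int, int]:
    """Return the 1-based (line, column) of the character at `offset`."""
    # Every newline before the offset starts a new line.
    line = src.count("\n", 0, offset) + 1
    # Index of the newline that ends the previous line, or -1 on line 1.
    last_newline = src.rfind("\n", 0, offset)
    column = offset - last_newline
    return line, column


src = "1 +\n2 + x"
print(line_and_column(src, 0))               # (1, 1)
print(line_and_column(src, src.index("x")))  # (2, 5)
```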
