Lexer

A simple lexer based on regular expressions.

class reparsec.lexer.Token(kind, value, start=(0, 0, 0), end=(0, 0, 0))
Parameters:
  • kind (str) – Name of capture group from lexer spec

  • value (str) – Value of token

  • start (Loc) – Start location

  • end (Loc) – End location

exception reparsec.lexer.LexError(loc)

Exception raised when the lexer is unable to process the input.

Parameters:

loc (Loc) – Location of error
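
To illustrate what a location can carry, here is a minimal sketch that maps a character offset to a position/line/column triple. The three-element defaults in the Token signature suggest a triple, but the field names, the zero-based counting, and the helper names (SketchLoc, offset_to_loc, format_loc) are assumptions made for this example, not part of reparsec.

```python
from typing import NamedTuple


class SketchLoc(NamedTuple):
    # Hypothetical location triple; field names are assumptions.
    pos: int   # zero-based character offset into the source
    line: int  # zero-based line index
    col: int   # zero-based column index within the line


def offset_to_loc(src: str, pos: int) -> SketchLoc:
    """Compute line and column for a character offset in src."""
    # Number of newlines before the offset gives the line index.
    line = src.count("\n", 0, pos)
    # rfind returns -1 when there is no newline, so line_start becomes 0.
    line_start = src.rfind("\n", 0, pos) + 1
    return SketchLoc(pos, line, pos - line_start)


def format_loc(loc: SketchLoc) -> str:
    """Render a location in one-based line:col form for messages."""
    return f"{loc.line + 1}:{loc.col + 1}"


print(format_loc(offset_to_loc("ab\ncd", 4)))
```

Keeping the raw offset alongside line and column lets error messages stay readable while still allowing the caller to slice the original input.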

reparsec.lexer.split_tokens(src, spec)

Splits the input string into a list of tokens.

The lexer specification is a compiled regular expression with named capture groups for individual tokens. Only the last capture group is taken into account. If no capture group matches, the matched text is skipped and no token is produced.

>>> from reparsec.lexer import split_tokens
>>> import re
>>> spec = re.compile(r"(?P<num>[0-9]+)|(?P<op>[+])|\s+")
>>> split_tokens("1 + 2 + 3", spec)
[Token(kind='num', value='1'), Token(kind='op', value='+'),
 Token(kind='num', value='2'), Token(kind='op', value='+'),
 Token(kind='num', value='3')]
Parameters:
  • src (str) – Input

  • spec (Pattern[str]) – Compiled regular expression

Return type:

List[Token]
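
The behavior described above can be approximated with the standard library alone. The following is a minimal sketch, not the library's implementation: it assumes that "the last capture group" corresponds to `Match.lastgroup`, it omits start and end locations, and it raises a plain ValueError where the real lexer would raise LexError. The names SketchToken and sketch_split are hypothetical.

```python
import re
from typing import List, NamedTuple


class SketchToken(NamedTuple):
    # Hypothetical stand-in for Token; only kind and value are modeled.
    kind: str
    value: str


def sketch_split(src: str, spec: "re.Pattern[str]") -> List[SketchToken]:
    """Tokenize src by repeatedly matching spec at the current offset."""
    tokens = []
    pos = 0
    while pos < len(src):
        m = spec.match(src, pos)
        if m is None or m.end() == pos:
            # Nothing in the spec matches here, or only an empty match did.
            raise ValueError(f"cannot tokenize input at offset {pos}")
        if m.lastgroup is not None:
            # A named group matched: its name becomes the token kind.
            tokens.append(SketchToken(m.lastgroup, m.group(m.lastgroup)))
        # Matches without a named group (whitespace here) are skipped.
        pos = m.end()
    return tokens


spec = re.compile(r"(?P<num>[0-9]+)|(?P<op>[+])|\s+")
print(sketch_split("1 + 2 + 3", spec))
```

Leaving the whitespace alternative unnamed is what makes it disappear from the output; giving it a name would turn it into a token kind of its own.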

reparsec.lexer.token(kind)

Parses a token of the specified kind and returns it.

>>> from reparsec.lexer import parse, split_tokens, token
>>> import re
>>> spec = re.compile(r"(?P<num>[0-9]+)|(?P<op>[+])")
>>> parser = token("num")
>>> parse(parser, split_tokens("1", spec)).unwrap()
Token(kind='num', value='1')
>>> parse(parser, split_tokens("+", spec)).unwrap()
Traceback (most recent call last):
  ...
reparsec.types.ParseError: at 1:1: expected num
Parameters:

kind (str) – Kind of expected token

Return type:

TupleParser[Sequence[Token], Token]

reparsec.lexer.token_ins(kind, ins_value)

Parses a token of the specified kind and returns it. When error recovery is enabled and the expected token is missing, inserts Token(kind=kind, value=ins_value) in its place.

>>> from reparsec.lexer import parse, split_tokens, token_ins
>>> import re
>>> spec = re.compile(r"(?P<num>[0-9]+)|(?P<op>[+])")
>>> parser = token_ins("num", "0")
>>> parse(
...     parser, split_tokens("+", spec), recover=True
... ).unwrap(recover=True)
Token(kind='num', value='0')
Parameters:
  • kind (str) – Kind of expected token

  • ins_value (str) – Value for inserted token

Return type:

TupleParser[Sequence[Token], Token]
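
The difference between plain token matching and matching with insertion can be sketched without the library. This toy model is an assumption-laden illustration, not how reparsec implements recovery: tokens are plain (kind, value) tuples, the function name expect is hypothetical, and the inserted token is assumed to consume no input.

```python
from typing import List, Optional, Tuple

Tok = Tuple[str, str]  # (kind, value); hypothetical stand-in for Token


def expect(
    stream: List[Tok], pos: int, kind: str, ins_value: Optional[str] = None
) -> Tuple[Tok, int, bool]:
    """Return (token, next position, whether the token was inserted)."""
    if pos < len(stream) and stream[pos][0] == kind:
        # The expected kind is at the head of the stream: consume it.
        return stream[pos], pos + 1, False
    if ins_value is not None:
        # Recovery: act as if the expected token were present,
        # without consuming anything from the stream.
        return (kind, ins_value), pos, True
    raise ValueError(f"expected {kind} at token index {pos}")


# Without an insertion value a mismatch is an error; with one, a
# synthetic token is produced and parsing can continue.
print(expect([("op", "+")], 0, "num", ins_value="0"))
```

Returning the inserted flag keeps the recovered result distinguishable from a clean parse, so a caller can still report the original error.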

reparsec.lexer.parse(parser, stream, recover=False)

Wrapper around reparsec.Parser.parse() that enables line and column tracking.

Parameters:
  • parser (Parser[Sequence[Token], TypeVar(A)]) – Parser to run

  • stream (Sequence[Token]) – Stream of tokens to parse

  • recover (bool) – Flag to enable error recovery

Return type:

ParseResult[TypeVar(A), Sequence[Token]]