Lexer
Simple lexer based on regular expressions.
- class reparsec.lexer.Token(kind, value, start=(0, 0, 0), end=(0, 0, 0))
- exception reparsec.lexer.LexError(loc)
Exception raised when the lexer is unable to process the input.
- Parameters:
loc (Loc) – Location of error
- reparsec.lexer.split_tokens(src, spec)
Splits the input string into a list of tokens.
The lexer specification is a compiled regular expression with named capture groups for individual tokens. Only the last capture group that matched is taken into account, and its name becomes the token kind. If no capture group matches, the matched text is skipped.
>>> from reparsec.lexer import split_tokens
>>> import re
>>> spec = re.compile(r"(?P<num>[0-9]+)|(?P<op>[+])|\s+")
>>> split_tokens("1 + 2 + 3", spec)
[Token(kind='num', value='1'), Token(kind='op', value='+'), Token(kind='num', value='2'), Token(kind='op', value='+'), Token(kind='num', value='3')]
- Parameters:
src (str) – Input string
spec (Pattern[str]) – Compiled regular expression
- Return type:
List[Token]
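The splitting rule described above (use the name of the last matched group as the kind, skip matches that have no named group, fail when nothing matches) can be sketched with nothing but the standard `re` module. This is a minimal illustration, not reparsec's implementation: `SketchToken` and `sketch_split_tokens` are hypothetical names, token positions are omitted, and a plain `ValueError` stands in for `LexError`.

```python
from __future__ import annotations

import re
from typing import List, NamedTuple


class SketchToken(NamedTuple):
    """Hypothetical stand-in for reparsec.lexer.Token (kind and value only)."""
    kind: str
    value: str


def sketch_split_tokens(src: str, spec: re.Pattern[str]) -> List[SketchToken]:
    tokens: List[SketchToken] = []
    pos = 0
    while pos < len(src):
        # Try to match the spec exactly at the current position.
        m = spec.match(src, pos)
        # Nothing matches here (or only an empty match, which would loop
        # forever): the input cannot be lexed. ValueError stands in for LexError.
        if m is None or m.end() == pos:
            raise ValueError(f"cannot lex input at offset {pos}")
        # m.lastgroup is the name of the last matched named group, or None
        # when the match came from an unnamed alternative such as \s+.
        if m.lastgroup is not None:
            tokens.append(SketchToken(m.lastgroup, m.group()))
        pos = m.end()
    return tokens


spec = re.compile(r"(?P<num>[0-9]+)|(?P<op>[+])|\s+")
print(sketch_split_tokens("1 + 2 + 3", spec))
```

With the same spec as the doctest, whitespace is consumed by the unnamed `\s+` alternative and dropped, while an unexpected character such as `-` makes the sketch raise.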
- reparsec.lexer.token(kind)
Parses a token of the specified kind and returns it.
>>> from reparsec.lexer import parse, split_tokens, token
>>> import re
>>> spec = re.compile(r"(?P<num>[0-9]+)|(?P<op>[+])")
>>> parser = token("num")
>>> parse(parser, split_tokens("1", spec)).unwrap()
Token(kind='num', value='1')
>>> parse(parser, split_tokens("+", spec)).unwrap()
Traceback (most recent call last):
  ...
reparsec.types.ParseError: at 1:1: expected num
- Parameters:
kind (str) – Kind of expected token
- Return type:
TupleParser[Sequence[Token], Token]
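Conceptually, `token(kind)` is a single-token matcher: it succeeds and advances by one position when the next token has the expected kind, and otherwise fails with `expected <kind>`. The stand-alone sketch below shows that idea as a plain function over a token sequence. `Tok`, `SketchParseError` and `sketch_token` are hypothetical names; reparsec's real parsers are combinators with a richer interface, and this sketch does not report error locations.

```python
from typing import Callable, NamedTuple, Sequence, Tuple


class Tok(NamedTuple):
    """Hypothetical token type with just a kind and a value."""
    kind: str
    value: str


class SketchParseError(Exception):
    """Hypothetical error raised when the expected token is missing."""


# In this sketch a parser is a function: (tokens, position) -> (result, new position).
SketchParser = Callable[[Sequence[Tok], int], Tuple[Tok, int]]


def sketch_token(kind: str) -> SketchParser:
    def parse(tokens: Sequence[Tok], pos: int) -> Tuple[Tok, int]:
        # Succeed only if there is a next token and it has the expected kind.
        if pos < len(tokens) and tokens[pos].kind == kind:
            return tokens[pos], pos + 1
        raise SketchParseError(f"expected {kind}")
    return parse


num = sketch_token("num")
print(num([Tok("num", "1")], 0))  # (Tok(kind='num', value='1'), 1)
```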
- reparsec.lexer.token_ins(kind, ins_value)
Parses a token of the specified kind and returns it. When error recovery is enabled, inserts Token(kind=kind, value=ins_value) on error.
>>> from reparsec.lexer import parse, split_tokens, token_ins
>>> import re
>>> spec = re.compile(r"(?P<num>[0-9]+)|(?P<op>[+])")
>>> parser = token_ins("num", "0")
>>> parse(
...     parser, split_tokens("+", spec), recover=True
... ).unwrap(recover=True)
Token(kind='num', value='0')
- Parameters:
kind (str) – Kind of expected token
ins_value (str) – Value for inserted token
- Return type:
TupleParser[Sequence[Token], Token]
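The insertion strategy behind `token_ins` can be illustrated with a small stand-alone sketch: when the expected token is missing and recovery is on, the parser pretends a token with `ins_value` was present and consumes no input. `Tok`, `SketchParseError` and `sketch_token_ins` are hypothetical names, and passing `recover` as a call-time flag is an assumption made for the sketch, not reparsec's API.

```python
from typing import NamedTuple, Sequence, Tuple


class Tok(NamedTuple):
    """Hypothetical token type with just a kind and a value."""
    kind: str
    value: str


class SketchParseError(Exception):
    """Hypothetical error raised when the expected token is missing."""


def sketch_token_ins(kind: str, ins_value: str):
    def parse(
        tokens: Sequence[Tok], pos: int, recover: bool = False
    ) -> Tuple[Tok, int]:
        # Normal path: consume the next token if it has the expected kind.
        if pos < len(tokens) and tokens[pos].kind == kind:
            return tokens[pos], pos + 1
        if recover:
            # Repair by insertion: act as if the missing token were present
            # and leave the position unchanged (no input is consumed).
            return Tok(kind, ins_value), pos
        raise SketchParseError(f"expected {kind}")
    return parse


num = sketch_token_ins("num", "0")
print(num([Tok("op", "+")], 0, recover=True))  # (Tok(kind='num', value='0'), 0)
```

A practical recovering parser would also record the repaired error instead of discarding it; the sketch omits that bookkeeping.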
- reparsec.lexer.parse(parser, stream, recover=False)
Wrapper around reparsec.Parser.parse() that enables line and column tracking.
- Parameters:
- Return type:
ParseResult[A, Sequence[Token]]
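Line and column tracking boils down to mapping a character offset in the source to a position such as the `1:1` in the `expected num` error shown earlier. A minimal sketch of that mapping, assuming 1-based lines and columns and `\n` line endings; `line_starts` and `offset_to_line_col` are hypothetical helpers, not part of reparsec, and how reparsec itself stores positions in `Token.start`/`Token.end` is not shown here.

```python
import bisect
from typing import List, Tuple


def line_starts(src: str) -> List[int]:
    # Offsets at which each line begins; line 1 always starts at offset 0.
    starts = [0]
    for i, ch in enumerate(src):
        if ch == "\n":
            starts.append(i + 1)
    return starts


def offset_to_line_col(starts: List[int], offset: int) -> Tuple[int, int]:
    # The line containing `offset` is the last line start <= offset.
    line_idx = bisect.bisect_right(starts, offset) - 1
    return line_idx + 1, offset - starts[line_idx] + 1


src = "1 +\n+ 2"
starts = line_starts(src)
print(offset_to_line_col(starts, 0))  # (1, 1)
print(offset_to_line_col(starts, 4))  # (2, 1)
```

Precomputing the line starts once makes each lookup a binary search, which keeps per-token position computation cheap for long inputs.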