Tutorial ======== Suppose we need to parse a list of numbers separated by commas without an external lexer. This means that we should build our parser from the simplest ones. Parsing numbers --------------- First, we need to recognize digits. For this we will use the :func:`reparsec.sequence.satisfy` parser. It is parameterized with a predicate to test the input token. >>> from reparsec.sequence import satisfy >>> digit = satisfy(str.isdigit) Let's try it in action. We can use the :meth:`reparsec.Parser.parse` method of our freshly created parser to parse a string. It returns either a result of successful parse or an error. You can get the actual value or exception with an :meth:`reparsec.ParseResult.unwrap` method: >>> digit.parse("123").unwrap() '1' >>> digit.parse("a").unwrap() Traceback (most recent call last): ... reparsec.types.ParseError: at 0: unexpected input So far, so good. Next, we want to parse numbers. For simplicity let's assume that a number is a sequence of one or more digits: >>> digits = digit + digit.many() We use method :meth:`reparsec.Parser.many` to construct parser that tries to apply original parser zero or more times, and operator `+` to sequentially apply two parsers. >>> digits.parse("123").unwrap() ('1', ['2', '3']) The output doesn't looks like a number yet. We need :meth:`reparsec.Parser.fmap` to convert it to a number: >>> number = digits.fmap(lambda v: int(v[0] + "".join(v[1]))) >>> number.parse("123").unwrap() 123 Parsing lists ------------- Now we are ready to parse the list. The list is just a sequence of numbers separated by commas. To parse a single comma we will use the :func:`reparsec.sequence.sym` parser, which is parameterized with expected character. Parsers for sequences with separators are usually constructed using the :meth:`reparsec.Parser.sep_by` combinator: >>> from reparsec.sequence import sym >>> list_parser = number.sep_by(sym(",")) >>> list_parser.parse("12,34,56").unwrap() [12, 34, 56] Success! Allowing whitespace ------------------- What if we want to allow whitespace around numbers? Let's extend the parser to accept such inputs: >>> space = satisfy(str.isspace) >>> spaces = space.many() >>> number = digits.fmap(lambda v: int(v[0] + "".join(v[1]))) << spaces >>> comma = sym(",") << spaces >>> list_parser = spaces >> number.sep_by(comma) >>> list_parser.parse(" 1 , 2 ").unwrap() [1, 2] The `<<` and `>>` operators used here are similar to `+`, but return only the value of left or right parser, respectively. Parsing incorrect inputs ------------------------ Until before we focused on parsing valid inputs. But what if we have a string with unexpected characters in it? >>> list_parser.parse("1,a").unwrap() Traceback (most recent call last): ... reparsec.types.ParseError: at 2: unexpected input The parser reported an error and provided a brief description of what was wrong with the input. >>> list_parser.parse("1a").unwrap() [1] Ouch! While reporting errors in general, in some cases our parser silently ignores the rest of the input. Let's fix this by requiring input to end right after the list using the :func:`reparsec.sequence.eof` parser: >>> from reparsec.sequence import eof >>> list_parser = spaces >> number.sep_by(comma) << eof() >>> list_parser.parse("1a").unwrap() Traceback (most recent call last): ... reparsec.types.ParseError: at 1: expected ',' or end of file Much better. Improving error reporting ------------------------- Let's take a closer look at the errors messages: >>> list_parser.parse("1 2").unwrap() Traceback (most recent call last): ... reparsec.types.ParseError: at 2: expected ',' or end of file Seems informative. >>> list_parser.parse("1,").unwrap() Traceback (most recent call last): ... reparsec.types.ParseError: at 2: unexpected input This message is not very helpful. This is because the :func:`reparsec.sequence.satisfy` parser has no idea about the expected token. Let's add some labels to help it with :meth:`reparsec.Parser.label` combinator: >>> digit = satisfy(str.isdigit).label("digit") >>> digits = digit + digit.many() >>> number = digits.fmap( ... lambda v: int(v[0] + "".join(v[1])) ... ).label("number") << spaces >>> list_parser = spaces >> number.sep_by(comma) << eof() >>> list_parser.parse("1,").unwrap() Traceback (most recent call last): ... reparsec.types.ParseError: at 2: expected number Recovering from errors ---------------------- And now for something completely different: >>> list_parser.parse("1 2", recover=True).unwrap(recover=True) [1] The parser recovered from the error and produced a partial result. Pretty useful. However, :func:`reparsec.satisfy` again doesn't know how to fix input besides ignoring some parts of the input: >>> list_parser.parse("1,", recover=True).unwrap(recover=True) Traceback (most recent call last): ... reparsec.types.ParseError: at 2: expected number We can use :meth:`reparsec.Parser.recover_with` to return some value during error recovery: >>> list_parser = spaces >> number.recover_with(0).sep_by(comma) << eof() >>> list_parser.parse("1,", recover=True).unwrap(recover=True) [1, 0] The parser is even capable of fixing multiple errors in the input: >>> list_parser.parse("1,,,2 3", recover=True).unwrap(recover=True) [1, 0, 0, 2] And what if we want to show them to user? >>> list_parser.parse("1,,,2 3", recover=True).unwrap() Traceback (most recent call last): ... reparsec.types.ParseError: at 2: expected number (inserted 0), at 3: expected number (inserted 0), at 6: expected ',' or end of file (skipped 1 token) Line and column tracking ------------------------ Error reporting still needs another improvement. All of the messages in the previous examples contains indexes in the input string as error positions, but it is more convenient to show line and column numbers instead. To achieve this, we will use :func:`reparsec.scannerless.parse`. This is a wrapper around :meth:`reparsec.Parser.parse` that enables position tracking for parsers with string inputs: >>> from reparsec.scannerless import parse >>> src = """\ ... 1,, ... ,2 ... 3 ... """ >>> parse(list_parser, src, recover=True).unwrap() Traceback (most recent call last): ... reparsec.types.ParseError: at 1:3: expected number (inserted 0), at 2:2: expected number (inserted 0), at 3:1: expected ',' or end of file (skipped 2 tokens) As a finishing touch, let's write a helper function so that users of our parser don't have to think about how to properly invoke the parser: >>> from typing import List >>> def parse_list(src: str) -> List[int]: ... return parse(list_parser, src, recover=True).unwrap() >>> parse_list("1, 2, 3") [1, 2, 3] >>> parse_list("1, ,2 3") Traceback (most recent call last): ... reparsec.types.ParseError: at 1:4: expected number (inserted 0), at 1:7: expected ',' or end of file (skipped 1 token) Conclusion ---------- The final parser definition should look like this:: from typing import List from reparsec.scannerless import parse from reparsec.sequence import eof, satisfy, sym spaces = satisfy(str.isspace).many() digit = satisfy(str.isdigit).label("digit") digits = digit + digit.many() number = digits.fmap( lambda v: int(v[0] + "".join(v[1])) ).label("number") << spaces comma = sym(",") << spaces list_parser = spaces >> number.recover_with(0).sep_by(comma) << eof() def parse_list(src: str) -> List[int]: return parse(list_parser, src, recover=True).unwrap()