Lake symbols for island parsing
Katsumi Okuda, Shigeru Chiba

TL;DR
This paper introduces lake symbols, an extension to PEG that automates the enumeration of water symbols and so simplifies the development of island parsers. In the implemented parsers, lake symbols reduced the number of grammar rules by 42%.
Contribution
It proposes lake symbols and an extension to PEG, enabling easier and more efficient development of island parsers by automating water symbol enumeration.
Findings
Reduced the number of grammar rules by 42% in the implemented parsers
Developed 36 island parsers for Java and 20 for Python
Automated water symbol enumeration improves parser development efficiency
Abstract
Context: An island parser reads an input text and builds the parse (or abstract syntax) tree of only the programming constructs of interest in the text. These constructs are called islands and the rest of the text is called water, which the parser ignores and skips over. Since an island parser does not have to parse all the details of the input, it is often easy to develop but still useful enough for a number of software engineering tools. When a parser generator is used, the developer can implement an island parser by just describing a small number of grammar rules, for example, in Parsing Expression Grammar (PEG). Inquiry: In practice, however, the grammar rules are often complicated since the developer must define the water inside the island; otherwise, the island parsing will not reduce the total number of grammar rules. When describing the grammar rules for such water, the…
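The island/water split described in the abstract can be illustrated with a small hand-written sketch. This is not the paper's parser generator and it does not implement lake symbols; it is a minimal example that assumes the islands of interest are `class Name { ... }` declarations. The names `ISLAND_HEAD`, `skip_body`, and `parse_islands` are hypothetical. The top-level loop mimics the usual PEG idiom for water (`!island .`), and the brace counting in `skip_body` is an example of the water inside an island that the developer otherwise has to spell out by hand.

```python
import re

# Hypothetical island: the header of a `class Name {` declaration.
ISLAND_HEAD = re.compile(r"\bclass\s+([A-Za-z_]\w*)\s*\{")


def skip_body(text, pos):
    """Skip water inside an island body by counting braces.

    `pos` points just past the opening brace. Returns the index just
    past the matching closing brace, or len(text) if it is missing.
    """
    depth = 1
    while pos < len(text):
        ch = text[pos]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return pos + 1
        pos += 1
    return len(text)


def parse_islands(text):
    """Return the names of top-level class islands; the rest is water."""
    names = []
    pos = 0
    while pos < len(text):
        m = ISLAND_HEAD.match(text, pos)
        if m is None:
            pos += 1  # water: consume one character, like PEG `!island .`
            continue
        names.append(m.group(1))
        pos = skip_body(text, m.end())  # body is skipped, not parsed
    return names


if __name__ == "__main__":
    source = "int x = 1; class A { void f() { if (x) { } } } junk class B { }"
    print(parse_islands(source))
```

Because the body is skipped as a whole, nested classes are not reported. The sketch also ignores braces inside string literals and comments, which is one of the details that makes hand-written water rules grow in practice.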
