Grammars for Free: Toward Grammar Inference for Ad Hoc Parsers
Michael Schröder, Jürgen Cito

TL;DR
This paper proposes an automatic system for inferring formal grammars from ad hoc parsers. Such grammars could support program comprehension, testing, and security analysis, and enable new applications such as mining software repositories and parser synthesis.
Contribution
It proposes a new approach for automatically inferring grammars from ad hoc parsers. Because programmers rarely write grammars for such parsers, their input languages otherwise go undocumented and are hard to analyze.
Findings
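Makes the case that grammars can be inferred automatically for ad hoc parsers
Argues that inferred grammars would aid program comprehension, test-input generation, and reasoning about language-theoretic security
Identifies new applications in mining software repositories and bi-directional parser synthesis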
Abstract
Ad hoc parsers are everywhere: they appear any time a string is split, looped over, interpreted, transformed, or otherwise processed. Every ad hoc parser gives rise to a language: the possibly infinite set of input strings that the program accepts without going wrong. Any language can be described by a formal grammar: a finite set of rules that can generate all strings of that language. But programmers do not write grammars for ad hoc parsers -- even though they would be eminently useful. Grammars can serve as documentation, aid program comprehension, generate test inputs, and allow reasoning about language-theoretic security. We propose an automatic grammar inference system for ad hoc parsers that would enable all of these use cases, in addition to opening up new possibilities in mining software repositories and bi-directional parser synthesis.
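The abstract's chain of ideas (an ad hoc parser defines a language, a grammar describes that language, and the grammar can generate test inputs) can be illustrated with a toy example. This is a minimal sketch, not the authors' system: `parse_point` is a hypothetical ad hoc parser, `GRAMMAR` is written by hand to stand in for what an inference system might produce, and `generate` is a plain random grammar-based input generator assumed for illustration.

```python
import random

DIGITS = "0123456789"


def parse_point(s: str) -> tuple[int, int]:
    """Hypothetical ad hoc parser: accepts "x,y" with unsigned ASCII decimal coordinates."""
    parts = s.split(",")
    if len(parts) != 2:
        raise ValueError("expected exactly one comma")
    for p in parts:
        # Reject empty parts and any character outside 0-9 (no signs, spaces, etc.).
        if not p or any(c not in DIGITS for c in p):
            raise ValueError(f"not a number: {p!r}")
    return int(parts[0]), int(parts[1])


# Hand-written grammar for the language parse_point accepts:
#   <point>  ::= <number> "," <number>
#   <number> ::= <digit> | <digit> <number>
#   <digit>  ::= "0" | "1" | ... | "9"
GRAMMAR: dict[str, list[list[str]]] = {
    "<point>": [["<number>", ",", "<number>"]],
    "<number>": [["<digit>"], ["<digit>", "<number>"]],
    "<digit>": [[d] for d in DIGITS],
}


def generate(symbol: str, rng: random.Random, depth: int = 0, max_depth: int = 8) -> str:
    """Randomly expand `symbol` into a string of terminals using GRAMMAR."""
    if symbol not in GRAMMAR:
        return symbol  # terminal symbol: emit as-is
    alternatives = GRAMMAR[symbol]
    if depth >= max_depth:
        # Past the depth limit, keep only the shortest alternatives; in this
        # grammar those are non-recursive, so expansion is guaranteed to stop.
        shortest = min(len(a) for a in alternatives)
        alternatives = [a for a in alternatives if len(a) == shortest]
    expansion = rng.choice(alternatives)
    return "".join(generate(s, rng, depth + 1, max_depth) for s in expansion)


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        sample = generate("<point>", rng)
        print(sample, parse_point(sample))
```

Every string derived from `GRAMMAR` is accepted by `parse_point`, and every string the parser accepts is derivable from `GRAMMAR`, so here the grammar describes the parser's language exactly. That holds only because the grammar was written to mirror the parser's checks; an inference system would have to recover those rules from the code itself.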
