Learning Highly Recursive Input Grammars
Neil Kulkarni, Caroline Lemieux, Koushik Sen

TL;DR
This paper introduces Arvada, an algorithm that learns context-free grammars from positive examples and a Boolean-valued oracle. It captures recursive structure effectively and outperforms the earlier GLADE method in recall and F1 score.
Contribution
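Arvada's key innovation is the bubbling operation, which groups a sequence of sibling nodes in a parse tree under a new node. Bubbling enables recursive generalization in the learned grammar, which markedly improves learning for highly recursive languages such as programming languages.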
Findings
Arvada achieves on average a 4.98x increase in recall over GLADE.
Arvada achieves on average a 3.13x increase in F1 score over GLADE.
Arvada makes only 0.87x as many oracle calls as GLADE, while incurring a 1.27x slowdown.
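For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. The sketch below uses the standard definition only; how precision and recall are estimated for a learned grammar is not described in this summary, and the function name `f1` is an illustrative choice.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        # Convention: F1 is 0 when both precision and recall are 0.
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Because F1 also depends on precision, a given multiplicative gain in recall does not translate into the same gain in F1, which is consistent with the two different factors reported above.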
Abstract
This paper presents Arvada, an algorithm for learning context-free grammars from a set of positive examples and a Boolean-valued oracle. Arvada learns a context-free grammar by building parse trees from the positive examples. Starting from initially flat trees, Arvada adds structure to these trees with a key operation: it bubbles sequences of sibling nodes in the trees into a new node, adding a layer of indirection to the tree. Bubbling operations enable recursive generalization in the learned grammar. We evaluate Arvada against GLADE and find it achieves on average increases of 4.98x in recall and 3.13x in F1 score, while incurring only a 1.27x slowdown and requiring only 0.87x as many calls to the oracle. Arvada has a particularly marked improvement over GLADE on grammars with highly recursive structure, like those of programming languages.
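The bubbling operation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Node`, `flat_tree`, `bubble`, `rules`, and `relabel` and the labels `t0` and `t1` are hypothetical, and the oracle check that would decide whether a generalization is kept is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Node:
    """Parse-tree node: terminals have no children, nonterminals have some."""
    label: str
    children: List["Node"] = field(default_factory=list)


def flat_tree(example: str, root: str = "t0") -> Node:
    """Initial flat tree: a single root whose children are the characters."""
    return Node(root, [Node(ch) for ch in example])


def bubble(parent: Node, start: int, end: int, new_label: str) -> Node:
    """Group the siblings parent.children[start:end] under one new node."""
    new_node = Node(new_label, parent.children[start:end])
    parent.children[start:end] = [new_node]
    return new_node


def rules(tree: Node) -> Dict[str, Set[Tuple[str, ...]]]:
    """Read off the grammar rules induced by every internal node of the tree."""
    out: Dict[str, Set[Tuple[str, ...]]] = {}
    stack = [tree]
    while stack:
        node = stack.pop()
        if node.children:
            rhs = tuple(child.label for child in node.children)
            out.setdefault(node.label, set()).add(rhs)
            stack.extend(node.children)
    return out


def relabel(tree: Node, old: str, new: str) -> None:
    """Merge nonterminal `old` into `new` everywhere in the tree.

    Assumption: in the full algorithm a step like this would be kept only
    if the oracle accepts strings from the generalized grammar; this sketch
    applies it unconditionally.
    """
    if tree.children and tree.label == old:
        tree.label = new
    for child in tree.children:
        relabel(child, old, new)


# Usage: one positive example with nested parentheses.
tree = flat_tree("((a))")          # t0 -> ( ( a ) )
bubble(tree, 1, 4, "t1")           # t0 -> ( t1 ),  t1 -> ( a )
relabel(tree, "t1", "t0")          # t0 -> ( t0 ) | ( a )
grammar = rules(tree)
```

After the bubble, the inner `(a)` sits under its own node. Merging that node with the root then yields a rule in which `t0` refers to itself, so the grammar accepts arbitrarily deep nesting even though only one depth was observed. This is the sense in which the added layer of indirection makes recursive generalization possible.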
Taxonomy
Topics: Machine Learning and Algorithms · Natural Language Processing Techniques · Software Testing and Debugging Techniques
