Incremental Context-free Grammar Inference in Black Box Settings
Feifei Li, Xiao Chen, Xi Xiao, Xiaoyu Sun, Chuan Chen, Shaohua Wang, Jitao Han

TL;DR
This paper introduces Kedavra, a new incremental method for inferring context-free grammars from black-box data, which improves quality, speed, and readability over existing heuristic approaches.
Contribution
Kedavra is the first incremental segmentation-based approach for black-box CFG inference, addressing limitations of prior methods by enhancing efficiency and grammar quality.
Findings
Kedavra outperforms Arvada and Treevada in grammar quality, measured by precision and recall.
Kedavra infers grammars faster than Arvada and Treevada.
Kedavra produces more readable grammars than Arvada and Treevada.
Abstract
Black-box context-free grammar inference presents a significant challenge in many practical settings due to limited access to example programs. The state-of-the-art methods, Arvada and Treevada, employ heuristic approaches to generalize grammar rules, initiating from flat parse trees and exploring diverse generalization sequences. We have observed that these approaches suffer from low quality and readability, primarily because they process entire example strings, adding to the complexity and substantially slowing down computations. To overcome these limitations, we propose a novel method that segments example strings into smaller units and incrementally infers the grammar. Our approach, named Kedavra, has demonstrated superior grammar quality (enhanced precision and recall), faster runtime, and improved readability through empirical comparison.
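The abstract reports grammar quality as precision and recall but does not define them here. The sketch below shows one common way these two metrics are estimated for an inferred grammar in a black-box setting: precision as the fraction of strings sampled from the inferred grammar that the black-box parser accepts, and recall as the fraction of held-out valid strings that the inferred grammar accepts. Everything in it is an illustrative assumption and not Kedavra's implementation: the `oracle` (a balanced-parentheses checker standing in for the black-box parser), the toy `GRAMMAR`, and the helper names `sample`, `make_recognizer`, `precision`, and `recall`.

```python
import random
from functools import lru_cache


def oracle(s: str) -> bool:
    """Hypothetical black-box parser: accepts non-empty balanced parentheses."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
        else:
            return False
    return depth == 0 and len(s) > 0


# Hypothetical inferred grammar: it captures nesting but misses concatenation.
# Assumption for the helpers below: no empty rules and no unit-rule cycles.
GRAMMAR = {"S": [["(", "S", ")"], ["(", ")"]]}


def sample(grammar, symbol, rng, depth=0, max_depth=6):
    """Randomly expand `symbol`; past max_depth, take the shortest rule."""
    if symbol not in grammar:
        return symbol  # terminal
    rules = grammar[symbol]
    rule = min(rules, key=len) if depth >= max_depth else rng.choice(rules)
    return "".join(sample(grammar, x, rng, depth + 1, max_depth) for x in rule)


def make_recognizer(grammar, start):
    """Return a function that tests membership by memoized span splitting."""
    def accepts(s):
        @lru_cache(maxsize=None)
        def derives(symbol, i, j):
            if symbol not in grammar:
                return s[i:j] == symbol
            return any(matches(tuple(r), i, j) for r in grammar[symbol])

        @lru_cache(maxsize=None)
        def matches(rule, i, j):
            head, rest = rule[0], rule[1:]
            if not rest:
                return derives(head, i, j)
            # Leave at least one character for each remaining symbol.
            return any(
                derives(head, i, k) and matches(rest, k, j)
                for k in range(i + 1, j - len(rest) + 1)
            )

        return derives(start, 0, len(s))
    return accepts


def precision(grammar, start, oracle_fn, n=200, seed=0):
    """Fraction of grammar samples that the black-box parser accepts."""
    rng = random.Random(seed)
    samples = [sample(grammar, start, rng) for _ in range(n)]
    return sum(oracle_fn(x) for x in samples) / n


def recall(grammar, start, held_out):
    """Fraction of held-out valid strings that the grammar accepts."""
    accepts = make_recognizer(grammar, start)
    return sum(accepts(x) for x in held_out) / len(held_out)


if __name__ == "__main__":
    held_out = ["()", "(())", "()()", "(()())"]
    print("precision:", precision(GRAMMAR, "S", oracle))
    print("recall:", recall(GRAMMAR, "S", held_out))
```

On this toy input the grammar only ever generates nested parentheses, so every sample is accepted by the oracle, while the two held-out strings that use concatenation are rejected by the grammar. An over-specialized grammar of this kind scores high on precision and low on recall, which is why the two numbers are reported together.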
Taxonomy
Topics: Natural Language Processing Techniques · Speech and dialogue systems · Topic Modeling
