RL-GRIT: Reinforcement Learning for Grammar Inference
Walt Woods

TL;DR
RL-GRIT introduces a reinforcement learning-based approach to infer complex, recursive, and context-sensitive grammars from real-world data formats, aiding in understanding and security analysis.
Contribution
The paper presents a novel reinforcement learning framework for grammar inference that surpasses previous methods in expressiveness and supports recursive, context-sensitive structures.
Findings
Successfully learned recursive control structures in simple data formats
Extracted meaningful structure from PDF format fragments
Demonstrated expressiveness beyond both the regular and constituency grammar classes
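To make "recursive control structures" concrete, here is a minimal sketch of a hand-written recognizer for a toy nested format. It is not RL-GRIT and not taken from the paper: the grammar `value := "a" | "[" value* "]"` and the names `parse_value` and `is_valid` are assumptions chosen for illustration. The point is only that arbitrarily deep nesting requires a rule that refers to itself, which a regular grammar cannot express.

```python
# Hypothetical toy grammar (an assumption for illustration, not from the paper):
#   value := "a" | "[" value* "]"

def parse_value(text, pos=0):
    """Return the index just past one value starting at pos, or raise ValueError."""
    if pos >= len(text):
        raise ValueError("unexpected end of input")
    ch = text[pos]
    if ch == "a":
        # Base case: a single atom.
        return pos + 1
    if ch == "[":
        pos += 1
        # Recursive case: a list contains zero or more values,
        # each of which may itself be a list.
        while pos < len(text) and text[pos] != "]":
            pos = parse_value(text, pos)
        if pos >= len(text):
            raise ValueError("unclosed '['")
        return pos + 1  # consume the closing ']'
    raise ValueError(f"unexpected character {ch!r} at index {pos}")


def is_valid(text):
    """True if the whole string is exactly one well-formed value."""
    try:
        return parse_value(text) == len(text)
    except ValueError:
        return False
```

For example, `is_valid("[a[aa][]]")` accepts a nested input, while `is_valid("[a[a]")` rejects one with an unclosed bracket. A grammar inference method that supports recursion would need to recover a self-referential rule of this kind from examples alone.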
Abstract
When working to understand usage of a data format, examples of the data format are often more representative than the format's specification. For example, two different applications might use very different JSON representations, or two PDF-writing applications might make use of very different areas of the PDF specification to realize the same rendered content. The complexity arising from these distinct origins can lead to large, difficult-to-understand attack surfaces, presenting a security concern when considering both exfiltration and data schizophrenia. Grammar inference can aid in describing the practical language generator behind examples of a data format. However, most grammar inference research focuses on natural language, not data formats, and fails to support crucial features such as type recursion. We propose a novel set of mechanisms for grammar inference, RL-GRIT, and apply…
