RL-GRIT: Reinforcement Learning for Grammar Inference

Walt Woods

arXiv:2105.13114 · cs.LG · June 4, 2021

TL;DR

RL-GRIT introduces a reinforcement learning-based approach to infer complex, recursive, and context-sensitive grammars from real-world data formats, aiding in understanding and security analysis.

Contribution

The paper presents a novel reinforcement learning framework for grammar inference that is more expressive than previous methods and supports recursive, context-sensitive structures.

Findings

01. Learned recursive control structures in simple data formats

02. Extracted meaningful structure from fragments of the PDF format

03. Demonstrated expressiveness beyond the regular and constituency grammar classes
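
To make "recursive structure" concrete, here is a minimal sketch of a toy nested-list format whose grammar refers to itself: value := INT | "[" (value ("," value)*)? "]". This is not RL-GRIT and does not come from the paper. The format and the names `parse_value`, `parse`, and `depth` are all hypothetical, chosen only to illustrate the kind of type recursion that a regular grammar cannot describe, because nesting depth is unbounded.

```python
DIGITS = "0123456789"


def parse_value(text, pos=0):
    """Parse one value starting at text[pos]; return (tree, next_pos)."""
    if pos < len(text) and text[pos] == "[":
        items = []
        pos += 1
        # An empty list closes immediately.
        if pos < len(text) and text[pos] == "]":
            return items, pos + 1
        while True:
            # Recursive step: a list element is itself a value.
            item, pos = parse_value(text, pos)
            items.append(item)
            if pos < len(text) and text[pos] == ",":
                pos += 1
            elif pos < len(text) and text[pos] == "]":
                return items, pos + 1
            else:
                raise ValueError(f"expected ',' or ']' at position {pos}")
    # Base case: a non-negative integer literal.
    start = pos
    while pos < len(text) and text[pos] in DIGITS:
        pos += 1
    if start == pos:
        raise ValueError(f"expected digit or '[' at position {pos}")
    return int(text[start:pos]), pos


def parse(text):
    """Parse a whole document; reject trailing input."""
    tree, pos = parse_value(text)
    if pos != len(text):
        raise ValueError(f"trailing input at position {pos}")
    return tree


def depth(tree):
    """Nesting depth of a parsed tree (0 for a bare integer)."""
    if isinstance(tree, int):
        return 0
    return 1 + max((depth(t) for t in tree), default=0)
```

For example, `parse("[1,[2,[3]]]")` yields a tree of depth 3. A grammar inference method limited to regular languages could only approximate such inputs up to some fixed depth, whereas a recursive rule covers every depth with one definition.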

Abstract

When working to understand usage of a data format, examples of the data format are often more representative than the format's specification. For example, two different applications might use very different JSON representations, or two PDF-writing applications might make use of very different areas of the PDF specification to realize the same rendered content. The complexity arising from these distinct origins can lead to large, difficult-to-understand attack surfaces, presenting a security concern when considering both exfiltration and data schizophrenia. Grammar inference can aid in describing the practical language generator behind examples of a data format. However, most grammar inference research focuses on natural language, not data formats, and fails to support crucial features such as type recursion. We propose a novel set of mechanisms for grammar inference, RL-GRIT, and apply…
