3DGen: AI-Assisted Generation of Provably Correct Binary Format Parsers
Sarah Fakhoury, Markus Kuppe, Shuvendu K. Lahiri, Tahina Ramananandro,, Nikhil Swamy

TL;DR
3DGen leverages AI and symbolic methods to automatically generate provably correct binary format parsers from informal specifications, improving security and reliability in data parsing.
Contribution
The paper introduces 3DGen, a novel framework that uses AI to transform informal format descriptions into formal, verifiable parser specifications with symbolic test generation.
Findings
Successfully applied to 20 Internet formats
Generated safe, efficient C parsers with formal correctness guarantees
Demonstrated scalability of AI-assisted formal parser synthesis
Abstract
Improper parsing of attacker-controlled input is a leading source of software security vulnerabilities, especially when programmers transcribe informal format descriptions in RFCs into efficient parsing logic in low-level, memory unsafe languages. Several researchers have proposed formal specification languages for data formats from which efficient code can be extracted. However, distilling informal requirements into formal specifications is challenging and, despite their benefits, new, formal languages are hard for people to learn and use. In this work, we present 3DGen, a framework that makes use of AI agents to transform mixed informal input, including natural language documents (i.e., RFCs) and example inputs into format specifications in a language called 3D. To support humans in understanding and trusting the generated specifications, 3DGen uses symbolic methods to also…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsHandwritten Text Recognition Techniques · Computer Graphics and Visualization Techniques · Advanced Image and Video Retrieval Techniques
