3DGen: AI-Assisted Generation of Provably Correct Binary Format Parsers

Sarah Fakhoury; Markus Kuppe; Shuvendu K. Lahiri; Tahina Ramananandro,; Nikhil Swamy

arXiv:2404.10362·cs.SE·May 8, 2024·1 cites

3DGen: AI-Assisted Generation of Provably Correct Binary Format Parsers

Sarah Fakhoury, Markus Kuppe, Shuvendu K. Lahiri, Tahina Ramananandro,, Nikhil Swamy

PDF

Open Access

TL;DR

3DGen leverages AI and symbolic methods to automatically generate provably correct binary format parsers from informal specifications, improving security and reliability in data parsing.

Contribution

The paper introduces 3DGen, a novel framework that uses AI to transform informal format descriptions into formal, verifiable parser specifications with symbolic test generation.

Findings

01

Successfully applied to 20 Internet formats

02

Generated safe, efficient C parsers with formal correctness guarantees

03

Demonstrated scalability of AI-assisted formal parser synthesis

Abstract

Improper parsing of attacker-controlled input is a leading source of software security vulnerabilities, especially when programmers transcribe informal format descriptions in RFCs into efficient parsing logic in low-level, memory unsafe languages. Several researchers have proposed formal specification languages for data formats from which efficient code can be extracted. However, distilling informal requirements into formal specifications is challenging and, despite their benefits, new, formal languages are hard for people to learn and use. In this work, we present 3DGen, a framework that makes use of AI agents to transform mixed informal input, including natural language documents (i.e., RFCs) and example inputs into format specifications in a language called 3D. To support humans in understanding and trusting the generated specifications, 3DGen uses symbolic methods to also…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsHandwritten Text Recognition Techniques · Computer Graphics and Visualization Techniques · Advanced Image and Video Retrieval Techniques