Inferring Drop-in Binary Parsers from Program Executions
Thurston H. Y. Dang, Jose P. Cambronero, Martin C. Rinard

TL;DR
BIEBER is a system that models and regenerates full parsers from program executions. The inferred parsers generalize to input files of arbitrary size, and a learned decision tree routes files of different formats to the right parser, aiding reverse engineering and bug detection in file parsers.
Contribution
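This paper introduces BIEBER, the first system to automatically infer complete binary parsers from program executions. It also infers a decision tree that routes files of multiple formats to the appropriate parser, and it expresses both in an IR that backends translate into executable code.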
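The decision tree mentioned above can be pictured as a sequence of tests on individual header bytes that ends at a leaf naming a parser. The sketch below is hand-written rather than inferred, and the node layout, the `route` function, and the stub parsers `parse_bmp`, `parse_png`, and `parse_gif` are hypothetical; only the well-known magic bytes of BMP ("BM"), PNG (0x89) and GIF ("G") are real. It is not BIEBER's actual data structure.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

Parser = Callable[[bytes], str]

# Hypothetical stand-ins for inferred per-format parsers.
def parse_bmp(data: bytes) -> str:
    return "bmp"

def parse_png(data: bytes) -> str:
    return "png"

def parse_gif(data: bytes) -> str:
    return "gif"

@dataclass
class Node:
    """Internal node: branch on the byte at `offset`. Leaf: `parser` is set."""
    offset: int = -1
    children: Dict[int, "Node"] = field(default_factory=dict)
    parser: Optional[Parser] = None

def route(tree: Node, data: bytes) -> Optional[Parser]:
    """Walk the tree by reading header bytes; return the parser or None."""
    node = tree
    while node.parser is None:
        if node.offset >= len(data):
            return None  # file too short to reach a decision
        child = node.children.get(data[node.offset])
        if child is None:
            return None  # header byte never seen during inference: reject
        node = child
    return node.parser

# Illustrative tree: byte 0 separates the formats, byte 1 confirms "BM".
TREE = Node(offset=0, children={
    0x42: Node(offset=1, children={0x4D: Node(parser=parse_bmp)}),  # "BM"
    0x89: Node(parser=parse_png),                                   # PNG signature
    0x47: Node(parser=parse_gif),                                   # "GIF8..."
})
```

For example, `route(TREE, b"GIF89a")` returns `parse_gif`, while an unrecognized or truncated header returns `None` instead of guessing a format.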
Findings
Successfully regenerated parsers for six file formats with high accuracy.
Generated parsers include safety checks to prevent common errors.
Helped identify and fix bugs in existing image parsing libraries.
Abstract
We present BIEBER (Byte-IdEntical Binary parsER), the first system to model and regenerate a full working parser from instrumented program executions. To achieve this, BIEBER exploits the regularity (e.g., header fields and array-like data structures) that is commonly found in file formats. Key generalization steps derive strided loops that parse input file data and rewrite concrete loop bounds with expressions over input file header bytes. These steps enable BIEBER to generalize parses of specific input files to obtain parsers that operate over input files of arbitrary size. BIEBER also incrementally and efficiently infers a decision tree that reads file header bytes to route input files of different types to inferred parsers of the appropriate type. The inferred parsers and decision tree are expressed in an IR; separate backends (C and Perl in our prototype) can translate the IR into…
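The generalization step described in the abstract, deriving a strided loop and replacing its concrete bound with an expression over header bytes, can be illustrated on a toy format. Everything here is an assumption for illustration: a 4-byte little-endian record count at offset 0, followed by 6-byte records (a u16 id and a u32 value). `parse_concrete` mimics what a single observed run with three records might yield, and `parse_generalized` shows the same strided loop with its bound read from the header. The size check is likewise an illustrative guard, not BIEBER's generated code.

```python
import struct
from typing import List, Tuple

HEADER_SIZE = 4  # hypothetical header: u32 little-endian record count
STRIDE = 6       # hypothetical record: u16 id + u32 value ("<HI", no padding)

def parse_concrete(data: bytes) -> List[Tuple[int, int]]:
    # As observed on one input file: the loop bound 3 is a fixed constant.
    return [struct.unpack_from("<HI", data, HEADER_SIZE + i * STRIDE)
            for i in range(3)]

def parse_generalized(data: bytes) -> List[Tuple[int, int]]:
    # The concrete bound is rewritten as an expression over header bytes.
    (count,) = struct.unpack_from("<I", data, 0)
    end = HEADER_SIZE + count * STRIDE
    if end > len(data):
        # Illustrative guard: refuse headers that claim more data than exists.
        raise ValueError("record count exceeds file size")
    records = []
    for offset in range(HEADER_SIZE, end, STRIDE):  # strided loop over records
        records.append(struct.unpack_from("<HI", data, offset))
    return records
```

On the three-record file both functions agree, but only `parse_generalized` handles a file with five records or rejects a header that overstates its length, which is the sense in which the generalized parser operates over inputs of arbitrary size.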
Taxonomy
Topics: Software Testing and Debugging Techniques · Software Engineering Research · Machine Learning and Algorithms
