Reverse Engineering Structure and Semantics of Input of a Binary Executable
Seshagiri Prabhu Narasimha, Arun Lakhotia

TL;DR
This paper introduces ByteRI 2.0, an algorithm that uses dynamic taint analysis to recover the structure and semantics of binary executable inputs, aiding reverse engineering and vulnerability detection.
Contribution
It advances prior work by identifying syntactic components and semantic relations in input data, constructing a C/C++-like structure for better understanding of binary input formats.
Findings
Accurately identifies syntactic elements and their structure.
Recovers semantic relations like count and offset fields.
Generates valid input data for real-world programs.
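To make "count field" and "offset field" concrete, here is a minimal sketch of a toy input format that contains both relations. The layout, the `TOY0` magic value, and the names `HEADER`, `RECORD`, `build`, and `parse` are hypothetical illustrations, not formats or code taken from the paper.

```python
import struct

# Hypothetical toy format (an assumption for illustration only):
#   header : 4-byte magic, uint16 record count, uint16 offset of the name
#   body   : `count` records, each a uint8 id followed by a uint16 value
#   tail   : NUL-terminated name located at `name_offset`
HEADER = struct.Struct("<4sHH")
RECORD = struct.Struct("<BH")


def build(records, name):
    """Serialize records and a name, filling in the count and offset fields."""
    body = b"".join(RECORD.pack(rec_id, value) for rec_id, value in records)
    name_offset = HEADER.size + len(body)  # offset field: where the name starts
    header = HEADER.pack(b"TOY0", len(records), name_offset)  # count field
    return header + body + name + b"\x00"


def parse(data):
    """Read the input back, using the count and offset fields to locate data."""
    magic, count, name_offset = HEADER.unpack_from(data, 0)
    records = [
        RECORD.unpack_from(data, HEADER.size + i * RECORD.size)
        for i in range(count)  # the count field bounds the array of records
    ]
    end = data.index(b"\x00", name_offset)  # the offset field locates the name
    return magic, records, data[name_offset:end]


blob = build([(1, 10), (2, 20)], b"abc")
print(parse(blob))
```

A parser like this reads the count before iterating over records and reads the offset before touching the name, which is the kind of dependency between fields that a recovery tool would need to observe.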
Abstract
Knowledge of the input format of binary executables is important for finding bugs and vulnerabilities, for example when generating data for fuzzing or during manual reverse engineering. This paper presents an algorithm to recover the structure and semantic relations between fields of the input of binary executables using dynamic taint analysis. The algorithm improves upon prior work by not just partitioning the input into consecutive bytes representing values but also identifying syntactic components of structures, such as atomic fields of fixed and variable lengths, and different types of arrays, such as arrays of atomic fields, arrays of records, and arrays with variant records. It also infers the semantic relations between fields of a structure, such as count fields that specify the count of an array of records or offset fields that specify the start location of a variable-length field within the…
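The baseline step the abstract mentions, partitioning the input into runs of consecutive bytes that represent one value, can be sketched roughly as follows. This is a simplified illustration and not the ByteRI 2.0 algorithm: the trace format (a list of `(instruction, input offsets)` pairs, as a taint tracker might report) and the function name `partition_fields` are assumptions made for the example.

```python
def partition_fields(trace, input_len):
    """Group adjacent input bytes that were used by exactly the same trace steps.

    `trace` is a hypothetical list of (instruction_name, offsets) pairs, where
    `offsets` is the set of input byte offsets tainting that instruction's
    operands. Returns a list of (start_offset, length) candidate fields.
    """
    # For each input byte, record which trace steps touched it.
    users = [set() for _ in range(input_len)]
    for step, (_insn, offsets) in enumerate(trace):
        for off in offsets:
            users[off].add(step)

    # Walk the input and cut a new field whenever the set of users changes.
    fields = []
    start = 0
    for off in range(1, input_len + 1):
        if off == input_len or users[off] != users[start]:
            fields.append((start, off - start))
            start = off
    return fields


# Toy trace: bytes 0-3 compared together, bytes 4-5 moved and added together,
# byte 6 loaded alone, byte 7 never used.
toy_trace = [
    ("cmp", {0, 1, 2, 3}),
    ("mov", {4, 5}),
    ("add", {4, 5}),
    ("ld", {6}),
]
print(partition_fields(toy_trace, 8))
```

A flat partition like this says nothing about arrays, records, or the count and offset relations between fields, which is the gap the abstract says the paper's algorithm addresses.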
Taxonomy
Topics: Manufacturing Process and Optimization
