Adding Compilation Metadata To Binaries To Make Disassembly Decidable
Daniel Engel, Freek Verbeek, Pranav Kumar, Binoy Ravindran

TL;DR
This paper introduces a binary format with embedded metadata that clarifies compiler intent, enhancing software safety, maintainability, and analysis without impacting runtime performance.
Contribution
It proposes a new binary format with compiler intent metadata, enabling reliable lifting, analysis, and recompilation, improving over traditional stripped binaries and DWARF.
Findings
The metadata is roughly 17% of the size of DWARF debugging information
Binaries can be lifted, instrumented, and recompiled correctly
Metadata does not affect runtime behavior or performance
Abstract
The binary executable format is the standard method for distributing and executing software. Yet, it is also as opaque a representation of software as can be. If the binary format were augmented with metadata that provides security-relevant information, such as which data is intended by the compiler to be executable instructions, or how memory regions are expected to be bounded, that would dramatically improve the safety and maintainability of software. In this paper, we propose a binary format that is a middle ground between a stripped black-box binary and open source. We provide a tool that generates metadata capturing the compiler's intent and inserts it into the binary. This metadata enables lifting to a correct and recompilable higher-level representation and makes analysis and instrumentation more reliable. Our evaluation shows that adding metadata does not affect runtime behavior…
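To make the idea of "which data is intended by the compiler to be executable instructions" concrete, here is a minimal sketch. It is not the paper's format or tool: the `CodeRange` record and the `is_instruction` helper are hypothetical names, and the offsets are made up for illustration. It only assumes that metadata lists the byte ranges the compiler emitted as instructions, so that classifying a byte as code or data becomes a lookup rather than a heuristic guess.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeRange:
    """Hypothetical metadata record: a half-open range [start, end) of instruction bytes."""
    start: int
    end: int


def is_instruction(ranges, offset):
    """Return True if `offset` lies in a range marked as instructions.

    Assumes `ranges` is sorted by `start` and the ranges do not overlap.
    """
    starts = [r.start for r in ranges]
    # Index of the last range whose start is <= offset, or -1 if there is none.
    i = bisect_right(starts, offset) - 1
    return i >= 0 and ranges[i].start <= offset < ranges[i].end


# Illustrative values only: two instruction ranges with a data gap between them.
ranges = [CodeRange(0x1000, 0x1040), CodeRange(0x1080, 0x10C0)]
print(is_instruction(ranges, 0x1010))  # True
print(is_instruction(ranges, 0x1050))  # False
```

Without such ranges, a disassembler has to infer instruction boundaries from the bytes themselves; with them, the question is answered by the producer of the binary.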
