On the Generation of Disassembly Ground Truth and the Evaluation of Disassemblers
Kaiyuan Li, Maverick Woo, Limin Jia

TL;DR
This paper introduces a new benchmark suite of 879 binaries and a ground truth generator for disassembly, enabling standardized evaluation of disassemblers, with comprehensive testing of four open-source tools.
Contribution
It provides the first version of a disassembly benchmark suite and a novel ground truth generator based on listing files, supporting multiple compilers and optimization settings.
Findings
The benchmark suite comprises 879 binaries from diverse projects.
Four open-source disassemblers are evaluated on the benchmark.
The ground truth generator relies on listing files, which Clang, GCC, ICC, and MSVC all support.
Abstract
When a software transformation or software security task needs to analyze a given program binary, the first step is often disassembly. Since many modern disassemblers have become highly accurate on many binaries, we believe reliable disassembler benchmarking requires standardizing the set of binaries used and the disassembly ground truth about these binaries. This paper presents (i) a first version of our work-in-progress disassembly benchmark suite, which comprises 879 binaries from diverse projects compiled with multiple compilers and optimization settings, and (ii) a novel disassembly ground truth generator leveraging the notion of "listing files", which has broad support by Clang, GCC, ICC, and MSVC. In addition, it presents our evaluation of four prominent open-source disassemblers using this benchmark suite and a custom evaluation system. Our entire system and all generated data…
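To make the idea of scoring a disassembler against ground truth concrete, here is a minimal sketch in Python. It assumes that both the ground truth and the disassembler output can be reduced to a set of instruction start addresses, and that precision and recall over those sets are the metrics of interest. The abstract does not say which metrics the authors' custom evaluation system uses, so the `Score` class, the `score_instruction_starts` function, and the sample addresses are all hypothetical illustrations rather than the paper's method.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Score:
    """Counts from comparing reported instruction starts with ground truth."""

    true_positives: int
    false_positives: int
    false_negatives: int

    @property
    def precision(self) -> float:
        # Fraction of reported instruction starts that are real.
        reported = self.true_positives + self.false_positives
        return self.true_positives / reported if reported else 1.0

    @property
    def recall(self) -> float:
        # Fraction of real instruction starts that were reported.
        expected = self.true_positives + self.false_negatives
        return self.true_positives / expected if expected else 1.0


def score_instruction_starts(ground_truth: set[int], reported: set[int]) -> Score:
    """Compare two sets of instruction start addresses (hypothetical metric)."""
    return Score(
        true_positives=len(ground_truth & reported),
        false_positives=len(reported - ground_truth),
        false_negatives=len(ground_truth - reported),
    )


# Made-up addresses: the disassembler misplaces one instruction boundary.
truth = {0x1000, 0x1001, 0x1004, 0x1009}
found = {0x1000, 0x1001, 0x1005, 0x1009}
result = score_instruction_starts(truth, found)
print(f"precision={result.precision:.2f} recall={result.recall:.2f}")
```

Sets keep the comparison independent of the order in which a tool emits instructions. A fuller evaluation would presumably also compare instruction lengths or function boundaries, which this sketch leaves out.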
Taxonomy
Topics: Software Engineering Research · Security and Verification in Computing · Advanced Malware Detection Techniques
