Data Flows in You: Benchmarking and Improving Static Data-flow Analysis on Binary Executables
Nicolaas Weideman, Sima Arasteh, Mukund Raghothaman, Jelena Mirkovic, Christophe Hauser

TL;DR
This paper introduces a large benchmark dataset for evaluating binary data-flow analysis, assesses current tools' accuracy, and proposes model improvements that substantially enhance analysis precision and vulnerability detection.
Contribution
The paper provides the first extensive benchmark dataset for binary data-flow analysis and proposes model extensions that significantly improve analysis accuracy.
Findings
The three evaluated data-flow analysis implementations (in angr, Ghidra, and Miasm) show very low accuracy on the benchmark.
Model extensions improve recall to 0.99 and precision from 0.13 to 0.32.
Enhanced analysis leads to better vulnerability identification.
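To make the precision and recall figures above concrete, here is a minimal sketch of how such metrics could be computed for a data-flow analysis, assuming the standard definitions: precision is the share of reported flows that are real, and recall is the share of real flows that are reported. The function name `precision_recall`, the representation of a flow as a `(source, sink)` pair, and the sample flows are all hypothetical illustrations, not the paper's evaluation harness or data.

```python
def precision_recall(reported, ground_truth):
    """Compare reported data flows against labeled ground-truth flows.

    Each flow is a hypothetical (source, sink) pair; the paper's own
    flow representation is not given in this summary.
    """
    reported = set(reported)
    ground_truth = set(ground_truth)

    # True positives: flows the tool reported that really exist.
    true_positives = len(reported & ground_truth)

    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(reported) if reported else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Illustrative data only: three labeled flows, four reported flows.
ground_truth = {("a", "b"), ("b", "c"), ("c", "d")}
reported = {("a", "b"), ("b", "c"), ("a", "d"), ("x", "y")}

precision, recall = precision_recall(reported, ground_truth)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Under these definitions, an analysis that over-approximates heavily can reach high recall while keeping precision low, which is consistent with the pattern in the reported numbers (recall 0.99, precision 0.32).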
Abstract
Data-flow analysis is a critical component of security research. Theoretically, accurate data-flow analysis in binary executables is an undecidable problem, due to the complexities of binary code. Practically, many binary analysis engines offer some data-flow analysis capability, but we lack understanding of the accuracy of these analyses and their limitations. We address this problem by introducing a labeled benchmark data set, including 215,072 microbenchmark test cases, mapping to 277,072 binary executables, created specifically to evaluate data-flow analysis implementations. Additionally, we augment our benchmark set with dynamically discovered data flows from 6 real-world executables. Using our benchmark data set, we evaluate three state-of-the-art data-flow analysis implementations, in angr, Ghidra, and Miasm, and discuss their very low accuracy and the reasons behind it. We further…
Taxonomy
Topics: Simulation Techniques and Applications · Business Process Modeling and Analysis · Cloud Computing and Resource Management
