A Semantics-Based Hybrid Approach on Binary Code Similarity Comparison
Yikun Hu, Hui Wang, Yuanyuan Zhang, Bodong Li, Dawu Gu

TL;DR
This paper introduces a semantics-based hybrid approach for binary code similarity comparison that combines execution and emulation to improve accuracy across architectures and under code obfuscation.
Contribution
It presents a novel hybrid method that leverages runtime semantic signatures from execution and emulation, outperforming existing semantics-less techniques.
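To make the idea of comparing code by runtime behaviour rather than by syntax concrete, here is a minimal sketch in plain Python. It is not the paper's method and does not operate on binaries: the names `semantic_signature` and `signature_similarity`, the choice of recording only return values and exception types, and the match-ratio score are all assumptions made for illustration. The point is only that two implementations that look different can yield the same observed behaviour on shared inputs.

```python
from typing import Callable, Sequence, Tuple


def semantic_signature(func: Callable, inputs: Sequence[tuple]) -> Tuple:
    """Run `func` on each input tuple and record what is observable at runtime.

    Hypothetical helper: a real system would observe richer behaviour
    (memory accesses, calls, and so on); here we keep only the return
    value or the type of the raised exception.
    """
    observed = []
    for args in inputs:
        try:
            observed.append(("ret", func(*args)))
        except Exception as exc:
            observed.append(("exc", type(exc).__name__))
    return tuple(observed)


def signature_similarity(sig_a: Tuple, sig_b: Tuple) -> float:
    """Fraction of inputs on which the two signatures agree (assumed metric)."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must come from the same input set")
    if not sig_a:
        return 0.0
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


# Two syntactically different implementations of absolute value,
# standing in for the same source compiled with different settings.
def abs_branch(x: int) -> int:
    return x if x >= 0 else -x


def abs_bitwise(x: int) -> int:
    mask = x >> 31  # all ones for negative x, zero otherwise (|x| < 2**31 assumed)
    return (x ^ mask) - mask


# A function with different semantics, for contrast.
def negate(x: int) -> int:
    return -x


INPUTS = [(-5,), (0,), (7,), (-1,), (123,)]

same = signature_similarity(
    semantic_signature(abs_branch, INPUTS),
    semantic_signature(abs_bitwise, INPUTS),
)
different = signature_similarity(
    semantic_signature(abs_branch, INPUTS),
    semantic_signature(negate, INPUTS),
)
```

With these inputs the two absolute-value variants agree everywhere, while `negate` agrees with `abs_branch` only on the non-positive inputs, so `same` is higher than `different`. A purely syntactic comparison of the two absolute-value variants would not see that equivalence.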
Findings
High accuracy in cross-architecture comparison
Effective against code obfuscation techniques
Performed over 100 million function comparisons
Abstract
Binary code similarity comparison is a methodology for identifying similar or identical code fragments in binary programs. It is indispensable in software engineering and security, where it has many important applications (e.g., plagiarism detection, bug detection). With the widespread adoption of smart and IoT (Internet of Things) devices, an increasing number of programs are ported to multiple architectures (e.g., ARM, MIPS), so it has become necessary to detect similar binary code across architectures as well. The main challenge lies in semantics-equivalent code transformations resulting from different compilation settings, code obfuscation, and varied instruction set architectures. Another challenge is the trade-off between comparison accuracy and coverage. Unfortunately, existing methods still rely heavily on semantics-less code features, which are susceptible to the code…
