Understanding the AI-powered Binary Code Similarity Detection
Lirong Fu, Peiyu Liu, Wenlong Meng, Kangjie Lu, Shize Zhou, Xuhong Zhang, Wenzhi Chen, Shouling Ji

TL;DR
This paper systematically evaluates AI-powered binary code similarity detection methods, analyzing neural network embedding strategies and evaluation practices to identify performance gaps and guide future research in practical applications.
Contribution
It provides the first comprehensive comparison of BinSD systems across applications and investigates the impact of neural network designs and evaluation methodologies.
Findings
GNN-based BinSD systems excel in similar function detection but have room for improvement.
Performance varies significantly across downstream applications.
Current evaluation metrics often do not accurately reflect real-world performance.
Abstract
AI-powered binary code similarity detection (BinSD), which transforms intricate binary code comparison to the distance measure of code embedding through neural networks, has been widely applied to program analysis. However, due to the diversity of the adopted embedding strategies, evaluation methodologies, running environments, and/or benchmarks, it is difficult to quantitatively understand to what extent the BinSD problem has been solved, especially in real-world applications. Moreover, the lack of an in-depth investigation of the increasingly complex embedding neural networks and various evaluation methodologies has become the key factor hindering the development of AI-powered BinSD. To fill these research gaps, in this paper, we present a systematic evaluation of state-of-the-art AI-powered BinSD approaches by conducting a comprehensive comparison of BinSD systems on similar function…
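The abstract describes BinSD as reducing binary code comparison to a distance measure over code embeddings. The sketch below illustrates only that final comparison step, assuming each function has already been mapped to a fixed-length vector by some embedding network. The function names, the three-dimensional vectors, and the choice of cosine similarity are hypothetical illustrations, not values or design decisions taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(query, candidates):
    """Sort candidate functions by similarity to the query, most similar first."""
    scored = [
        (name, cosine_similarity(query, vec)) for name, vec in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical embeddings: in a real system these would come from a neural
# network applied to a function's instructions or control-flow graph.
query_embedding = [0.9, 0.1, 0.3]
candidate_embeddings = {
    "memcpy_arm": [0.8, 0.2, 0.3],
    "strlen_x86": [0.1, 0.9, 0.2],
    "memcpy_mips": [0.9, 0.0, 0.4],
}

ranking = rank_candidates(query_embedding, candidate_embeddings)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Ranking all candidates against a query, rather than thresholding a single pair, mirrors how similar function detection is typically used in practice: an analyst inspects the top few results returned for a function of interest.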
Taxonomy
Topics: Software Engineering Research
