Binary Code Similarity Detection via Graph Contrastive Learning on Intermediate Representations
Xiuwei Shang, Li Hu, Shaoyin Cheng, Guoqiang Chen, Benlong Wu, Weiming Zhang, Nenghai Yu

TL;DR
This paper introduces IRBinDiff, a novel binary code similarity detection method that combines semantic and structural analysis using graph contrastive learning on intermediate representations, improving large-scale retrieval accuracy.
Contribution
IRBinDiff leverages LLVM-IR and a pre-trained language model with graph neural networks, incorporating momentum contrastive learning to better handle compilation differences and large candidate sets.
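To make the momentum contrastive idea concrete, here is a minimal, generic MoCo-style sketch in plain Python. It is not IRBinDiff's implementation: the function names, the toy 2-D embeddings, the queue size, and the values m = 0.999 and temperature = 0.07 are all illustrative assumptions. It shows the two ingredients such schemes usually combine: a key encoder updated as an exponential moving average of the query encoder, and an InfoNCE loss that scores one positive against a queue of negatives.

```python
import math
from collections import deque


def momentum_update(query_params, key_params, m=0.999):
    """EMA update of the key encoder: key <- m * key + (1 - m) * query.

    Parameters are flat lists of floats here (an assumption for brevity).
    """
    return [m * k + (1.0 - m) * q for q, k in zip(query_params, key_params)]


def normalize(v):
    """Scale a vector to unit L2 norm so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(query, positive_key, negative_queue, temperature=0.07):
    """InfoNCE loss for one query: -log softmax of the positive logit."""
    q = normalize(query)
    logits = [dot(q, normalize(positive_key)) / temperature]
    logits += [dot(q, normalize(n)) / temperature for n in negative_queue]
    # Numerically stable log-sum-exp; the positive sits at index 0.
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(l - top) for l in logits))
    return log_sum - logits[0]


# Hypothetical queue of embeddings of other functions (the negatives).
# A bounded deque drops the oldest entry once maxlen is reached.
queue = deque(maxlen=4)
for emb in ([0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]):
    queue.append(emb)

# A positive key close to the query gives a lower loss than a distant one.
loss_close = info_nce([1.0, 0.0], [0.9, 0.1], queue)
loss_far = info_nce([1.0, 0.0], [-0.9, 0.1], queue)
```

Because the queue decouples the number of negatives from the batch size, a setup like this can compare each function against many candidates per step, which is the property the summary associates with large candidate sets.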
Findings
Outperforms existing BCSD methods in various scenarios
Effective in distinguishing subtle function similarities
Robust across different compilation settings
Abstract
Binary Code Similarity Detection (BCSD) plays a crucial role in numerous fields, including vulnerability detection, malware analysis, and code reuse identification. As IoT devices proliferate and rapidly evolve, their highly heterogeneous hardware architectures and complex compilation settings, coupled with the demand for large-scale function retrieval in practical applications, place greater demands on BCSD methods. In this paper, we propose IRBinDiff, which mitigates compilation differences by leveraging LLVM-IR with higher-level semantic abstraction, and integrates a pre-trained language model with a graph neural network to capture both semantic and structural information from different perspectives. By introducing momentum contrastive learning, it effectively enhances retrieval capabilities in large-scale candidate function sets, distinguishing between subtle function…
