UniASM: Binary Code Similarity Detection without Fine-tuning
Yeming Gu, Hui Shu, Fei Kang, Fan Hu

TL;DR
UniASM introduces a novel binary code embedding model based on UniLM, utilizing rich semantic representations and new training tasks, significantly improving performance in binary similarity detection across various challenging scenarios.
Contribution
The paper presents the first UniLM-based binary code embedding model, UniASM, with innovative code representation and training tasks, advancing the state-of-the-art in binary code similarity detection.
Findings
Outperforms state-of-the-art (SOTA) methods with 12.7% higher Recall@1 in cross-compiler scenarios.
Achieves an 8.5% improvement in cross-optimization-level scenarios.
Surpasses baselines in real-world vulnerability search tasks.
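For readers unfamiliar with the Recall@1 metric cited above, the following is a minimal sketch of how Recall@k is commonly computed for embedding-based function search. It is not the paper's evaluation code: the function names, the use of cosine similarity, and the toy two-dimensional vectors are all assumptions made for illustration. Each query function embedding is compared against a pool of candidate embeddings, and a query counts as a hit when its true match appears among the top k candidates.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recall_at_k(queries, pool, k=1):
    """Fraction of queries whose true match is among the k most similar pool entries.

    queries: list of (embedding, true_index) pairs, where true_index points into pool.
    pool:    list of candidate embeddings.
    (Hypothetical helper; the data layout is an assumption, not taken from the paper.)
    """
    if not queries:
        raise ValueError("queries must not be empty")
    hits = 0
    for embedding, true_index in queries:
        # Rank pool indices from most to least similar to the query.
        ranked = sorted(
            range(len(pool)),
            key=lambda i: cosine(embedding, pool[i]),
            reverse=True,
        )
        if true_index in ranked[:k]:
            hits += 1
    return hits / len(queries)


# Toy data: three candidate "function embeddings" and three queries.
pool = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
queries = [
    ([0.9, 0.1], 0),  # closest to pool[0], which is the true match
    ([0.1, 0.9], 1),  # closest to pool[1], which is the true match
    ([0.9, 0.2], 2),  # closest to pool[0], but the true match is pool[2]
]

print(recall_at_k(queries, pool, k=1))
print(recall_at_k(queries, pool, k=2))
```

In this toy setup two of the three queries retrieve their true match at rank 1, and the third finds it at rank 2, so Recall@1 is lower than Recall@2. Real evaluations use much larger pools, which makes Recall@1 considerably harder to achieve.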
Abstract
Binary code similarity detection (BCSD) is widely used in binary analysis tasks such as vulnerability search, malware detection, clone detection, and patch analysis. Recent studies have shown that learning-based binary code embedding models perform better than traditional feature-based approaches. However, previous studies have not examined in depth the key factors that affect model performance. In this paper, we design extensive ablation studies to explore these factors, and the experimental results provide many new insights. We innovate in both code representation and model selection: we propose a novel rich-semantic function representation technique to ensure the model captures the intricate nuances of binary code, and we introduce the first UniLM-based binary code embedding model, named UniASM, which includes two newly designed…
Taxonomy
Topics: Software Engineering Research · Advanced Malware Detection Techniques · Web Application Security Vulnerabilities
