StrTune: Data Dependence-based Code Slicing for Binary Similarity Detection with Fine-tuned Representation
Kaiyan He, Yikun Hu, Xuehui Li, Yunhao Song, Yubo Zhao, Dawu Gu

TL;DR
StrTune is a novel binary code similarity detection method that uses data dependence-based slicing and fine-tuning to improve robustness across different compilation configurations.
Contribution
It introduces a data dependence-based slicing approach combined with a Siamese Network for fine-tuning, addressing limitations of syntax-based analysis in binary similarity detection.
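A Siamese setup sends both inputs of a pair through one encoder with shared weights and scores the pair by a similarity measure. The Python sketch below illustrates that idea under stated assumptions. It uses a toy hashed bag-of-tokens encoder, and `embed`, `token_init`, and `DIM` are hypothetical names. The loss has the same form as PyTorch's `CosineEmbeddingLoss`. Whether StrTune uses this exact encoder or loss is not confirmed here, and no training loop is shown. The sketch only computes the pairwise objective that fine-tuning would minimize.

```python
import math
import random
import zlib

DIM = 8  # hypothetical embedding width

def token_init(tok):
    """Deterministic pseudo-random vector for a token (stand-in for learned weights)."""
    rng = random.Random(zlib.crc32(tok.encode("utf-8")))
    return [rng.gauss(0.0, 1.0) for _ in range(DIM)]

def embed(tokens, weights):
    """Hypothetical shared encoder: sum per-token vectors, then L2-normalize.
    Both branches of the Siamese pair must pass the SAME `weights` table."""
    vec = [0.0] * DIM
    for tok in tokens:
        row = weights.get(tok)
        if row is None:
            row = weights[tok] = token_init(tok)
        for i in range(DIM):
            vec[i] += row[i]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    """Cosine similarity; inputs are unit-length, so this reduces to a dot product."""
    return sum(x * y for x, y in zip(a, b))

def cosine_embedding_loss(a, b, similar, margin=0.0):
    """Pull similar pairs toward cos = 1; push dissimilar pairs below `margin`."""
    cos = cosine(a, b)
    return 1.0 - cos if similar else max(0.0, cos - margin)

shared = {}  # one weight table used by both branches
left = embed(["load", "load", "add"], shared)
right = embed(["add", "load", "load"], shared)  # same tokens, different order
other = embed(["xor", "shl", "sub"], shared)

print(cosine_embedding_loss(left, right, similar=True))   # close to 0.0
print(cosine_embedding_loss(left, other, similar=False))  # >= 0.0
```

The bag-of-tokens encoder ignores order entirely, which is far too crude for real binaries. It is used here only to keep the shared-weight structure visible. A practical system would substitute a trained sequence model.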
Findings
Captures semantics that remain stable across compilation configurations
Improves similarity detection accuracy over syntax-based methods
Addresses instruction reordering and syntax variations
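The following minimal Python sketch illustrates why data dependence can stay stable when a compiler reorders instructions. It performs backward slicing over a hypothetical three-address IR. `Insn`, `backward_slice`, and `slice_expr` are illustrative names and do not come from StrTune's implementation. The slice for a target register is folded into a nested expression, so two legal schedules of the same computation produce the same canonical form even though the raw instruction sequences differ.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Insn:
    """Hypothetical toy three-address instruction: dst = op(*srcs)."""
    op: str
    dst: str
    srcs: tuple

def backward_slice(insns, target):
    """Keep only instructions the final value of `target` depends on (def-use chains)."""
    needed = {target}
    kept = []
    for insn in reversed(insns):
        if insn.dst in needed:
            needed.discard(insn.dst)  # this definition satisfies the pending use
            needed.update(insn.srcs)  # ...and introduces uses of its own sources
            kept.append(insn)
    kept.reverse()
    return kept

def slice_expr(insns, target):
    """Fold straight-line instructions into a nested expression for `target`;
    the result reflects data flow, not the position of independent instructions."""
    env = {}
    for insn in insns:
        args = ", ".join(env.get(s, s) for s in insn.srcs)
        env[insn.dst] = f"{insn.op}({args})"
    return env.get(target, target)

seq_a = [
    Insn("load", "r1", ("mem_a",)),
    Insn("load", "r2", ("mem_b",)),
    Insn("mov", "r3", ("0",)),  # unrelated to r4
    Insn("add", "r4", ("r1", "r2")),
]
# The same computation under a different, legal instruction schedule
seq_b = [
    Insn("load", "r2", ("mem_b",)),
    Insn("mov", "r3", ("0",)),
    Insn("load", "r1", ("mem_a",)),
    Insn("add", "r4", ("r1", "r2")),
]

print(seq_a == seq_b)  # False
print(slice_expr(backward_slice(seq_a, "r4"), "r4"))  # add(load(mem_a), load(mem_b))
print(slice_expr(backward_slice(seq_b, "r4"), "r4"))  # add(load(mem_a), load(mem_b))
```

This toy version handles straight-line register flow only. It does not normalize register names, commutative operand order, memory dependences through stores, or control flow, and each of these would matter on real binaries.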
Abstract
Binary Code Similarity Detection (BCSD) is significant for software security because, by comparing code patterns, it can address binary tasks such as identifying malicious code snippets and analyzing binary patches. Recently, artificial intelligence-based approaches to BCSD have drawn growing attention due to their scalability and generalization. However, because binaries are compiled with different compilation configurations, existing approaches still face notable limitations when comparing binary similarity. First, BCSD requires analysis of code behavior, yet existing work that claims to extract semantics in fact still analyzes syntax. Second, because they extract features directly from assembly sequences, existing approaches cannot handle the instruction reordering and differing syntax expressions caused by various compilation configurations. In this paper, we propose StrTune,…
Taxonomy
Topics: Anomaly Detection Techniques and Applications
