SemDiff: Binary Similarity Detection by Diffing Key-Semantics Graphs
Zian Liu, Zhi Zhang, Siqi Ma, Dongxi Liu, Jun Zhang, Chao Chen, Shigang Liu, Muhammad Ejaz Ahmed, Yang Xiang

TL;DR
SemDiff detects binary similarity by building key-semantics graphs from a binary's key instructions, so that binaries can be compared effectively across different compilation and obfuscation scenarios.
Contribution
The paper proposes a new method that captures key code behaviors through key-semantics graphs and similarity hashing, improving detection accuracy over existing techniques.
Findings
SemDiff outperforms state-of-the-art tools in various scenarios.
SemDiff is effective at detecting similar binaries across different optimizations and obfuscations.
SemDiff is useful for library version search and vulnerability analysis.
Abstract
Binary similarity detection is a critical technique that has been applied in many real-world scenarios where source code is not available, e.g., bug search, malware analysis, and code plagiarism detection. Existing works are ineffective at detecting similar binaries when different compiler optimizations, compilers, source code versions, or obfuscations are used. We observe that none of these cases changes a binary's key code behaviors, although they significantly modify its syntax and structure. Based on this observation, we extract a set of key instructions from a binary to capture its key code behaviors. By detecting the similarity between two binaries' key instructions, we can effectively address this limitation of existing works. Specifically, we translate each extracted key instruction into a self-defined key expression, generating a key-semantics graph based…
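The abstract is cut off before it explains how two key-semantics graphs are compared, so the following is only an illustrative sketch and not the paper's algorithm. It assumes a graph can be represented as a mapping from a key-expression string to its successor expressions, and it swaps in plain MinHash as a stand-in for the "similarity hashing" mentioned under Contribution. The expression strings and the names `edge_tokens`, `minhash_signature`, and `estimated_similarity` are hypothetical.

```python
import hashlib


def edge_tokens(graph):
    """Flatten {node: [successors]} into a set of 'src->dst' edge tokens."""
    return {f"{src}->{dst}" for src, dsts in graph.items() for dst in dsts}


def minhash_signature(tokens, num_hashes=64):
    """MinHash (assumed stand-in): per seed, keep the smallest token hash."""
    if not tokens:
        raise ValueError("cannot hash an empty token set")
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{t}".encode(), digest_size=8).digest(),
                "big",
            )
            for t in tokens
        ))
    return signature


def estimated_similarity(sig_a, sig_b):
    """Share of matching signature slots; estimates Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


# Hypothetical key-expression graphs for two builds of the same function.
# The structure is the same even though the dictionaries are written in a
# different order, mimicking builds that differ only in layout.
build_a = {
    "cmp(len, 0)": ["call(memcpy)", "ret"],
    "call(memcpy)": ["store(buf)"],
    "store(buf)": ["ret"],
}
build_b = {
    "store(buf)": ["ret"],
    "call(memcpy)": ["store(buf)"],
    "cmp(len, 0)": ["ret", "call(memcpy)"],
}
# A hypothetical unrelated function with no shared edges.
unrelated = {
    "call(socket)": ["call(connect)"],
    "call(connect)": ["call(send)"],
}

sig_a = minhash_signature(edge_tokens(build_a))
sig_b = minhash_signature(edge_tokens(build_b))
sig_u = minhash_signature(edge_tokens(unrelated))

print(estimated_similarity(sig_a, sig_b))
print(estimated_similarity(sig_a, sig_u))
```

Comparing fixed-length signatures instead of whole graphs keeps each comparison cheap, which matters when one query binary is matched against many candidates. How SemDiff itself hashes and matches its graphs is not recoverable from the truncated abstract.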
Taxonomy
Topics: Advanced Malware Detection Techniques · Software Engineering Research · Web Application Security Vulnerabilities
