Know Your Neighborhood: General and Zero-Shot Capable Binary Function Search Powered by Call Graphlets
Joshua Collyer, Tim Watson, Iain Phillips

TL;DR
This paper introduces a novel graph neural network architecture combined with call graphlets for binary code similarity detection, achieving state-of-the-art results across diverse datasets and tasks, including zero-shot scenarios.
Contribution
It presents a new graph data representation called call graphlets and a specialized GNN model, enabling effective binary similarity detection across architectures and in zero-shot settings.
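To make the idea of a call graphlet concrete, here is a minimal sketch in Python. It assumes a graphlet is the one-hop neighborhood of a target function (its direct callers and callees, plus the call edges among them). The summary says only that a graphlet encodes the neighborhood around each function, so the one-hop radius, the function name `build_call_graphlet`, and the toy call graph are illustrative assumptions, not the authors' implementation. The statistical node features mentioned in the abstract are omitted.

```python
from collections import defaultdict


def build_call_graphlet(call_edges, target):
    """Return (nodes, edges) for the assumed one-hop neighborhood of `target`.

    `call_edges` is an iterable of (caller, callee) pairs.
    """
    callees = defaultdict(set)
    callers = defaultdict(set)
    for src, dst in call_edges:
        callees[src].add(dst)
        callers[dst].add(src)

    # The target itself, everything it calls, and everything that calls it.
    nodes = {target} | callees[target] | callers[target]

    # Keep only call edges whose endpoints both lie inside the neighborhood.
    edges = sorted(
        (src, dst) for src, dst in set(call_edges)
        if src in nodes and dst in nodes
    )
    return sorted(nodes), edges


# Hypothetical call graph of a small binary.
call_edges = [
    ("main", "parse"),
    ("main", "run"),
    ("run", "parse"),
    ("parse", "read_byte"),
    ("run", "log"),
    ("log", "write"),
]

nodes, edges = build_call_graphlet(call_edges, "parse")
print(nodes)
print(edges)
```

In this toy graph, `log` and `write` fall outside the graphlet of `parse` because neither calls `parse` nor is called by it.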
Findings
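Achieves state-of-the-art performance on multiple datasets; the evaluation spans five datasets covering different architectures, compiler toolchains, and optimization levels.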
Performs well in cross-architecture and zero-shot tasks.
Effective in out-of-domain function inlining detection.
Abstract
Binary code similarity detection is an important problem with applications in areas such as malware analysis, vulnerability research and license violation detection. This paper proposes a novel graph neural network architecture combined with a novel graph data representation called call graphlets. A call graphlet encodes the neighborhood around each function in a binary executable, capturing the local and global context through a series of statistical features. A specialized graph neural network model operates on this graph representation, learning to map it to a feature vector that encodes semantic binary code similarities using deep-metric learning. The proposed approach is evaluated across five distinct datasets covering different architectures, compiler tool chains, and optimization levels. Experimental results show that the combination of call graphlets and the novel graph neural…
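Once each function is mapped to a feature vector, function search reduces to ranking candidates by vector similarity. The sketch below illustrates that step in Python under stated assumptions: cosine similarity is used as the comparison (the abstract does not name the metric), and the embeddings, function names, and the helper `rank_candidates` are made up for illustration rather than produced by the paper's model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(query, candidates):
    """Return (name, score) pairs sorted from most to least similar to `query`."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical embeddings of functions compiled for different architectures.
candidates = {
    "memcpy_arm": [0.9, 0.1, 0.0],
    "strlen_x86": [0.0, 1.0, 0.2],
    "memcpy_mips": [0.8, 0.2, 0.1],
}
query = [1.0, 0.1, 0.0]  # hypothetical embedding of the function being searched for

ranking = rank_candidates(query, candidates)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With a well-trained metric, functions compiled from the same source should rank above unrelated functions regardless of architecture, which is the behavior the toy vectors are chosen to mimic.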
Taxonomy
Topics: Data Management and Algorithms · Information Retrieval and Search Behavior · Natural Language Processing Techniques
Methods: Graph Neural Network
