VEXIR2Vec: An Architecture-Neutral Embedding Framework for Binary Similarity
S. VenkataKeerthy, Soumya Banerjee, Sayan Dey, Yashas Andaluri, Raghul PS, Subrahmanyam Kalyanasundaram, Fernando Magno Quintão Pereira, Ramakrishna Upadrasta

TL;DR
VexIR2Vec is an architecture-neutral binary similarity framework that uses VEX-IR normalization and knowledge graph embeddings to accurately compare functions across diverse binaries, architectures, and obfuscations.
Contribution
The paper introduces VexIR2Vec, a new approach that combines VEX-IR normalization with knowledge graph embeddings for robust, scalable binary similarity assessment.
Findings
Outperforms baselines by up to 60% in diffing accuracy.
Achieves a mean average precision (MAP) of 0.76 in search tasks.
Runs 3.1–3.5× faster than the closest competing methods.
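Mean average precision (MAP) is the mean, over search queries, of each query's average precision: the precision at every rank where a true match appears, averaged over all true matches. The minimal sketch below implements that standard definition. The function names and toy function ids are hypothetical, and the paper's actual search protocol (candidate pool size, what counts as a match) is not described in this summary.

```python
from __future__ import annotations

from typing import Sequence


def average_precision(ranked_ids: Sequence[str], relevant: set[str]) -> float:
    """Average precision of one ranked result list against its set of true matches."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked_ids, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at the rank of this true match
    # True matches that were never retrieved contribute a precision of 0.
    return precision_sum / len(relevant)


def mean_average_precision(
    queries: Sequence[tuple[Sequence[str], set[str]]],
) -> float:
    """MAP: the mean of the per-query average precision values."""
    return sum(average_precision(ranked, rel) for ranked, rel in queries) / len(queries)


# Hypothetical queries: a ranked list of candidate functions and the true matches.
q1 = (["f_arm", "g_x86", "f_mips"], {"f_arm", "f_mips"})
q2 = (["h_x86", "f_x86"], {"f_x86"})
# AP(q1) = (1/1 + 2/3) / 2 = 5/6 and AP(q2) = 1/2, so MAP = 2/3.
print(mean_average_precision([q1, q2]))
```

A MAP of 1.0 would mean every true match is ranked above every non-match for every query, so 0.76 indicates that true matches mostly, but not always, appear near the top of the ranking.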
Abstract
Binary similarity involves determining whether two binary programs exhibit similar functionality, often because they originate from the same source code. In this work, we propose VexIR2Vec, an approach to binary similarity built on VEX-IR, an architecture-neutral Intermediate Representation (IR). We extract embeddings from sequences of basic blocks, termed peepholes, obtained by random walks on the control-flow graph. The peepholes are normalized using transformations inspired by compiler optimizations; with these transformations, the VEX-IR Normalization Engine mitigates architecture- and compiler-induced variations in binaries while exposing semantic similarities. We then learn, in an unsupervised manner, a vocabulary of entity-level representations of the IR using knowledge graph embedding techniques. This vocabulary is used to derive function embeddings for similarity…
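Peephole extraction can be pictured as repeated random walks over a function's control-flow graph. The following is a minimal sketch under assumptions the abstract does not state: the CFG is a hypothetical dict from block ids to successor lists, every walk starts at the entry block, successors are chosen uniformly at random, and `num_walks` and `max_len` are illustrative parameters.

```python
from __future__ import annotations

import random


def extract_peepholes(
    cfg: dict[str, list[str]],
    entry: str,
    num_walks: int,
    max_len: int,
    seed: int = 0,
) -> list[list[str]]:
    """Sample peepholes: sequences of basic-block ids from random walks on a CFG.

    Each walk starts at `entry`, moves to a uniformly chosen successor at each
    step, and stops at a block with no successors or once it holds `max_len`
    blocks (the length cap also keeps walks through loops finite).
    """
    rng = random.Random(seed)  # private, seeded RNG so the sample is reproducible
    peepholes: list[list[str]] = []
    for _ in range(num_walks):
        block = entry
        walk = [block]
        while len(walk) < max_len and cfg.get(block):
            block = rng.choice(cfg[block])
            walk.append(block)
        peepholes.append(walk)
    return peepholes


# Hypothetical CFG: a loop header that can repeat or fall through to the exit.
cfg = {"entry": ["loop", "exit"], "loop": ["loop", "exit"], "exit": []}
for peephole in extract_peepholes(cfg, "entry", num_walks=3, max_len=4):
    print(" -> ".join(peephole))
```

Sampling several bounded walks, rather than enumerating every path, keeps the cost linear in `num_walks * max_len` even when loops make the number of paths unbounded.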
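To illustrate how optimization-inspired normalization can remove architecture- and compiler-induced differences, the sketch below applies copy/constant propagation, constant folding, commutative operand ordering, and canonical renaming to a toy three-address IR. The statement tuples are hypothetical and are not real VEX-IR syntax, and this particular set of passes (especially renaming registers such as `rdi` and `x0` to a common name) is an assumption for illustration, not the documented behavior of the VEX-IR Normalization Engine.

```python
from __future__ import annotations

from typing import Union

Operand = Union[str, int]  # a temp/register name or an integer constant
Stmt = tuple[str, str, tuple[Operand, ...]]  # hypothetical (dest, opcode, operands)

# Toy commutative opcodes that can be constant-folded.
FOLD = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}


def normalize(stmts: list[Stmt]) -> list[Stmt]:
    """Toy peephole normalization: propagation, folding, operand order, renaming.

    Assumes temporaries are local to the snippet, so a `mov` whose value has
    been propagated into its uses can be dropped.
    """
    known: dict[str, Operand] = {}  # temp -> operand it is known to equal
    kept: list[Stmt] = []
    for dest, op, args in stmts:
        # Copy/constant propagation: substitute already-known values.
        args = tuple(known.get(a, a) if isinstance(a, str) else a for a in args)
        if op == "mov":
            known[dest] = args[0]
            continue
        if op in FOLD:
            # Commutative op: names first, constants last (stable sort).
            args = tuple(sorted(args, key=lambda a: isinstance(a, int)))
            if all(isinstance(a, int) for a in args):
                known[dest] = FOLD[op](*args)  # constant folding
                continue
        kept.append((dest, op, args))

    # Canonical renaming: name every temp/register v0, v1, ... by first use.
    names: dict[str, str] = {}

    def canon(x: Operand) -> Operand:
        if isinstance(x, int):
            return x
        return names.setdefault(x, f"v{len(names)}")

    result: list[Stmt] = []
    for dest, op, args in kept:
        new_args = tuple(canon(a) for a in args)
        result.append((canon(dest), op, new_args))
    return result


# Hypothetical lifted snippets computing (arg + 8) * 2 on two architectures.
x86_like = [
    ("t0", "mov", ("rdi",)),
    ("t1", "mov", (8,)),
    ("t2", "add", ("t0", "t1")),
    ("t3", "mul", ("t2", 2)),
]
arm_like = [
    ("t7", "add", (8, "x0")),
    ("t8", "mul", ("t7", 2)),
]
print(normalize(x86_like) == normalize(arm_like))  # True
```

Both snippets normalize to the same two statements, so any embedding computed afterwards sees identical input even though the raw instruction sequences differed in length, operand order, and register names.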
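The abstract refers to knowledge graph embedding techniques without naming a model. As a stand-in, the sketch below uses TransE, a standard knowledge graph embedding model that treats a (head, relation, tail) triple as plausible when head + relation lies close to tail, and then derives a function embedding by summing instruction vectors built from the entity vocabulary. The entity names, the weights `W_OPCODE`, `W_TYPE`, and `W_ARG`, the sum aggregation, and the hand-set vectors (which would be learned in practice) are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vec = List[float]
Instr = Tuple[str, str, Sequence[str]]  # hypothetical (opcode, type, operand kinds)

# Hypothetical weights for combining entity vectors into an instruction vector.
W_OPCODE, W_TYPE, W_ARG = 1.0, 0.5, 0.2


def add(u: Vec, v: Vec) -> Vec:
    return [a + b for a, b in zip(u, v)]


def scale(c: float, v: Vec) -> Vec:
    return [c * a for a in v]


def transe_distance(h: Vec, r: Vec, t: Vec) -> float:
    """TransE: a triple (h, r, t) is plausible when h + r lies close to t (L2)."""
    return math.dist(add(h, r), t)


def margin_loss(pos: Tuple[Vec, Vec, Vec], neg: Tuple[Vec, Vec, Vec],
                margin: float = 1.0) -> float:
    """Hinge loss: an observed triple should have a distance at least `margin`
    smaller than a corrupted one (e.g. the same triple with a random tail)."""
    return max(0.0, margin + transe_distance(*pos) - transe_distance(*neg))


def instruction_vector(vocab: Dict[str, Vec], instr: Instr) -> Vec:
    """Weighted sum of the opcode, type, and operand-kind entity vectors."""
    opcode, ty, operands = instr
    vec = add(scale(W_OPCODE, vocab[opcode]), scale(W_TYPE, vocab[ty]))
    for kind in operands:
        vec = add(vec, scale(W_ARG, vocab[kind]))
    return vec


def function_vector(vocab: Dict[str, Vec], instrs: Sequence[Instr]) -> Vec:
    """Function embedding as the sum of its instruction vectors."""
    dim = len(next(iter(vocab.values())))
    total = [0.0] * dim
    for instr in instrs:
        total = add(total, instruction_vector(vocab, instr))
    return total


def cosine(u: Vec, v: Vec) -> float:
    norm = math.hypot(*u) * math.hypot(*v)
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


# Hand-set 3-d vectors standing in for a learned vocabulary (hypothetical).
vocab = {
    "Add64": [1.0, 0.0, 0.0],
    "Mul64": [0.0, 1.0, 0.0],
    "I64": [0.0, 0.0, 1.0],
    "REG": [0.5, 0.5, 0.0],
    "CONST": [0.0, 0.5, 0.5],
}
f = [("Add64", "I64", ("REG", "CONST")), ("Mul64", "I64", ("REG", "CONST"))]
g = [("Add64", "I64", ("REG", "CONST"))]
print(cosine(function_vector(vocab, f), function_vector(vocab, g)))
```

Because every instruction vector is a weighted sum of shared entity vectors, functions that use similar opcodes, types, and operand kinds end up close in cosine similarity once they have been lifted and normalized to the same IR.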
Taxonomy
Topics: Advanced Malware Detection Techniques · Software Engineering Research · Software Testing and Debugging Techniques
Methods: Siamese Network · Lib
