ReSIM: Re-ranking Binary Similarity Embeddings to Improve Function Search Performance
Gianluca Capozzi, Anna Paola Giancaspro, Fabio Petroni, Leonardo Querzoni, Giuseppe Antonio Di Luna

TL;DR
ReSIM enhances binary function similarity search by adding a neural re-ranker that jointly assesses query-candidate pairs, significantly improving retrieval accuracy over traditional embedding-only methods.
Contribution
This paper introduces ReSIM, a neural re-ranking system for binary function search. It scores each query-candidate pair jointly and so captures cross-function relationships, an approach the authors present as novel in this domain.
Findings
ReSIM improves nDCG by 21.7% on average.
ReSIM improves Recall by 27.8% on average.
Re-ranking consistently enhances search effectiveness across models and datasets.
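To make the two reported metrics concrete, here is a minimal sketch of Recall@k and nDCG@k for a single query. It assumes binary relevance (a candidate is either compiled from the same source as the query or not) and a fixed cutoff k; this summary does not state the paper's exact cutoff or relevance grading, so the function names, the cutoff, and the toy identifiers below are illustrative assumptions only.

```python
import math

def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of all relevant items that appear in the top-k of the ranking."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for item in ranked_ids[:k] if item in relevant_ids)
    return hits / len(relevant_ids)

def ndcg_at_k(ranked_ids, relevant_ids, k):
    """Binary-relevance nDCG: DCG of the ranking divided by the ideal DCG."""
    # Each relevant item at 0-based position `rank` contributes 1 / log2(rank + 2).
    dcg = sum(1.0 / math.log2(rank + 2)
              for rank, item in enumerate(ranked_ids[:k])
              if item in relevant_ids)
    # The ideal ranking places every relevant item first.
    ideal_hits = min(k, len(relevant_ids))
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical toy data: two true matches (f1, f2) among four candidates.
relevant = {"f1", "f2"}
before = ["x", "f1", "y", "f2"]   # an embedding-only ranking (made up)
after = ["f1", "f2", "x", "y"]    # the same candidates after re-ranking (made up)

recall_before = recall_at_k(before, relevant, k=2)
recall_after = recall_at_k(after, relevant, k=2)
ndcg_before = ndcg_at_k(before, relevant, k=2)
ndcg_after = ndcg_at_k(after, relevant, k=2)
```

In this toy case both metrics rise after re-ranking because the true matches move into the top two positions. The candidate set is unchanged; only its order differs, which is the effect a re-ranker is meant to have.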
Abstract
Binary Function Similarity (BFS), the problem of determining whether two binary functions originate from the same source code, has been extensively studied in recent research across security, software engineering, and machine learning communities. This interest arises from its central role in developing vulnerability detection systems, copyright infringement analysis, and malware phylogeny tools. Nearly all binary function similarity systems embed assembly functions into real-valued vectors, where similar functions map to points that lie close to each other in the metric space. These embeddings enable function search: a query function is embedded and compared against a database of candidate embeddings to retrieve the most similar matches. Despite their effectiveness, such systems rely on bi-encoder architectures that embed functions independently, limiting their ability to capture…
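The search setup described in the abstract, followed by a re-ranking step, can be sketched as a two-stage pipeline. This is an illustration under stated assumptions, not the authors' implementation: the embeddings are made-up 2-D vectors, functions are represented as lists of mnemonics, and `jaccard_scorer` is a hypothetical stand-in for a neural re-ranker that sees the query and a candidate together.

```python
import numpy as np

def retrieve_top_k(query_vec, db_vecs, k):
    """Stage 1 (bi-encoder search): rank candidates by cosine similarity."""
    q = query_vec / np.linalg.norm(query_vec)
    d = db_vecs / np.linalg.norm(db_vecs, axis=1, keepdims=True)
    sims = d @ q                      # one similarity score per candidate
    order = np.argsort(-sims)[:k]     # indices of the k most similar candidates
    return order.tolist()

def rerank(query, candidates, shortlist, pair_scorer):
    """Stage 2: re-order the shortlist with a scorer that sees both functions."""
    scored = [(pair_scorer(query, candidates[i]), i) for i in shortlist]
    scored.sort(key=lambda t: -t[0])  # stable sort keeps stage-1 order on ties
    return [i for _, i in scored]

def jaccard_scorer(query_tokens, cand_tokens):
    """Hypothetical pair scorer: token-set overlap, standing in for a neural model."""
    a, b = set(query_tokens), set(cand_tokens)
    return len(a & b) / len(a | b)

# Toy database: each function is a list of mnemonics plus a made-up embedding.
query_tokens = ["push", "mov", "call", "ret"]
candidates = [
    ["push", "mov", "add", "ret"],
    ["push", "mov", "call", "ret"],
    ["xor", "jmp"],
]
query_vec = np.array([1.0, 0.0])
db_vecs = np.array([[0.9, 0.1], [0.7, 0.7], [0.0, 1.0]])

shortlist = retrieve_top_k(query_vec, db_vecs, k=2)
reranked = rerank(query_tokens, candidates, shortlist, jaccard_scorer)
```

Stage 1 compares vectors that were computed independently for each function, so it is cheap enough to run over the whole database. Stage 2 is applied only to the shortlist, which keeps a more expensive joint scorer affordable. Here the candidate whose instructions match the query exactly ranks second after stage 1 and first after stage 2.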
Taxonomy
Topics: Advanced Malware Detection Techniques · Software Engineering Research · Software Testing and Debugging Techniques
