BLens: Contrastive Captioning of Binary Functions using Ensemble Embedding
Tristan Benoit, Yunru Wang, Moritz Dannehl, Johannes Kinder

TL;DR
This paper introduces BLens, a novel contrastive captioning approach that leverages ensemble embeddings and transformer models to improve automatic function naming in binary reverse engineering, especially across different projects.
Contribution
BLens applies ensemble embedding and contrastive learning to enhance function name prediction, significantly outperforming existing transformer-based models in generalization tasks.
Findings
Achieves higher F1 scores than state-of-the-art approaches in both the standard and cross-project settings.
Generalizes better when naming functions in binaries unrelated to the training set.
Outperforms previous models, especially when few components are shared between training and test binaries.
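The findings above are stated in terms of F1 scores. This summary does not say how F1 is computed, so the sketch below shows one common convention for function naming, token-level F1, where predicted and ground-truth names are split into words and compared as multisets. The tokenizer, the helper names `tokenize` and `token_f1`, and the choice of token-level scoring are assumptions for illustration and may differ from the paper's actual metric.

```python
import re
from collections import Counter

def tokenize(name):
    """Split a function name on underscores, camelCase boundaries, and digits.

    Hypothetical tokenizer for illustration; real evaluations may use a
    different vocabulary or normalization.
    """
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]

def token_f1(predicted, truth):
    """Token-level F1 between a predicted and a ground-truth function name."""
    pred_tokens = Counter(tokenize(predicted))
    true_tokens = Counter(tokenize(truth))
    # Multiset intersection counts each shared token at most as often
    # as it appears in both names.
    overlap = sum((pred_tokens & true_tokens).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_tokens.values())
    recall = overlap / sum(true_tokens.values())
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    # The prediction is fully precise but misses one ground-truth token.
    print(token_f1("read_file", "read_file_header"))
```

Under this convention a partially correct name still earns credit, which is why F1 is often preferred over exact match when names are predicted for stripped binaries.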
Abstract
Function names can greatly aid human reverse engineers, which has spurred the development of machine learning-based approaches to predicting function names in stripped binaries. Much current work in this area now uses transformers, applying a metaphor of machine translation from code to function names. Still, function naming models face challenges in generalizing to projects unrelated to the training set. In this paper, we take a completely new approach by transferring advances in automated image captioning to the domain of binary reverse engineering, such that different parts of a binary function can be associated with parts of its name. We propose BLens, which combines multiple binary function embeddings into a new ensemble representation, aligns it with the name representation latent space via a contrastive learning approach, and generates function names with a transformer…
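The abstract describes two steps that a small example can make concrete: combining several function embeddings into one representation, and aligning that representation with name embeddings through contrastive learning. The sketch below is a minimal illustration, not the BLens implementation. It assumes plain concatenation as a stand-in for the ensemble step and a symmetric InfoNCE-style loss for the alignment step; the function names, the temperature value, and the toy vectors are all hypothetical.

```python
import math

def ensemble(embeddings):
    """Combine several embeddings of one function by concatenation.

    Assumption: concatenation is only a stand-in; the summary does not
    describe how BLens builds its ensemble representation.
    """
    return [x for emb in embeddings for x in emb]

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def log_softmax(row):
    # Subtract the max for numerical stability.
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]

def contrastive_loss(func_embs, name_embs, temperature=0.07):
    """Symmetric InfoNCE-style loss over a batch of (function, name) pairs.

    Pair i is the positive for row i; all other names in the batch act
    as negatives. Both inputs must live in the same dimension.
    """
    f = [l2_normalize(v) for v in func_embs]
    n = [l2_normalize(v) for v in name_embs]
    k = len(f)
    # Cosine similarity matrix scaled by the temperature.
    logits = [[sum(a * b for a, b in zip(fi, nj)) / temperature for nj in n]
              for fi in f]
    # Function-to-name direction: each row should pick its own name.
    loss_f2n = -sum(log_softmax(logits[i])[i] for i in range(k)) / k
    # Name-to-function direction: each column should pick its own function.
    columns = [[logits[i][j] for i in range(k)] for j in range(k)]
    loss_n2f = -sum(log_softmax(columns[j])[j] for j in range(k)) / k
    return 0.5 * (loss_f2n + loss_n2f)

if __name__ == "__main__":
    # Two toy functions, each with two 1-dimensional embeddings.
    funcs = [ensemble([[1.0], [0.0]]), ensemble([[0.0], [1.0]])]
    names = [[1.0, 0.0], [0.0, 1.0]]
    print(contrastive_loss(funcs, names))
```

The loss is small when each function embedding is most similar to its own name embedding and large when the pairs are mismatched, which is the property a contrastive alignment stage relies on.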
Taxonomy
Topics: Human Pose and Action Recognition · Video Analysis and Summarization · Multimodal Machine Learning Applications
Methods: Contrastive Learning
