BinBert: Binary Code Understanding with a Fine-tunable and Execution-aware Transformer
Fiorella Artuso, Marco Mormando, Giuseppe A. Di Luna, Leonardo Querzoni

TL;DR
BinBert is a transformer-based model pre-trained on assembly instructions and symbolic execution data, fine-tunable for specific tasks, and it significantly improves binary code understanding over existing models.
Contribution
We introduce BinBert, a novel pre-trained transformer model for binary code understanding that is fine-tunable and leverages execution-aware training data.
Findings
BinBert outperforms state-of-the-art instruction embedding models.
It demonstrates strong performance on a new multi-task benchmark.
Fine-tuning enhances its ability to adapt to specific binary analysis tasks.
Abstract
A recent trend in binary code analysis promotes the use of neural solutions based on instruction embedding models. An instruction embedding model is a neural network that transforms sequences of assembly instructions into embedding vectors. If the embedding network is trained such that the translation from code to vectors partially preserves the semantics, the network effectively represents an assembly code model. In this paper we present BinBert, a novel assembly code model. BinBert is built on a transformer pre-trained on a huge dataset of both assembly instruction sequences and symbolic execution information. BinBert can be applied to assembly instruction sequences and it is fine-tunable, i.e., it can be re-trained as part of a neural architecture on task-specific data. Through fine-tuning, BinBert learns how to apply the general knowledge acquired with pre-training to the specific…
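To make the idea of an instruction embedding model concrete, the sketch below maps a sequence of assembly instructions to a fixed-size vector and compares two vectors by cosine similarity. It is a toy stand-in and not BinBert: the token vectors come from a hash rather than from a trained transformer, so they carry no learned semantics. All names here (`tokenize`, `token_vector`, `embed`, `cosine`, `DIM`) are hypothetical and chosen only for illustration.

```python
import hashlib
import math
import re

DIM = 16  # embedding width; an arbitrary choice for this sketch


def tokenize(instruction: str) -> list[str]:
    """Split one assembly instruction into mnemonic, operand, and punctuation tokens."""
    pattern = r"[A-Za-z_][A-Za-z0-9_]*|0x[0-9a-fA-F]+|\d+|[\[\]+\-*,]"
    return re.findall(pattern, instruction)


def token_vector(token: str) -> list[float]:
    """Deterministic pseudo-random vector per token (assumption: stands in for a learned embedding)."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()  # 32 bytes
    return [(b - 127.5) / 127.5 for b in digest[:DIM]]


def embed(instructions: list[str]) -> list[float]:
    """Mean-pool token vectors over the whole sequence, then L2-normalize."""
    tokens = [t for ins in instructions for t in tokenize(ins.lower())]
    if not tokens:
        return [0.0] * DIM
    summed = [0.0] * DIM
    for tok in tokens:
        for i, value in enumerate(token_vector(tok)):
            summed[i] += value
    mean = [s / len(tokens) for s in summed]
    norm = math.sqrt(sum(x * x for x in mean))
    return [x / norm for x in mean] if norm else mean


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; a plain dot product because embed() returns unit vectors."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    block_a = ["mov eax, [rbp+0x10]", "add eax, 1"]
    block_b = ["mov eax, [rbp+0x10]", "inc eax"]
    print(cosine(embed(block_a), embed(block_b)))
```

In this sketch, similarity only reflects shared tokens. A trained assembly code model would instead be expected to place semantically equivalent sequences (such as `add eax, 1` and `inc eax`) close together even when their tokens differ.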
Taxonomy
Topics: Software Engineering Research · Ferroelectric and Negative Capacitance Devices · Advanced Malware Detection Techniques
Methods: Test
