FastBCSD: Fast and Efficient Neural Network for Binary Code Similarity Detection
Chensen Huang, Guibo Zhu, Guojing Ge, Taihao Li, Jinqiao Wang

TL;DR
FastBCSD is a lightweight neural network for binary code similarity detection that achieves accuracy comparable to or better than that of larger models while significantly reducing computational cost and inference time.
Contribution
The paper introduces FastBCSD, a lightweight neural network that combines a dynamic instruction vector encoding method with assembly code as its only input feature to improve efficiency without sacrificing accuracy.
Findings
Achieves accuracy similar to or better than that of state-of-the-art models.
Reduces parameter size by a factor of more than 6.
Cuts inference time to one-fifth that of previous methods.
Abstract
Binary code similarity detection (BCSD) has various applications, including but not limited to vulnerability detection, plagiarism detection, and malware detection. Previous research efforts mainly focus on transforming binary code into assembly code strings using reverse compilation and then using pre-trained deep learning models with large numbers of parameters to obtain feature representation vectors of binary code. While these models have proven effective in representing binary code, their large parameter size leads to considerable computational expense during both training and inference. In this paper, we present a lightweight neural network, called FastBCSD, that employs a dynamic instruction vector encoding method and takes only assembly code as its input feature to achieve accuracy comparable to the pre-trained models while reducing computational resources and time cost. On the…
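As a rough illustration of the general pipeline the abstract describes (assembly code in, feature vector out, similarity computed between vectors), the sketch below compares functions by cosine similarity. It is not FastBCSD's encoder: the token-count `embed` function is a deliberately simple stand-in for a learned model, and the function names and assembly snippets are hypothetical.

```python
import math
from collections import Counter


def tokenize(asm: str) -> list[str]:
    """Split an assembly listing into lowercase tokens (mnemonics and operands)."""
    return asm.lower().replace(",", " ").split()


def embed(asm: str) -> Counter:
    """Toy feature vector: token counts.

    Assumption: this stands in for a learned encoder; it is NOT the
    dynamic instruction vector encoding used by FastBCSD.
    """
    return Counter(tokenize(asm))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty function has no direction to compare
    return dot / (norm_a * norm_b)


# Hypothetical disassembled functions, made up for illustration.
func_a = "push rbp\nmov rbp, rsp\nadd eax, 1\npop rbp\nret"
func_b = "push rbp\nmov rbp, rsp\nadd eax, 2\npop rbp\nret"
func_c = "xor eax, eax\nret"

sim_ab = cosine_similarity(embed(func_a), embed(func_b))
sim_ac = cosine_similarity(embed(func_a), embed(func_c))
print(f"a vs b: {sim_ab:.3f}, a vs c: {sim_ac:.3f}")
```

In this toy setup the near-identical pair (`func_a`, `func_b`) scores higher than the unrelated pair (`func_a`, `func_c`). A real BCSD system would swap `embed` for a trained network while keeping the same compare-by-vector structure.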
Taxonomy
Topics: Advanced Malware Detection Techniques · Software Engineering Research
