Unsupervised Binary Code Translation with Application to Code Similarity Detection and Vulnerability Discovery
Iftakhar Ahmad, Lannan Luo

TL;DR
This paper introduces UNSUPERBINTRANS, an unsupervised method for translating binary code across architectures using neural machine translation techniques, enabling improved code similarity detection and vulnerability discovery in low-resource ISAs.
Contribution
The paper presents an unsupervised binary code translation approach that borrows ideas from neural machine translation (NMT) to address data scarcity in cross-architecture binary analysis.
Findings
High accuracy in code similarity detection
Effective vulnerability discovery across architectures
Enables analysis for low-resource ISAs
Abstract
Binary code analysis has immense importance in the research domain of software security. Today, software is very often compiled for various Instruction Set Architectures (ISAs). As a result, cross-architecture binary code analysis has become an emerging problem. Recently, deep learning-based binary analysis has shown promising success. It is widely known that training a deep learning model requires a massive amount of data. However, for some low-resource ISAs, an adequate amount of data is hard to find, preventing deep learning from being widely adopted for binary analysis. To overcome the data scarcity problem and facilitate cross-architecture binary code analysis, we propose to apply the ideas and techniques in Neural Machine Translation (NMT) to binary code analysis. Our insight is that a binary, after disassembly, is represented in some assembly language. Given a binary in a…
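The insight that disassembled code can be handled like natural-language text can be illustrated with a minimal sketch. It turns each assembly instruction into a single "word" and each basic block into a "sentence", which is the kind of input an NMT-style model consumes. The function names, the `~` separator, and the `<imm>` placeholder are assumptions made for this illustration and are not taken from the paper; the authors' actual preprocessing may differ.

```python
import re

# Hypothetical normalization rule (an assumption for illustration, not from the paper):
# numeric constants are replaced by a placeholder so the vocabulary stays small.
HEX_OR_INT = re.compile(r"^-?(0x[0-9a-fA-F]+|\d+)$")


def normalize_operand(op: str) -> str:
    """Lowercase an operand and replace numeric immediates with '<imm>'."""
    op = op.strip().lower()
    if op.startswith("#"):  # ARM-style immediate prefix
        op = op[1:]
    if HEX_OR_INT.match(op):
        return "<imm>"
    return op


def instruction_to_word(ins: str) -> str:
    """Collapse one instruction into a single token, e.g. 'mov~eax~<imm>'."""
    mnemonic, _, rest = ins.strip().partition(" ")
    operands = [normalize_operand(o) for o in rest.split(",") if o.strip()]
    return "~".join([mnemonic.lower()] + operands)


def block_to_sentence(block: list[str]) -> list[str]:
    """Treat a basic block as a sentence whose words are instructions."""
    return [instruction_to_word(ins) for ins in block]


if __name__ == "__main__":
    x86_block = ["MOV EAX, 0x10", "ADD EAX, EBX", "RET"]
    arm_block = ["mov r0, #16", "add r0, r0, r1", "bx lr"]
    print(block_to_sentence(x86_block))
    print(block_to_sentence(arm_block))
```

The sketch splits operands on commas only, so bracketed memory operands that contain commas are not handled specially; a real pipeline would rely on a disassembler's structured operand output instead of string parsing.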
Taxonomy
Topics: Web Application Security Vulnerabilities · Software Reliability and Analysis Research · Software Engineering Research
