Neural Machine Translation Inspired Binary Code Similarity Comparison beyond Function Pairs
Fei Zuo, Xiaopeng Li, Patrick Young, Lannan Luo, Qiang Zeng, Zhexin Zhang

TL;DR
This paper introduces INNEREYE, a novel NLP-inspired system for binary code similarity comparison across different architectures, improving accuracy and scalability for applications like vulnerability discovery and code plagiarism detection.
Contribution
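The paper applies NLP techniques to binary analysis to address two cross-architecture problems: deciding whether two basic blocks are semantically similar, and detecting whether one piece of code is contained in another. The resulting system, INNEREYE, is reported to outperform existing methods.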
Findings
INNEREYE outperforms existing methods in accuracy, efficiency, and scalability.
Effective in cross-architecture vulnerability discovery.
Demonstrates the applicability of NLP techniques to large-scale binary analysis.
Abstract
Binary code analysis allows analyzing binary code without having access to the corresponding source code. A binary, after disassembly, is expressed in an assembly language. This inspires us to approach binary analysis by leveraging ideas and techniques from Natural Language Processing (NLP), a rich area focused on processing text of various natural languages. We notice that binary code analysis and NLP share a lot of analogical topics, such as semantics extraction, summarization, and classification. This work utilizes these ideas to address two important code similarity comparison problems. (I) Given a pair of basic blocks for different instruction set architectures (ISAs), determining whether their semantics is similar or not; and (II) given a piece of code of interest, determining if it is contained in another piece of assembly code for a different ISA. The solutions to these two…
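The abstract is cut off before it describes the solutions, so the following is only a minimal sketch of how problems (I) and (II) could fit together, not the authors' method. It assumes each basic block has already been mapped to a fixed-length vector by some model (the vectors here are made up), scores block similarity from the Manhattan distance between vectors, and treats containment as an ordered fuzzy match via longest common subsequence. The names `block_similarity`, `containment_score`, and the `threshold` parameter are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def block_similarity(a: Vector, b: Vector) -> float:
    """Problem (I), sketched: map the Manhattan distance between two
    block embeddings into (0, 1], where 1.0 means identical vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return math.exp(-sum(abs(x - y) for x, y in zip(a, b)))


def containment_score(
    query: List[Vector], target: List[Vector], threshold: float = 0.5
) -> float:
    """Problem (II), sketched: fraction of query blocks that can be matched,
    in order, to blocks of the target. This is a longest common subsequence
    in which two blocks count as equal when their similarity reaches the
    (assumed) threshold."""
    if not query:
        return 0.0
    n, m = len(query), len(target)
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if block_similarity(query[i - 1], target[j - 1]) >= threshold:
                lcs[i][j] = lcs[i - 1][j - 1] + 1
            else:
                lcs[i][j] = max(lcs[i - 1][j], lcs[i][j - 1])
    return lcs[n][m] / n


# Made-up embeddings: two query blocks, and a target that contains
# near-copies of both, in the same order, among unrelated blocks.
query_blocks = [[0.0, 0.0], [1.0, 1.0]]
target_blocks = [[5.0, 5.0], [0.0, 0.1], [3.0, 3.0], [1.0, 0.9]]
score = containment_score(query_blocks, target_blocks)
```

In this sketch the block-level score is the only place where the instruction set matters: if the embedding model places semantically equivalent blocks from different ISAs close together, the containment step itself needs no knowledge of either architecture.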
