A Cross-Architecture Instruction Embedding Model for Natural Language Processing-Inspired Binary Code Analysis
Kimberly Redmond, Lannan Luo, Qiang Zeng

TL;DR
This paper introduces a novel cross-architecture instruction embedding model inspired by NLP techniques, enabling semantic comparison of binary code instructions across different hardware architectures, which improves binary code similarity analysis.
Contribution
It presents the first joint learning approach for cross-architecture instruction embeddings, capturing semantic relationships across architectures for binary code analysis.
Findings
Outperforms code statistics-based methods in basic block comparison
Demonstrates effectiveness in cross-architecture binary code similarity tasks
Shows potential for application in various binary analysis tasks
Abstract
Given a closed-source program, such as most proprietary software and viruses, binary code analysis is indispensable for many tasks, such as code plagiarism detection and malware analysis. Today, source code is very often compiled for various architectures, making cross-architecture binary code analysis increasingly important. A binary, after being disassembled, is expressed in an assembly language. Thus, recent work has started exploring Natural Language Processing (NLP)-inspired binary code analysis. In NLP, words are usually represented as high-dimensional vectors (i.e., embeddings) to facilitate further processing, which is one of the most common and critical steps in many NLP tasks. We regard instructions as words in NLP-inspired binary code analysis, and aim to represent instructions as embeddings as well. To facilitate cross-architecture binary code analysis, our goal is that…
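To make the idea of treating instructions as words concrete, here is a minimal sketch of how instruction embeddings from two architectures might be compared. Everything in it is an assumption for illustration: the 3-dimensional vectors are hand-picked rather than learned, the instruction strings and the names `EMBEDDINGS`, `cosine`, and `block_embedding` are hypothetical, and averaging instruction vectors to represent a basic block is a simple stand-in, not necessarily the method used in the paper.

```python
import math

# Toy, hand-picked vectors (assumption): a trained model would learn these
# so that semantically similar instructions land close together,
# regardless of architecture.
EMBEDDINGS = {
    "x86:mov eax, ebx":   [0.9, 0.1, 0.0],
    "arm:mov r0, r1":     [0.8, 0.2, 0.1],
    "x86:add eax, 1":     [0.1, 0.9, 0.0],
    "arm:add r0, r0, #1": [0.2, 0.8, 0.1],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def block_embedding(instructions):
    """Represent a basic block as the mean of its instruction vectors.

    Averaging is an illustrative assumption, not the paper's stated method.
    """
    vectors = [EMBEDDINGS[ins] for ins in instructions]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Instruction-level comparison across architectures.
same_meaning = cosine(EMBEDDINGS["x86:mov eax, ebx"], EMBEDDINGS["arm:mov r0, r1"])
different_meaning = cosine(EMBEDDINGS["x86:mov eax, ebx"], EMBEDDINGS["arm:add r0, r0, #1"])

# Basic-block-level comparison across architectures.
x86_block = block_embedding(["x86:mov eax, ebx", "x86:add eax, 1"])
arm_block = block_embedding(["arm:mov r0, r1", "arm:add r0, r0, #1"])
block_similarity = cosine(x86_block, arm_block)

print(same_meaning > different_meaning)
```

With these toy values, the two `mov` instructions score as more similar to each other than a `mov` and an `add`, which is the property a cross-architecture embedding is meant to provide.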
Taxonomy
Topics: Software Engineering Research · Advanced Malware Detection Techniques · Software Testing and Debugging Techniques
