Leveraging Artificial Intelligence on Binary Code Comprehension

Yifan Zhang

arXiv:2210.05103·cs.SE·October 12, 2022

TL;DR

This paper explores using AI models enriched with source code domain knowledge to improve binary code comprehension, addressing the semantic gap and aiding reverse engineering and analysis tasks.

Contribution

The paper proposes incorporating source code information (e.g., variable names, comments) into AI models for binary code understanding, with the aim of improving both interpretability and performance.

Findings

01 Proposes a framework for AI-assisted binary code comprehension

02 Highlights the importance of source code context in binary analysis

03 Plans to investigate metrics, based on human comprehension studies, for assessing models applied to binary code

Abstract

Understanding binary code is an essential but complex software engineering task for reverse engineering, malware analysis, and compiler optimization. Unlike source code, binary code has limited semantic information, which makes it challenging for human comprehension. At the same time, compiling source to binary code, or transpiling among different programming languages (PLs) can provide a way to introduce external knowledge into binary comprehension. We propose to develop Artificial Intelligence (AI) models that aid human comprehension of binary code. Specifically, we propose to incorporate domain knowledge from large corpora of source code (e.g., variable names, comments) to build AI models that capture a generalizable representation of binary code. Lastly, we will investigate metrics to assess the performance of models that apply to binary code by using human studies of comprehension.
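
As a rough illustration of what pairing binary code with source-level knowledge could look like, the sketch below builds one training pair from a disassembled function and the source it was compiled from. Everything in it is an assumption made for illustration and not the paper's method: the names `PairedExample`, `extract_source_hints`, and `make_pair` are hypothetical, the disassembly string is hand-written rather than produced by a real compiler, and the regex-based extraction stands in for a proper parser.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical, tiny keyword set; a real pipeline would use a proper C parser.
C_KEYWORDS = {"int", "char", "void", "return", "if", "else", "for", "while"}


@dataclass
class PairedExample:
    binary_tokens: List[str]  # tokens from one disassembled function
    source_hints: List[str]   # identifiers and comment words from its source


def extract_source_hints(source: str) -> List[str]:
    """Collect identifiers and comment words from C-like source text."""
    comments = re.findall(r"//(.*)", source)
    code = re.sub(r"//.*", "", source)
    identifiers = [t for t in re.findall(r"[A-Za-z_]\w*", code)
                   if t not in C_KEYWORDS]
    comment_words = [w.lower() for c in comments
                     for w in re.findall(r"[A-Za-z]+", c)]
    # Deduplicate while preserving first-seen order.
    return list(dict.fromkeys(identifiers + comment_words))


def make_pair(disassembly: str, source: str) -> PairedExample:
    """Pair the tokens of a disassembled function with its source hints."""
    binary_tokens = re.findall(r"[^\s,]+", disassembly)
    return PairedExample(binary_tokens, extract_source_hints(source))


# Assumed example inputs: a toy C function and hand-written x86-style assembly.
source = "int add(int a, int b) { // sum two values\n  return a + b; }"
disassembly = "mov eax, edi\nadd eax, esi\nret"
pair = make_pair(disassembly, source)
print(pair.binary_tokens)
print(pair.source_hints)
```

The binary side carries only opcodes and registers, while the source side carries the names and comment words that a human reader relies on; a model trained on such pairs would have to learn to bridge that gap.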
