Source Code Foundation Models are Transferable Binary Analysis Knowledge Bases

Zian Su; Xiangzhe Xu; Ziyang Huang; Kaiyuan Zhang; Xiangyu Zhang

arXiv:2405.19581·cs.SE·November 22, 2024

TL;DR

This paper introduces a probe-and-recover framework that combines pre-trained source code models with black-box LLMs for binary reverse engineering, and reports gains in both binary summarization and function name recovery.

Contribution

The paper presents a probe-and-recover framework that draws on uni-modal code models and black-box LLMs to improve binary analysis performance.

Findings

01. 10.3% relative gain in binary summarization (CHRF)

02. 16.7% relative gain in summarization (GPT4 metric)

03. 6.7% and 7.4% absolute improvements in name recovery
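
The findings above mix relative gains (summarization) with absolute improvements (name recovery), and the two are not directly comparable. The following minimal sketch shows the arithmetic behind each. The function names and the baseline and improved scores are hypothetical, chosen only for illustration; they are not values reported by the paper.

```python
def relative_gain(baseline: float, improved: float) -> float:
    """Gain expressed as a fraction of the baseline score."""
    if baseline == 0:
        raise ValueError("relative gain is undefined for a zero baseline")
    return (improved - baseline) / baseline


def absolute_gain(baseline: float, improved: float) -> float:
    """Gain expressed as a plain difference, in the metric's own units."""
    return improved - baseline


# Hypothetical scores (assumption, not taken from the paper).
baseline_score = 30.0
improved_score = 33.0

print(f"relative: {relative_gain(baseline_score, improved_score):.1%}")
print(f"absolute: {absolute_gain(baseline_score, improved_score):.1f} points")
```

With these made-up scores, a 3-point absolute improvement corresponds to a 10% relative gain. The same absolute difference would be a smaller relative gain on a higher baseline, which is why it matters that each finding states which kind of gain it reports.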

Abstract

Human-Oriented Binary Reverse Engineering (HOBRE) lies at the intersection of binary and source code, aiming to lift binary code to human-readable content relevant to source code, thereby bridging the binary-source semantic gap. Recent advancements in uni-modal code model pre-training, particularly in generative Source Code Foundation Models (SCFMs) and binary understanding models, have laid the groundwork for transfer learning applicable to HOBRE. However, existing approaches for HOBRE rely heavily on uni-modal models like SCFMs for supervised fine-tuning or general LLMs for prompting, resulting in sub-optimal performance. Inspired by recent progress in large multi-modal models, we propose that it is possible to harness the strengths of uni-modal code models from both sides to bridge the semantic gap effectively. In this paper, we introduce a novel probe-and-recover framework that…

Code & Models

Repositories

ziansu/prorec (PyTorch, official)

Taxonomy

Topics: Advanced Computational Techniques and Applications