ReF Decompile: Relabeling and Function Call Enhanced Decompile

Yunlong Feng; Bohan Li; Xiaoming Shi; Qingfu Zhu; Wanxiang Che

arXiv:2502.12221·cs.SE·February 19, 2025

TL;DR

ReF Decompile introduces relabeling and function call strategies that improve the accuracy of large language model (LLM) based decompilation of binary code into a high-level language, achieving state-of-the-art results.

Contribution

The paper presents novel relabeling and function call strategies that enhance control flow preservation and variable recovery in LLM-based decompilation.
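This page does not spell out how the relabeling step works, so the following is only a minimal sketch of one plausible reading: replacing raw jump target addresses in a disassembly listing with symbolic labels, so that control flow stays readable after addresses are stripped. The input format (objdump-style `address: instruction` lines), the `relabel` function name, and the `L0`, `L1`, ... label scheme are all assumptions made for illustration and are not taken from the paper.

```python
import re

# Hypothetical input: simplified objdump-style lines, "address: instruction".
ASM = """\
401000: cmp edi, 0
401003: jle 40100a
401005: mov eax, 1
40100a: ret
"""

# Matches a jump mnemonic followed by a lowercase hex target (assumed format).
JUMP_RE = re.compile(r"^(j\w+)\s+([0-9a-f]+)$")


def relabel(asm: str) -> str:
    """Replace jump target addresses with symbolic labels (illustrative only)."""
    rows = []
    for line in asm.strip().splitlines():
        addr, instr = line.split(":", 1)
        rows.append((addr.strip(), instr.strip()))

    known = {addr for addr, _ in rows}

    # First pass: collect jump targets that point inside this listing.
    targets = set()
    for _, instr in rows:
        m = JUMP_RE.match(instr)
        if m and m.group(2) in known:
            targets.add(m.group(2))

    # Assign labels in address order so the numbering is deterministic.
    ordered = sorted(targets, key=lambda a: int(a, 16))
    labels = {addr: f"L{i}" for i, addr in enumerate(ordered)}

    # Second pass: emit label definitions and rewrite jump operands.
    out = []
    for addr, instr in rows:
        if addr in labels:
            out.append(f"{labels[addr]}:")
        m = JUMP_RE.match(instr)
        if m and m.group(2) in labels:
            instr = f"{m.group(1)} {labels[m.group(2)]}"
        out.append(f"    {instr}")
    return "\n".join(out)


print(relabel(ASM))
```

In this sketch, jumps whose targets fall outside the listing are left untouched, since no label could be defined for them.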

Findings

01 Achieves 61.43% accuracy on the HumanEval-Decompile benchmark.

02 Surpasses existing baselines in decompilation performance.

03 Effectively preserves control flow and variable information.

Abstract

The goal of decompilation is to convert compiled low-level code (e.g., assembly code) back into high-level programming languages, enabling analysis in scenarios where source code is unavailable. This task supports various reverse engineering applications, such as vulnerability identification, malware analysis, and legacy software migration. End-to-end decompilation methods based on large language models (LLMs) reduce reliance on additional tools and minimize manual intervention owing to their inherent properties. However, previous end-to-end methods often lose critical information necessary for reconstructing control flow structures and variables when processing binary files, making it challenging to accurately recover the program's logic. To address these issues, we propose the ReF Decompile method, which incorporates the following innovations: (1) The Relabeling strategy…

Code & Models

Repositories

AlongWY/ReF-Dec
PyTorch · Official

Models

Datasets

ylfeng/ReF-Decompile-dataset
dataset · 27 downloads

Taxonomy

Topics: Software System Performance and Reliability