Beyond C/C++: Probabilistic and LLM Methods for Next-Generation Software Reverse Engineering
Zhuo Zhuo, Xiangyu Zhang

TL;DR
This paper introduces a hybrid approach combining probabilistic analysis and fine-tuned large language models to improve reverse engineering of modern binaries from languages like Rust, Go, and Mojo, addressing limitations of traditional methods.
Contribution
It presents a novel hybrid method that integrates probabilistic binary analysis with LLMs to better handle uncertainties and semantic information in reverse engineering.
Findings
Enhanced accuracy in reverse engineering results
Better handling of ambiguous and incomplete binary information
Scalable approach adaptable to new programming languages
Abstract
This proposal discusses the growing challenges in reverse engineering modern software binaries, particularly those compiled from newer systems programming languages such as Rust, Go, and Mojo. Traditional reverse engineering techniques, developed with a focus on C and C++, fall short when applied to these newer languages because they rely on outdated heuristics and fail to fully exploit the rich semantic information embedded in binary programs. These challenges are exacerbated by the limitations of current data-driven methods, which are prone to generating inaccurate results, commonly referred to as hallucinations. To overcome these limitations, we propose a novel approach that integrates probabilistic binary analysis with fine-tuned large language models (LLMs). Our method systematically models the uncertainties inherent in reverse engineering, enabling more accurate…
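The abstract does not spell out how the probabilistic analysis and the LLM are combined. Purely as an illustration of the general idea, and not as the authors' method, the sketch below treats an LLM's suggested types for one recovered variable as a prior and multiplies it by likelihood factors derived from binary analysis. A suggestion that contradicts a hard constraint (likelihood 0) drops to zero probability, which is one simple way uncertainty modeling can filter hallucinated answers. The function `combine`, the candidate type names, and every number below are hypothetical.

```python
from math import prod


def combine(llm_prior: dict[str, float],
            evidence: list[dict[str, float]]) -> dict[str, float]:
    """Fuse an LLM's candidate distribution with analysis-derived likelihoods.

    Each evidence dict maps a candidate to P(observation | candidate).
    A candidate absent from a factor gets likelihood 1.0 (the factor is silent).
    """
    scores = {
        cand: p * prod(f.get(cand, 1.0) for f in evidence)
        for cand, p in llm_prior.items()
    }
    total = sum(scores.values())
    if total == 0:
        raise ValueError("all candidates were ruled out by the evidence")
    # Normalize so the posterior sums to 1.
    return {cand: s / total for cand, s in scores.items()}


# Hypothetical LLM output: its belief about the source-level type of a variable.
llm_prior = {"i64": 0.5, "&str": 0.3, "Vec<u8>": 0.2}

# Hypothetical binary-analysis evidence.
evidence = [
    # Hard constraint: the value spans more than one 8-byte word, so i64 is impossible.
    {"i64": 0.0},
    # Soft evidence: the pointed-to bytes look like valid UTF-8 text.
    {"&str": 0.9, "Vec<u8>": 0.4},
]

posterior = combine(llm_prior, evidence)
print({k: round(v, 3) for k, v in posterior.items()})
# prints {'i64': 0.0, '&str': 0.771, 'Vec<u8>': 0.229}
```

Although the LLM ranked `i64` first, the hard constraint removes it and the soft evidence shifts most of the remaining mass to `&str`. Multiplying independent factors is the simplest fusion rule; a real system would likely need calibrated likelihoods and a way to handle correlated evidence.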
Taxonomy
Topics: Software Engineering Research · Software Engineering Techniques and Practices · Software Testing and Debugging Techniques
