Beyond the C: Retargetable Decompilation using Neural Machine Translation
Iman Hosseini, Brendan Dolan-Gavitt

TL;DR
This paper introduces Beyond The C, a neural decompiler that is easily adaptable to multiple programming languages by treating source and assembly code as plain text, reducing the need for language-specific tools.
Contribution
It presents a retargetable neural decompiler that minimizes domain knowledge requirements, enabling easier support for diverse programming languages.
Findings
Achieves decompilation accuracy comparable to existing methods.
Relies less on language-specific tokenizers and parsers.
Works across multiple languages, including Go, Fortran, OCaml, and C.
Abstract
Decompilation, the reversal of the compilation process, is an important tool in the reverse engineering of computer software. Recently, researchers have proposed using techniques from neural machine translation to automate the process of decompilation. Although such techniques hold the promise of targeting a wider range of source and assembly languages, to date they have primarily targeted C code. In this paper we argue that existing neural decompilers have achieved higher accuracy at the cost of requiring language-specific domain knowledge such as tokenizers and parsers to build an abstract syntax tree (AST) for the source language, which increases the overhead of supporting new languages. We explore a different tradeoff that, to the extent possible, treats the assembly and source languages as plain text, and show that this allows us to build a decompiler that is easily…
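To make the "plain text" idea concrete, here is a minimal sketch of how one splitter could serve both assembly and any source language without a per-language lexer or parser. It is not the paper's tokenizer: the regular expression, the function names plain_text_tokens and make_pair, and the sample assembly/source pair are all assumptions chosen for illustration.

```python
import re
from typing import List, Tuple

# Hypothetical language-agnostic rule (an assumption, not the paper's method):
# identifiers and digit runs stay whole, and every other non-space character
# becomes its own token. No grammar or AST for any language is involved.
_TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+|\S")


def plain_text_tokens(text: str) -> List[str]:
    """Split arbitrary program text into tokens using one shared rule."""
    return _TOKEN.findall(text)


def make_pair(assembly: str, source: str) -> Tuple[List[str], List[str]]:
    """Build one (input, target) training pair for a translation model."""
    return plain_text_tokens(assembly), plain_text_tokens(source)


# Illustrative pair; the assembly is hand-written, not real compiler output.
asm = "movl %edi, %eax\naddl %esi, %eax\nret"
src = "int add(int a, int b) { return a + b; }"
asm_tokens, src_tokens = make_pair(asm, src)
```

Because the same function handles C, OCaml, Go, or Fortran text unchanged, supporting a new language in a setup like this would mostly mean collecting new assembly/source pairs, not writing a new front end.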
