TL;DR
This paper evaluates eight Java decompilers on real-world software, revealing that no single tool perfectly reconstructs source code, with the best achieving 84% syntactic correctness and 78% semantic equivalence.
Contribution
It provides a comprehensive empirical analysis of Java decompilers' effectiveness across multiple quality metrics using a large benchmark dataset.
Findings
No decompiler handles all bytecode structures correctly.
The highest-ranked decompiler achieves 84% syntactic correctness.
The best decompiler achieves semantic equivalence for 78% of classes.
Abstract
During compilation from Java source code to bytecode, some information is irreversibly lost. In other words, compilation and decompilation of Java code are not symmetric. Consequently, the decompilation process, which aims at producing source code from bytecode, must establish some strategies to reconstruct the information that has been lost. Modern Java decompilers tend to use distinct strategies to achieve proper decompilation. In this work, we hypothesize that the diverse ways in which bytecode can be decompiled have a direct impact on the quality of the source code produced by decompilers. We study the effectiveness of eight Java decompilers with respect to three quality indicators: syntactic correctness, syntactic distortion and semantic equivalence modulo inputs. This study relies on a benchmark set of 14 real-world open-source software projects to be decompiled (2041 classes in…
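To make "semantic equivalence modulo inputs" concrete, here is a minimal Java sketch. It is an illustration under stated assumptions, not the paper's tooling: the class name, the three abs variants, and the input sets are all hypothetical, and the inputs are assumed to play the role a test suite would play. The point is that equivalence is only claimed for the inputs actually exercised, so a faulty decompilation can still pass when the inputs miss the faulty path.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical illustration; none of these names come from the paper.
class EquivalenceModuloInputs {

    // Returns true when both implementations agree on every supplied input.
    // This says nothing about inputs that are not in the list.
    public static boolean equivalentOn(IntUnaryOperator original,
                                       IntUnaryOperator decompiled,
                                       int... inputs) {
        for (int input : inputs) {
            if (original.applyAsInt(input) != decompiled.applyAsInt(input)) {
                return false;
            }
        }
        return true;
    }

    // Stand-in for the original source method.
    public static int absOriginal(int x) {
        return x < 0 ? -x : x;
    }

    // Stand-in for a faithful decompilation: different syntax, same behavior.
    public static int absDecompiled(int x) {
        if (x >= 0) {
            return x;
        }
        return -x;
    }

    // Stand-in for a faulty decompilation that lost the negation.
    public static int absBroken(int x) {
        return x;
    }

    public static void main(String[] args) {
        // Faithful decompilation agrees on all inputs: prints true
        System.out.println(equivalentOn(
                EquivalenceModuloInputs::absOriginal,
                EquivalenceModuloInputs::absDecompiled, -3, 0, 7));

        // Faulty decompilation is caught by the negative input: prints false
        System.out.println(equivalentOn(
                EquivalenceModuloInputs::absOriginal,
                EquivalenceModuloInputs::absBroken, -3, 0, 7));

        // Same faulty code passes when inputs miss the faulty path: prints true
        System.out.println(equivalentOn(
                EquivalenceModuloInputs::absOriginal,
                EquivalenceModuloInputs::absBroken, 0, 7));
    }
}
```

The last call shows why the qualifier "modulo inputs" matters: a result of this kind is only as strong as the inputs used to obtain it.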
