TL;DR
This paper evaluates the diversity of Java decompilers and introduces Arlecchino, a new decompiler that combines existing strategies to improve code recovery from bytecode, handling cases no single decompiler can.
Contribution
The paper assesses the strategies of eight Java decompilers, identifies their limitations, and proposes Arlecchino, a novel meta-decompiler that merges the partial outputs of existing decompilers to increase decompilation coverage.
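The idea of merging partial outputs can be illustrated with a minimal sketch. It is not Arlecchino's actual algorithm: the per-method granularity, the input shape (decompiler name mapped to method signature mapped to recovered source, with `None` marking a failure), and the names `merge_partial_outputs`, `decompiler_a`, and `decompiler_b` are all assumptions made for illustration.

```python
from typing import Dict, Optional

# Hypothetical input shape (an assumption, not taken from the paper):
# decompiler name -> {method signature -> recovered source, or None on failure}
DecompilerOutputs = Dict[str, Dict[str, Optional[str]]]


def merge_partial_outputs(outputs: DecompilerOutputs) -> Dict[str, Optional[str]]:
    """For each method, keep the first non-failed result across decompilers."""
    merged: Dict[str, Optional[str]] = {}
    for methods in outputs.values():
        for signature, source in methods.items():
            # Fill the slot only if no earlier decompiler succeeded on it.
            if merged.get(signature) is None:
                merged[signature] = source
    return merged


# Two made-up decompilers, each failing on a different method.
outputs = {
    "decompiler_a": {"int add(int,int)": "return a + b;", "void run()": None},
    "decompiler_b": {"int add(int,int)": None, "void run()": "task.execute();"},
}
merged = merge_partial_outputs(outputs)
```

In this toy input neither decompiler recovers the whole class alone, but the merged result has a body for every method, which is the kind of coverage gain a meta-decompiler aims for.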
Findings
No single decompiler handles all bytecode structures effectively.
The best-performing decompiler produces syntactically correct output for 84% of classes and semantically equivalent output for 78%.
Arlecchino handles 37.6% of the classes that no existing decompiler could previously handle.
Abstract
During compilation from Java source code to bytecode, some information is irreversibly lost. In other words, compilation and decompilation of Java code are not symmetric. Consequently, decompilation, which aims at producing source code from bytecode, relies on strategies to reconstruct the information that has been lost. Different Java decompilers use distinct strategies to achieve proper decompilation. In this work, we hypothesize that the diverse ways in which bytecode can be decompiled have a direct impact on the quality of the source code produced by decompilers. In this paper, we assess the strategies of eight Java decompilers with respect to three quality indicators: syntactic correctness, syntactic distortion, and semantic equivalence modulo inputs. Our results show that no single modern decompiler is able to correctly handle the variety of bytecode structures coming from real-world…
