DALEQ -- Explainable Equivalence for Java Bytecode
Jens Dietrich, Behnaz Hassanshahi

TL;DR
Daleq is a tool that disassembles Java bytecode into a relational database, normalizes it with Datalog rules, and infers equivalence between binaries together with provenance explaining each result. It reduces the manual effort of equivalence assessment and identifies more equivalent artifacts than existing tools.
Contribution
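Daleq introduces an approach to equivalence analysis of Java bytecode that stores the bytecode in a relational database and normalizes it with Datalog rules, so that each equivalence result is explainable. This improves both accuracy and transparency compared with existing tools.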
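The pipeline described above (bytecode as relations, rule-based normalization, comparison with provenance) can be illustrated with a minimal sketch. It is not Daleq's implementation: the `Instr` relation, the rules R1 and R2, and the sample method are all hypothetical, and plain Python sets and functions stand in for the relational database and the Datalog engine.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    """One row of a hypothetical INSTRUCTION relation (not Daleq's real schema)."""
    method: str
    index: int
    opcode: str
    operand: str


def normalize(facts):
    """Apply two toy rules; return (normalized facts, provenance log)."""
    provenance = []
    kept = []
    for f in sorted(facts, key=lambda f: (f.method, f.index)):
        # Hypothetical rule R1: a 'nop' has no effect, so no normalized fact is derived.
        if f.opcode == "nop":
            provenance.append(f"R1 dropped nop at {f.method}#{f.index}")
            continue
        kept.append(f)

    # Hypothetical rule R2: renumber the remaining instructions densely per method.
    normalized = set()
    next_index = {}
    for f in kept:
        new_index = next_index.get(f.method, 0)
        next_index[f.method] = new_index + 1
        if new_index != f.index:
            provenance.append(f"R2 moved {f.method}#{f.index} to #{new_index}")
        normalized.add(f._replace(index=new_index))
    return normalized, provenance


def equivalent(a, b):
    """Two fact sets are equivalent if their normalized forms are equal."""
    norm_a, why_a = normalize(a)
    norm_b, why_b = normalize(b)
    return norm_a == norm_b, why_a + why_b


# Made-up example: the rebuilt class contains one extra 'nop'.
original = {
    Instr("Foo.bar()V", 0, "aload", "0"),
    Instr("Foo.bar()V", 1, "return", ""),
}
rebuilt = {
    Instr("Foo.bar()V", 0, "aload", "0"),
    Instr("Foo.bar()V", 1, "nop", ""),
    Instr("Foo.bar()V", 2, "return", ""),
}

same, why = equivalent(original, rebuilt)
print(same)
for step in why:
    print(step)
```

The provenance log is the point of the exercise: instead of a bare yes or no, the comparison reports which rules had to fire before the two fact sets matched, which is the kind of explanation the approach aims to provide.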
Findings
Reduces manual effort in binary equivalence assessment.
Outperforms existing tools in identifying equivalent artifacts.
Scales to large industrial datasets with thousands of class pairs.
Abstract
The security of software builds has attracted increased attention in recent years in response to incidents like SolarWinds and xz. Several companies, including Oracle and Google, now rebuild open source projects in a secure environment and publish the resulting binaries through dedicated repositories. This practice enables direct comparison between these rebuilt binaries and the original ones produced by developers and published in repositories such as Maven Central. These binaries are often not bitwise identical; in most cases, however, the differences can be attributed to variations in the build environment, and the binaries can still be considered equivalent. Establishing such equivalence is nevertheless a labor-intensive and error-prone process. While there are some tools that can be used for this purpose, they all fall short of providing provenance, i.e., a readable explanation of why…
Taxonomy
Topics: Security and Verification in Computing · Scientific Computing and Data Management · Advanced Malware Detection Techniques
