Automated Localization for Unreproducible Builds
Zhilei Ren, He Jiang, Jifeng Xuan, Zijiang Yang

TL;DR
This paper introduces RepLoc, an automated framework that localizes the problematic files causing unreproducible builds in software packages, reducing the manual effort needed to diagnose reproducibility failures.
Contribution
RepLoc combines query augmentation from build logs with heuristic rule-based filtering to automatically rank the files most likely responsible for an unreproducible build, automating a troubleshooting step that is otherwise done by hand.
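The filtering idea can be sketched as a set of textual rules that discard files unlikely to matter. This is a minimal illustration under stated assumptions: the rule names, the regular expressions, and the functions `matched_rules` and `filter_candidates` are hypothetical, chosen to reflect commonly reported causes of unreproducible output (embedded timestamps, filesystem-order-dependent listings, locale settings). They are not RepLoc's actual rule set.

```python
import re

# Hypothetical rule set (an assumption for illustration, not RepLoc's rules):
# each regex flags a construct that often makes build output vary between runs.
HEURISTIC_RULES = {
    "timestamp": re.compile(r"__DATE__|__TIME__|\bdate\b|gmtime|localtime"),
    "file_order": re.compile(r"\bglob\(|os\.listdir|\$\(wildcard|\bls\b"),
    "locale": re.compile(r"\bLC_ALL\b|\bLANG\b"),
}


def matched_rules(text):
    """Return, sorted, the names of the rules whose pattern occurs in text."""
    return sorted(name for name, pattern in HEURISTIC_RULES.items() if pattern.search(text))


def filter_candidates(files):
    """Narrow the search scope to files that trigger at least one rule.

    `files` maps a path to its contents; the result maps each surviving
    path to the list of rules it matched.
    """
    candidates = {}
    for path, text in files.items():
        hits = matched_rules(text)
        if hits:
            candidates[path] = hits
    return candidates


if __name__ == "__main__":
    package = {
        "Makefile": "VERSION := $(shell date +%Y%m%d)\n",
        "src/main.c": "int main(void) { return 0; }\n",
        "gen.py": "for f in os.listdir('.'):\n    print(f)\n",
    }
    print(filter_candidates(package))
```

Running the script prints `{'Makefile': ['timestamp'], 'gen.py': ['file_order']}`. A purely textual filter like this shrinks the candidate set but can also drop true culprits, which is why such a filter is paired with a ranking step rather than used alone.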
Findings
Achieves 47.09% accuracy when only the top-ranked file is considered
Reaches 79.28% accuracy when the top ten ranked files are considered
Six unreproducible packages were fixed with the help of RepLoc
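The two accuracy figures are top-k measures. Assuming the usual definition (a package counts as a hit when at least one of its known problematic files appears among the first k ranked files; the paper's exact definition is not stated on this page), the metric can be computed as in the sketch below. The function name `top_k_accuracy` and the data layout are hypothetical.

```python
def top_k_accuracy(rankings, ground_truth, k):
    """Fraction of packages whose top-k ranked files include a known culprit.

    rankings:     package name -> list of file paths, most suspicious first
    ground_truth: package name -> set of file paths known to cause the issue
    """
    if not rankings:
        raise ValueError("rankings must not be empty")
    if k < 1:
        raise ValueError("k must be at least 1")
    hits = sum(
        1
        for pkg, ranked in rankings.items()
        if set(ranked[:k]) & ground_truth.get(pkg, set())
    )
    return hits / len(rankings)


if __name__ == "__main__":
    rankings = {
        "pkg-a": ["debian/rules", "Makefile"],
        "pkg-b": ["setup.py", "doc/conf.py"],
    }
    truth = {"pkg-a": {"debian/rules"}, "pkg-b": {"doc/conf.py"}}
    print(top_k_accuracy(rankings, truth, 1))  # 0.5
    print(top_k_accuracy(rankings, truth, 2))  # 1.0
```

Under this definition, accuracy can only stay the same or rise as k grows, which is consistent with the top-ten figure exceeding the top-one figure.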
Abstract
Reproducibility is the ability to recreate identical binaries under pre-defined build environments. Owing to the need for quality assurance and the benefit of better detecting attacks against build environments, the practice of reproducible builds has gained popularity in many open-source software repositories such as Debian and Bitcoin. However, identifying unreproducible issues remains a labour-intensive and time-consuming challenge, because of the lack of information to guide the search and the diversity of causes that may lead to unreproducible binaries. In this paper we propose an automated framework called RepLoc to localize the problematic files for unreproducible builds. RepLoc features a query augmentation component that utilizes the information extracted from the build logs, and a heuristic rule-based filtering component that narrows the search scope. By…
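Query augmentation and ranking can be illustrated with a small retrieval sketch. All of the following are assumptions: the base query is the text of the difference between two builds; it is augmented with build-log lines that share a term with that difference, plus one neighbouring line on each side; and files are ranked by cosine similarity of raw term-frequency vectors built from their path and contents. The helper names (`tokens`, `cosine`, `augment_query`, `rank_files`) are hypothetical, and the plain term-frequency model stands in for whatever retrieval model RepLoc actually uses.

```python
import math
import re
from collections import Counter

# Identifier-like tokens of two or more characters (an illustrative choice).
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]+")


def tokens(text):
    """Lower-cased identifier-like tokens found in text."""
    return [t.lower() for t in TOKEN.findall(text)]


def cosine(a, b):
    """Cosine similarity of two term-frequency Counters (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def augment_query(diff_text, build_log, window=1):
    """Base query from diff_text, plus build-log lines sharing a term with it.

    Each matching log line is taken together with `window` lines on either side;
    overlapping windows contribute each line only once.
    """
    query_terms = set(tokens(diff_text))
    lines = build_log.splitlines()
    selected = set()
    for i, line in enumerate(lines):
        if query_terms & set(tokens(line)):
            selected.update(range(max(0, i - window), min(len(lines), i + window + 1)))
    extra = [t for i in sorted(selected) for t in tokens(lines[i])]
    return Counter(tokens(diff_text) + extra)


def rank_files(query, files):
    """Rank paths by cosine similarity between the query and each path plus contents."""
    scored = [(cosine(query, Counter(tokens(path + "\n" + text))), path)
              for path, text in files.items()]
    # Highest score first; ties broken alphabetically by path.
    return [path for _, path in sorted(scored, key=lambda s: (-s[0], s[1]))]


if __name__ == "__main__":
    diff_text = "-Built on Mon Jan 1\n+Built on Tue Jan 2\n"
    build_log = "gcc -c main.c\npython gen_version.py > version.h\necho Built version header\n"
    files = {
        "gen_version.py": "import time\nprint('#define VERSION', time.strftime('%a %b %d'))\n",
        "main.c": '#include "version.h"\nint main(void) { return 0; }\n',
    }
    print(rank_files(augment_query(diff_text, build_log), files))
```

In this toy example the difference text alone (built, on, mon, jan, tue) shares no term with either file, so both would score 0. The log line echoing "Built" pulls in the neighbouring command that runs `gen_version.py`, and with it the terms `gen_version`, `py` and `version`. Those terms are what let the generator script outrank `main.c`, so the script prints `['gen_version.py', 'main.c']`.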
Taxonomy
Topics: Software Engineering Research · Advanced Malware Detection Techniques · Software System Performance and Reliability
