On the Variability of Source Code in Maven Package Rebuilds
Jens Dietrich, Behnaz Hassanshahi

TL;DR
This paper examines whether independent rebuilds of Maven packages use the same source code as the packages published on Maven Central. It finds that build-time code generation is a key cause of non-equivalent sources between original and rebuilt packages, with implications for supply-chain security practices that rely on rebuilding.
Contribution
It provides an empirical analysis of source code differences in Maven package rebuilds and identifies build-time code generation as a key cause of variability.
Findings
Build-time code generation leads to source code non-equivalence.
Non-equivalent sources are common in alternative package builds.
Strategies are proposed to mitigate build variability issues.
Abstract
Rebuilding packages from open source is a common practice to improve the security of software supply chains, and is now done at an industrial scale. The basic principle is to acquire the source code used to build a package published in a repository such as Maven Central (for Java), rebuild the package independently with hardened security, and publish it in some alternative repository. In this paper we test the assumption that the same source code is being used by those alternative builds. To study this, we compare the sources released with packages on Maven Central with the sources associated with independently built packages from Google's Assured Open Source and Oracle's Build-from-Source projects. We study non-equivalent sources for alternative builds of 28 popular packages with 85 releases. We investigate the causes of non-equivalence, and find that the main cause is build…
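The comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' tooling: it assumes each build ships its sources as a zip-compatible archive (for example a `-sources.jar`), and it treats two files as equivalent only if they are byte-identical, which is likely stricter than the notion of equivalence used in the paper. The function names and archive layout are hypothetical.

```python
import hashlib
import zipfile


def file_digests(archive_path):
    """Map each .java entry in a source archive to the SHA-256 of its bytes."""
    digests = {}
    with zipfile.ZipFile(archive_path) as archive:
        for name in archive.namelist():
            if name.endswith(".java"):
                digests[name] = hashlib.sha256(archive.read(name)).hexdigest()
    return digests


def compare_source_archives(reference_path, rebuilt_path):
    """Report files present in only one archive and files whose bytes differ.

    Assumption: byte-level identity stands in for source equivalence here.
    """
    reference = file_digests(reference_path)
    rebuilt = file_digests(rebuilt_path)
    shared = reference.keys() & rebuilt.keys()
    return {
        "only_in_reference": sorted(reference.keys() - rebuilt.keys()),
        "only_in_rebuilt": sorted(rebuilt.keys() - reference.keys()),
        "differing": sorted(n for n in shared if reference[n] != rebuilt[n]),
    }
```

Under these assumptions, a file produced by build-time code generation would typically appear in one of the `only_in_...` lists when a build leaves it out of its source archive, or in `differing` when the generator emits different output in each build.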
Taxonomy
Topics: Software Engineering Research · Open Source Software Innovations · Software Engineering Techniques and Practices
