It's Not Just Timestamps: A Study on Docker Reproducibility
Oreofe Solarin

TL;DR
This study evaluates Dockerfile reproducibility across GitHub projects, revealing that most are non-reproducible due to developer choices, and proposes guidelines to improve reproducibility.
Contribution
It provides the first large-scale measurement of Docker reproducibility and identifies key developer-controlled factors affecting it, offering actionable Dockerfile guidelines.
Findings
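Only 56% of the 2,000 sampled repositories produce a buildable image.
Just 2.7% of those buildable Dockerfiles are bitwise reproducible without any infrastructure configuration.
Developer-controlled choices such as uncleaned caches, logs, documentation, and floating versions are dominant causes of non-reproducibility.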
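To make the last finding concrete, here is a minimal sketch of a Dockerfile check for two of the named patterns: floating base-image versions and uncleaned package caches. It is not the paper's pipeline. The function names, the rule that any FROM image without an `@sha256:` digest counts as floating, and the choice of apt as the example cache are all assumptions made for illustration.

```python
import re


def logical_lines(dockerfile: str) -> list[str]:
    """Join backslash-continued lines and drop blank ones."""
    joined = re.sub(r"\\\s*\n", " ", dockerfile)
    return [line.strip() for line in joined.splitlines() if line.strip()]


def floating_base_images(dockerfile: str) -> list[str]:
    """Return FROM images not pinned by digest (heuristic, assumed rule)."""
    pattern = re.compile(
        r"FROM\s+(?:--platform=\S+\s+)?(\S+)(?:\s+AS\s+(\S+))?", re.IGNORECASE
    )
    stages: set[str] = set()  # names of earlier build stages
    floating = []
    for line in logical_lines(dockerfile):
        match = pattern.match(line)
        if not match:
            continue
        image, alias = match.group(1), match.group(2)
        # Skip the empty base image and references to earlier stages.
        is_local = image.lower() == "scratch" or image.lower() in stages
        if not is_local and "@sha256:" not in image:
            floating.append(image)
        if alias:
            stages.add(alias.lower())
    return floating


def uncleaned_apt_runs(dockerfile: str) -> list[str]:
    """Return RUN instructions that install via apt-get but keep the apt lists."""
    flagged = []
    for line in logical_lines(dockerfile):
        if not re.match(r"RUN\s", line, re.IGNORECASE):
            continue
        if "apt-get install" in line and "rm -rf /var/lib/apt/lists" not in line:
            flagged.append(line)
    return flagged
```

Passing a Dockerfile's text to both functions returns the offending images and RUN instructions. A real check would need to cover other package managers, logs, and documentation files as well.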
Abstract
Reproducible container builds promise a simple integrity check for software supply chains: rebuild an image from its Dockerfile and compare hashes. We build a Docker measurement pipeline and apply it to a stratified sample of 2,000 GitHub repositories that contain a Dockerfile. We find that only 56% produce any buildable image, and just 2.7% of those are bitwise reproducible without any infrastructure configuration. Modifying infrastructure configurations raises bitwise reproducibility by 18.6 percentage points, but 78.7% of buildable Dockerfiles remain non-reproducible. We analyze the root causes of the remaining differences and find that, beyond timestamps and metadata, developer-controlled choices such as uncleaned caches, logs, documentation, and floating versions are dominant causes of non-reproducibility. We derive concrete Dockerfile guidelines from these patterns and discuss how…
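The rebuild-and-compare-hashes check described in the abstract can be illustrated without Docker. The sketch below packs files into an in-memory tar archive as a stand-in for an image layer and hashes the result. The helper names `layer_digest` and `is_bitwise_reproducible` are hypothetical and do not come from the paper. It shows only the timestamp effect: identical content with a different modification time yields a different digest.

```python
import hashlib
import io
import tarfile


def layer_digest(files: dict[str, bytes], mtime: int) -> str:
    """Pack files into an in-memory tar (a stand-in for a layer) and hash it."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):  # fixed ordering keeps the archive stable
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = mtime  # the only field varied in this sketch
            tar.addfile(info, io.BytesIO(data))
    return hashlib.sha256(buf.getvalue()).hexdigest()


def is_bitwise_reproducible(digest_a: str, digest_b: str) -> bool:
    """Two builds are bitwise reproducible when their digests match."""
    return digest_a == digest_b


if __name__ == "__main__":
    files = {"app/main.py": b"print('hi')\n"}
    fixed_a = layer_digest(files, mtime=0)
    fixed_b = layer_digest(files, mtime=0)
    drifted = layer_digest(files, mtime=1_700_000_000)
    print(is_bitwise_reproducible(fixed_a, fixed_b))  # same content, same mtime
    print(is_bitwise_reproducible(fixed_a, drifted))  # same content, new mtime
```

Fixing the timestamp removes only this one source of drift. Per the abstract, most buildable Dockerfiles remain non-reproducible even after infrastructure configurations are modified, because the file contents themselves differ between builds.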
Taxonomy
Topics: Software System Performance and Reliability · Security and Verification in Computing · Software Testing and Debugging Techniques
