Charliecloud's layer-free, Git-based container build cache
Reid Priedhorsky (1), Jordan Ogas (1), Claude H. (Rusty) Davis IV (1), Z. Noah Hounshel (1, 2), Ashlyn Lee (1, 3), Benjamin Stormer (1, 4), R. Shane Goff (1) ((1) Los Alamos National Laboratory, (2) University of North Carolina Wilmington, (3) Colorado State University

TL;DR
This paper introduces a Git-based cache for layer-free container builds in HPC, demonstrating it can outperform traditional layered caches in build time, disk usage, and structural efficiency.
Contribution
It presents a novel Git-based caching method for layer-free containers, offering advantages over traditional layered caches in performance and structure.
Findings
Git-based cache matches layered caches in build time and disk usage, and is considerably faster for many-instruction recipes
Lower cache overhead and better file de-duplication
More efficient diff format for container images
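The de-duplication finding follows from how Git stores data: each file's content is saved once under an ID derived from a hash of that content, so identical files in different image states share one stored object. A tar-based layer, by contrast, holds a fresh full copy of every file it touches. The toy sketch below uses Git's real blob-ID formula (SHA-1 over a `blob <size>\0` header plus the content), but `ObjectStore` and its methods are hypothetical illustrations, not Charliecloud's implementation.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Object ID Git assigns to a blob: SHA-1 of 'blob <size>\\0' + content."""
    header = b"blob " + str(len(data)).encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()


class ObjectStore:
    """Hypothetical toy content-addressed store: each distinct content kept once."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}  # object ID -> file content

    def add_tree(self, files: dict[str, bytes]) -> dict[str, str]:
        """Store an image state given as {path: content}; return {path: object ID}."""
        tree = {}
        for path, data in files.items():
            oid = git_blob_id(data)
            # Identical content maps to the same ID, so it is stored only once.
            self.objects.setdefault(oid, data)
            tree[path] = oid
        return tree

    def stored_bytes(self) -> int:
        """Total bytes of unique content held by the store."""
        return sum(len(d) for d in self.objects.values())


if __name__ == "__main__":
    store = ObjectStore()
    v1 = store.add_tree({"/lib/libc.so": b"x" * 100, "/app": b"v1"})
    v2 = store.add_tree({"/lib/libc.so": b"x" * 100, "/app": b"v2"})
    # The shared library appears in both states but is stored once.
    print(v1["/lib/libc.so"] == v2["/lib/libc.so"], store.stored_bytes())
```

Two image states totalling 204 bytes of files occupy only 104 bytes in the store, because the 100-byte library is shared.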
Abstract
A popular approach to deploying scientific applications in high performance computing (HPC) is Linux containers, which package an application and all its dependencies as a single unit. The container image is built by interpreting instructions in a machine-readable recipe, a process that is faster with a build cache that stores instruction results for re-use. The standard approach (used, e.g., by Docker and Podman) is a many-layered union filesystem that encodes differences between layers as tar archives. This paper instead presents a Git-based build cache for layer-free container implementations. Our experiments show that it performs similarly to layered caches on both build time and disk usage, with a considerable advantage for many-instruction recipes. Our approach also has structural advantages: a better diff format, lower cache overhead, and better file de-duplication. These results show that a Git-based cache for layer-free container implementations is not only possible but may outperform the…
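A build cache of the kind described above can be pictured as a map from "state before an instruction plus the instruction itself" to the resulting image state. A recipe rebuilt with a changed later step then re-uses every earlier result. The sketch below is a hypothetical illustration of that idea only: `BuildCache`, `cache_key`, and the `execute` callback are invented names, and SHA-256 chaining stands in for whatever keying Charliecloud actually uses.

```python
import hashlib
from typing import Callable

State = dict[str, bytes]  # toy image state: path -> content


class BuildCache:
    """Hypothetical instruction-result cache keyed by (parent key, instruction)."""

    def __init__(self) -> None:
        self.results: dict[str, State] = {}  # cache key -> image state after step
        self.hits = 0
        self.misses = 0

    @staticmethod
    def cache_key(parent_key: str, instruction: str) -> str:
        # Chaining the parent key means a step is re-used only if every
        # earlier instruction in the recipe was identical too.
        return hashlib.sha256((parent_key + "\n" + instruction).encode()).hexdigest()

    def build(self, recipe: list[str],
              execute: Callable[[State, str], State]) -> State:
        state: State = {}
        key = "root"
        for instruction in recipe:
            key = self.cache_key(key, instruction)
            if key in self.results:
                self.hits += 1
            else:
                self.misses += 1
                # Pass a copy so cached states are never mutated later.
                self.results[key] = execute(dict(state), instruction)
            state = self.results[key]
        return state


def fake_execute(state: State, instruction: str) -> State:
    """Stand-in for really running an instruction: record that it ran."""
    state[instruction] = b"ok"
    return state


if __name__ == "__main__":
    cache = BuildCache()
    cache.build(["FROM alpine", "RUN a", "RUN b"], fake_execute)
    cache.build(["FROM alpine", "RUN a", "RUN c"], fake_execute)
    print(cache.hits, cache.misses)
```

The second build hits the cache for its first two instructions and executes only the changed third one.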
Taxonomy
Topics: Advanced Data Storage Technologies · Distributed and Parallel Computing Systems · Parallel Computing and Optimization Techniques
