Reproducibility and Performance: Why Choose?
Ludovic Courtès (SED)

TL;DR
This paper explores reconciling high-performance computing with reproducibility by introducing package multi-versioning, enabling CPU tuning without sacrificing scientific verifiability.
Contribution
It proposes package multi-versioning for GNU Guix to enable CPU tuning while maintaining reproducibility and provenance tracking in HPC environments.
Findings
Package multi-versioning allows CPU tuning without losing reproducibility.
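Existing engineering work on performance portability has already proved fruitful, notably for message passing (MPI); some aspects of CPU micro-architecture tuning remain unaddressed.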
The approach enhances scientific verifiability in high-performance computing.
Abstract
Research processes often rely on high-performance computing (HPC), but HPC is often seen as antithetical to "reproducibility": one would have to choose between software that achieves high performance, and software that can be deployed in a reproducible fashion. However, by giving up on reproducibility we would give up on verifiability, a foundation of the scientific process. How can we conciliate performance and reproducibility? This article looks at two performance-critical aspects in HPC: message passing (MPI) and CPU micro-architecture tuning. Engineering work that has gone into performance portability has already proved fruitful, but some areas remain unaddressed when it comes to CPU tuning. We propose package multi-versioning, a technique developed for GNU Guix, a tool for reproducible software deployment, and show that it allows us to implement CPU tuning without compromising on…
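The idea of keeping a tuned build reproducible can be illustrated with a small toy model. The sketch below is not Guix code and does not reflect how Guix implements package multi-versioning; `PackageVariant`, `tuned`, and the package name `example-lib` are hypothetical names introduced for illustration. It assumes only that a build is identified by its declared inputs, so that a CPU-tuned variant is a distinct, named build whose tuning target is part of its recorded provenance.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PackageVariant:
    """Hypothetical record of one build of a package (not a Guix data structure)."""
    name: str
    version: str
    tune: Optional[str] = None  # target CPU micro-architecture; None means baseline build

    def compiler_flags(self) -> List[str]:
        # -march= is a standard GCC/Clang option; the baseline build leaves it unset.
        flags = ["-O2"]
        if self.tune is not None:
            flags.append(f"-march={self.tune}")
        return flags

    def build_id(self) -> str:
        # The identifier is derived only from declared inputs, never from the
        # machine running the build, so a given variant has the same identifier
        # everywhere.
        inputs = {
            "name": self.name,
            "version": self.version,
            "flags": self.compiler_flags(),
        }
        blob = json.dumps(inputs, sort_keys=True).encode("utf-8")
        return hashlib.sha256(blob).hexdigest()[:16]


def tuned(package: PackageVariant, cpu: str) -> PackageVariant:
    """Return a variant of `package` tuned for `cpu`, leaving the original untouched."""
    return PackageVariant(package.name, package.version, tune=cpu)


if __name__ == "__main__":
    base = PackageVariant("example-lib", "1.0")
    fast = tuned(base, "skylake")
    print(base.build_id(), base.compiler_flags())
    print(fast.build_id(), fast.compiler_flags())
```

In this model the baseline and tuned variants coexist under different identifiers, and the tuning target is an explicit input rather than something detected implicitly from the build machine; that explicitness is what keeps the tuned build traceable.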
Taxonomy
Topics: Scientific Computing and Data Management · Cloud Computing and Resource Management · Distributed and Parallel Computing Systems
