Software engineering to sustain a high-performance computing scientific application: QMCPACK
William F. Godoy, Steven E. Hahn, Michael M. Walsh, Philip W. Fackler, Jaron T. Krogel, Peter W. Doak, Paul R. C. Kent, Alfredo A. Correa, Ye Luo, Mark Dewing

TL;DR
This paper discusses the software engineering strategies implemented in QMCPACK to enhance its sustainability, maintainability, and performance on high-performance computing systems, emphasizing continuous integration, containerization, and code refactoring.
Contribution
It introduces specific software engineering practices applied to QMCPACK, demonstrating their impact on code sustainability and scientific productivity in HPC environments.
Findings
Improved CI coverage for CPU and GPU architectures.
Reduced memory leaks through sanitizers.
Enhanced reproducibility with Docker containers.
Abstract
We provide an overview of the software engineering efforts and their impact in QMCPACK, a production-level ab-initio Quantum Monte Carlo open-source code targeting high-performance computing (HPC) systems. Aspects included are: (i) strategic expansion of continuous integration (CI) targeting CPUs, using GitHub Actions runners, and NVIDIA and AMD GPUs in pre-exascale systems, using self-hosted hardware; (ii) incremental reduction of memory leaks using sanitizers; (iii) incorporation of Docker containers for CI and reproducibility; and (iv) refactoring efforts to improve maintainability, testing coverage, and memory lifetime management. We quantify the value of these improvements by providing metrics to illustrate the shift towards a predictive, rather than reactive, sustainable maintenance approach. Our goal, in documenting the impact of these efforts on QMCPACK, is to contribute to the…
Taxonomy
Topics: Scientific Computing and Data Management · Distributed and Parallel Computing Systems · Cloud Computing and Resource Management
