Understanding performance variability in standard and pipelined parallel Krylov solvers
Hannah Morgan, Patrick Sanan, Matthew G. Knepley, Richard Tran Mills

TL;DR
This paper investigates the performance variability of Krylov solvers caused by machine noise, demonstrating that pipelined algorithms reduce variability and proposing an improved non-stationary performance model for better prediction.
Contribution
It introduces an enhanced nondeterministic performance model accounting for iteration fluctuations, supported by extensive empirical data across multiple platforms.
Findings
Large variability in Krylov iterations across nodes for standard methods
Pipelined algorithms significantly reduce performance variability
The updated non-stationary model agrees well with the observed performance data
Abstract
In this work, we collect data from runs of Krylov subspace methods and pipelined Krylov algorithms in an effort to understand and model the impact of machine noise and other sources of variability on performance. We find large variability of Krylov iterations between compute nodes for standard methods that is reduced in pipelined algorithms, directly supporting conjecture, as well as large variation between statistical distributions of runtimes across iterations. Based on these results, we improve upon a previously introduced nondeterministic performance model by allowing iterations to fluctuate over time. We present our data from runs of various Krylov algorithms across multiple platforms as well as our updated non-stationary model that provides good agreement with observations. We also suggest how it can be used as a predictive tool.
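To make the idea of iteration times that fluctuate over time more concrete, here is a minimal Monte Carlo sketch. It is not the authors' model. It assumes that a standard Krylov method synchronizes all processes once per iteration, so each iteration costs as much as the slowest process, while an idealized pipelined method has no per-iteration synchronization, so the total is the slowest process's accumulated time. The exponential distribution, the linear drift of the mean, and the function name `simulate` are illustrative assumptions only.

```python
import random


def simulate(num_procs, num_iters, seed=0):
    """Return (standard_time, pipelined_time) for one simulated solve.

    Hypothetical toy model: per-process iteration times are random, and
    their mean drifts with the iteration index (non-stationary).
    """
    rng = random.Random(seed)
    times = []  # times[k][p] = time process p spends in iteration k
    for k in range(num_iters):
        # Assumed drift: the mean grows linearly from 1.0 to 1.5.
        mean = 1.0 + 0.5 * k / max(num_iters - 1, 1)
        times.append([rng.expovariate(1.0 / mean) for _ in range(num_procs)])

    # Assumed standard method: every iteration waits for the slowest process.
    standard = sum(max(row) for row in times)

    # Assumed idealized pipelined method: only the slowest overall process matters.
    pipelined = max(
        sum(times[k][p] for k in range(num_iters)) for p in range(num_procs)
    )
    return standard, pipelined


if __name__ == "__main__":
    std, pipe = simulate(num_procs=64, num_iters=200, seed=1)
    print(f"standard: {std:.1f}  pipelined: {pipe:.1f}")
```

Under these assumptions the sum of per-iteration maxima can never be smaller than the maximum of per-process sums, so the toy pipelined time is always at most the toy standard time. Running the sketch over many seeds gives a rough feel for how the two totals spread, but it says nothing quantitative about the platforms measured in the paper.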
