Prediction of High-Performance Computing Input/Output Variability and Its Application to Optimization for System Configurations
Li Xu, Thomas Lux, Tyler Chang, Bo Li, Yili Hong, Layne Watson, Ali Butt, Danfeng Yao, and Kirk Cameron

TL;DR
This paper develops a statistical framework to predict HPC input/output variability, enabling better system configuration optimization by evaluating various predictive models on large-scale experimental data.
Contribution
It introduces a new data analytic framework and compares multiple methods for predicting HPC I/O variability, addressing a gap in existing statistical approaches.
Findings
Predictive models achieve high accuracy in unseen configurations
Method comparisons reveal strengths and limitations of different approaches
Optimizing configurations based on variability estimates improves HPC performance
Abstract
Performance variability is an important measure for a reliable high-performance computing (HPC) system. Performance variability is affected by complicated interactions between numerous factors, such as CPU frequency, the number of input/output (IO) threads, and the IO scheduler. In this paper, we focus on HPC IO variability. The prediction of HPC variability is a challenging problem in the engineering of HPC systems, and there is little statistical work on this problem to date. Although there are many methods available in the computer experiment literature, the applicability of existing methods to HPC performance variability needs investigation, especially when the objective is to predict performance variability in both interpolation and extrapolation settings. A data analytic framework is developed to model data collected from large-scale experiments. Various promising methods are used…
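To make the distinction between interpolation and extrapolation settings concrete, the following is a minimal sketch rather than the authors' framework. It assumes that variability is summarized as the sample standard deviation of repeated throughput runs, and that held-out configurations are classified by whether their CPU frequency falls inside or outside the range seen in training. The data values and the names `runs`, `variability`, and `split_by_setting` are hypothetical.

```python
from statistics import stdev

# Hypothetical throughput measurements (arbitrary units), keyed by
# (CPU frequency in GHz, number of IO threads). All values are made up.
runs = {
    (1.2, 1): [100.0, 104.0, 96.0],
    (1.2, 4): [210.0, 190.0, 200.0],
    (2.0, 1): [150.0, 153.0, 147.0],
    (2.0, 4): [300.0, 330.0, 270.0],
    (3.0, 1): [180.0, 182.0, 178.0],
    (3.0, 4): [400.0, 440.0, 360.0],
}


def variability(samples):
    """Sample standard deviation of repeated runs (an assumed measure)."""
    return stdev(samples)


def split_by_setting(configs, train_freqs):
    """Split configurations into training, interpolation, and extrapolation sets.

    A held-out configuration counts as interpolation when its frequency lies
    strictly inside the training range, and as extrapolation when it lies
    outside that range.
    """
    lo, hi = min(train_freqs), max(train_freqs)
    train = [c for c in configs if c[0] in train_freqs]
    interp = [c for c in configs if c[0] not in train_freqs and lo < c[0] < hi]
    extrap = [c for c in configs if c[0] < lo or c[0] > hi]
    return train, interp, extrap


# Per-configuration variability: the quantity a predictive model would target.
targets = {config: variability(samples) for config, samples in runs.items()}

# Training on the lowest and highest frequencies leaves 2.0 GHz to interpolate.
train_a, interp_a, extrap_a = split_by_setting(sorted(runs), {1.2, 3.0})

# Training on the two lowest frequencies leaves 3.0 GHz to extrapolate.
train_b, interp_b, extrap_b = split_by_setting(sorted(runs), {1.2, 2.0})
```

A model fitted on the `train` configurations would then be scored separately on the `interp` and `extrap` sets, since accuracy inside the sampled range need not carry over to configurations outside it.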
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Embedded Systems Design Techniques · Distributed and Parallel Computing Systems
