Prediction for Distributional Outcomes in High-Performance Computing I/O Variability
Li Xu, Yili Hong, Max D. Morris, Kirk W. Cameron

TL;DR
This paper introduces a modified Gaussian process model that predicts the distribution of I/O throughput in HPC systems from system factors. A monotonic constraint keeps the predicted distribution function nondecreasing, and the model is reported to outperform existing distribution prediction methods.
Contribution
The paper presents a novel distribution prediction framework using a constrained Gaussian process that handles both quantitative and qualitative inputs for HPC performance variability.
Findings
The proposed model accurately predicts I/O throughput distributions.
It outperforms existing distribution prediction methods.
The model can generate scalar summaries like mean and quantiles from the predicted distributions.
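As a rough illustration of the last finding, the sketch below derives a mean and a few quantiles from a distribution function evaluated on a grid of throughput values. It is not the paper's implementation: the function name `summarize_cdf`, the grid, and the post-hoc monotonic repair (clipping plus a running maximum) are assumptions made for this example, whereas the paper imposes monotonicity inside the model itself.

```python
import numpy as np

def summarize_cdf(grid, cdf, probs=(0.05, 0.5, 0.95)):
    """Hypothetical helper: scalar summaries from a CDF tabulated on `grid`.

    Assumes the outcome lies in [grid[0], grid[-1]] and that cdf[-1] is 1.
    """
    grid = np.asarray(grid, dtype=float)
    cdf = np.asarray(cdf, dtype=float)

    # Post-hoc repair (an assumption of this sketch, not the paper's method):
    # clip to [0, 1] and take a running maximum so the curve is nondecreasing.
    cdf = np.maximum.accumulate(np.clip(cdf, 0.0, 1.0))

    # Mean of a variable bounded below by a = grid[0]:
    # E[X] = a + integral over the grid of (1 - F(x)) dx, via the trapezoid rule.
    survival = 1.0 - cdf
    mean = grid[0] + np.sum(0.5 * (survival[1:] + survival[:-1]) * np.diff(grid))

    # Quantiles by linearly interpolating the inverse of the tabulated CDF.
    quantiles = np.interp(probs, cdf, grid)

    return {"mean": float(mean),
            "quantiles": {p: float(q) for p, q in zip(probs, quantiles)}}

# Toy check with a uniform distribution on [0, 10] (made-up numbers, not HPC data).
grid = np.linspace(0.0, 10.0, 101)
summary = summarize_cdf(grid, grid / 10.0)
```

With a predicted distribution function in hand, any other scalar summary (for example a standard deviation) could be computed in the same way from the tabulated curve.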
Abstract
Although high-performance computing (HPC) systems have been scaled to meet the exponentially-growing demand for scientific computing, HPC performance variability remains a major challenge and has become a critical research topic in computer science. Statistically, performance variability can be characterized by a distribution. Predicting performance variability is a critical step in HPC performance variability management and is nontrivial because one needs to predict a distribution function based on system factors. In this paper, we propose a new framework to predict performance distributions. The proposed model is a modified Gaussian process that can predict the distribution function of the input/output (I/O) throughput under a specific HPC system configuration. We also impose a monotonic constraint so that the predicted function is nondecreasing, which is a property of the cumulative…
Taxonomy
Topics: Advanced Data Storage Technologies · Cloud Computing and Resource Management · Distributed and Parallel Computing Systems
