TL;DR
This paper statistically analyzes IO500 benchmark submissions to reveal detailed performance patterns and behaviors in HPC storage systems, uncovering insights hidden in aggregate scores.
Contribution
It statistically characterizes 61 IO500 submissions from four competition lists (ISC21 through SC22), covering score distributions, inter-phase correlations, and the detailed log files that accompany each submission.
Findings
Scores span four orders of magnitude.
Phases within the same domain correlate strongly: Spearman rs = 0.78 to 0.96 for bandwidth and rs = 0.89 to 0.98 for metadata.
Log-level analysis identifies file-system-specific patterns in overhead and load imbalance.
Abstract
The IO500 benchmark has become the community standard for evaluating HPC storage system performance, yet the detailed data contained in its submission packages remains largely unexplored beyond aggregate leaderboard rankings. We present a statistical characterization of 61 IO500 submissions from four competition lists (ISC21 through SC22), examining score distributions, inter-phase correlations, and insights derived from detailed log files that accompany each submission. Our analysis reveals that IO500 scores span four orders of magnitude. Spearman correlation analysis shows strong within-domain clustering for both bandwidth (rs = 0.78 to 0.96) and metadata (rs = 0.89 to 0.98) phases, with the composite sub-scores exhibiting rs = 0.92 at per-node level (Pearson r = 0.53). Log-level analysis uncovers file-system-specific patterns in IOR close-time overhead, straggler behavior during the…
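The abstract reports both a Spearman and a Pearson coefficient for the same pair of sub-scores (rs = 0.92 versus r = 0.53). A gap of that kind typically appears when two quantities rise together monotonically but not linearly, which is plausible for scores spread over several orders of magnitude. The sketch below illustrates the effect only; the values are synthetic and invented for the example, not taken from any IO500 submission, and the helper names (`pearson`, `ranks`, `spearman`) are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Synthetic scores (assumption: made up for illustration).
# The first series spans four orders of magnitude; the second grows
# with it monotonically but on a linear scale.
bandwidth_score = [0.1, 1.0, 10.0, 100.0, 1000.0]
metadata_score = [1.0, 2.0, 3.0, 4.0, 5.0]

print("Spearman:", round(spearman(bandwidth_score, metadata_score), 3))
print("Pearson: ", round(pearson(bandwidth_score, metadata_score), 3))
```

Because Spearman works on ranks, it reaches its maximum whenever the ordering of systems agrees, while Pearson is pulled down by the largest values. This is one reason rank-based statistics are a common choice for heavy-tailed benchmark data.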
