ML-based Modeling to Predict I/O Performance on Different Storage Sub-systems
Yiheng Xu, Pranav Sivaraman, Hariharan Devarajan, Kathryn Mohror, Abhinav Bhatele

TL;DR
This paper introduces PrismIO, a Python tool that analyzes I/O traces and uses machine learning to predict optimal storage sub-systems, including burst buffers, for large-scale applications, achieving over 94% accuracy.
Contribution
The work presents a novel machine learning approach combined with a Python-based analysis tool to predict suitable storage configurations for I/O-intensive applications.
Findings
Model achieves 94.47% accuracy on unseen IOR scenarios.
Model achieves 95.86% accuracy on four real applications.
PrismIO effectively identifies I/O bottlenecks and performance issues.
Abstract
Parallel applications can spend a significant amount of time performing I/O on large-scale supercomputers. Fast near-compute storage accelerators called burst buffers can reduce the time a processor spends performing I/O and mitigate I/O bottlenecks. However, determining if a given application could be accelerated using burst buffers is not straightforward even for storage experts. The relationship between an application's I/O characteristics (such as I/O volume, processes involved, etc.) and the best storage sub-system for it can be complicated. As a result, adapting parallel applications to use burst buffers efficiently is a trial-and-error process. In this work, we present a Python-based tool called PrismIO that enables programmatic analysis of I/O traces. Using PrismIO, we identify bottlenecks on burst buffers and parallel file systems and explain why certain I/O patterns perform…
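The I/O characteristics the abstract mentions (I/O volume, processes involved) can be derived programmatically from a trace. The following is only a minimal sketch of that idea, not PrismIO's API: the record layout `(rank, operation, bytes)`, the sample values, and the `summarize` function are all hypothetical and chosen purely for illustration.

```python
from collections import defaultdict

# Hypothetical trace records: (MPI rank, operation, bytes transferred).
# This layout is an assumption for illustration, not the PrismIO trace format.
trace = [
    (0, "write", 4096),
    (1, "write", 4096),
    (0, "read", 1024),
    (2, "write", 8192),
]


def summarize(records):
    """Aggregate a trace into two features: total bytes moved and
    the number of distinct processes that performed any I/O."""
    bytes_per_rank = defaultdict(int)
    for rank, _op, nbytes in records:
        bytes_per_rank[rank] += nbytes
    return {
        "total_bytes": sum(bytes_per_rank.values()),
        "num_procs": len(bytes_per_rank),
    }


features = summarize(trace)
print(features)
```

Feature rows of this kind, one per application run, are the sort of input a classifier could use to choose between a burst buffer and a parallel file system.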
Taxonomy
Topics: Advanced Data Storage Technologies · Distributed and Parallel Computing Systems · Parallel Computing and Optimization Techniques
