Predictive Modeling of I/O Performance for Machine Learning Training Pipelines: A Data-Driven Approach to Storage Optimization
Karthik Prabhakar, Durgamadhab Mishra

TL;DR
This paper introduces a machine learning-based method to accurately predict I/O performance in ML training pipelines, enabling optimized storage configurations and reducing setup time from days to minutes.
Contribution
It presents a data-driven approach using regression models, especially XGBoost, to predict I/O throughput and guide storage optimization in ML workflows.
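To show how a throughput predictor can guide storage choices, here is a minimal Python sketch. It assumes a trained regressor is available as a plain function that maps a configuration to predicted throughput. The `StorageConfig` fields, the `recommend` helper, and the toy predictor with its made-up speeds are all hypothetical illustrations. They are not the authors' code or measurements.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class StorageConfig:
    # Hypothetical feature set; the paper's exact features are not listed here.
    backend: str       # e.g. "nvme", "nas", "tmpfs" (illustrative labels)
    data_format: str   # illustrative label only
    batch_size: int


def recommend(configs: Iterable[StorageConfig],
              predict_throughput: Callable[[StorageConfig], float]) -> StorageConfig:
    """Return the candidate configuration with the highest predicted throughput."""
    return max(configs, key=predict_throughput)


# Toy stand-in for a fitted regressor: invented speeds (MB/s), NOT measured data.
BACKEND_SPEED = {"nvme": 2000.0, "nas": 400.0, "tmpfs": 5000.0}


def toy_predictor(cfg: StorageConfig) -> float:
    # Throughput scales with batch size up to a hypothetical saturation point of 256.
    return BACKEND_SPEED[cfg.backend] * min(cfg.batch_size, 256) / 256


candidates = [StorageConfig(b, "parquet", bs) for b in BACKEND_SPEED for bs in (32, 256)]
best = recommend(candidates, toy_predictor)
```

The recommendation step here is simply an argmax over predicted throughput across enumerated candidates. In a real pipeline the toy predictor would presumably be replaced by the fitted model (XGBoost in the paper), and the candidate set by the backends, formats, and batch sizes actually under consideration.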
Findings
XGBoost achieved an R-squared of 0.991 in I/O throughput prediction.
Feature importance analysis identified throughput metrics and batch size as key performance factors.
The approach reduces configuration time from days to minutes.
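The summary does not say how feature importance was computed. XGBoost offers its own built-in importance scores, and the paper may have used those. The sketch below swaps in permutation importance as a model-agnostic illustration of the general idea. It shuffles one feature column at a time and measures how much the mean squared error rises. The `permutation_importance` helper and the toy data are hypothetical.

```python
import random
from typing import Callable, List, Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(model: Callable[[Sequence[float]], float],
                           X: List[List[float]], y: List[float],
                           n_repeats: int = 5, seed: int = 0) -> List[float]:
    """Average increase in MSE when each feature column is randomly shuffled."""
    rng = random.Random(seed)
    baseline = mse(y, [model(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            permuted = []
            for row, value in zip(X, column):
                new_row = list(row)
                new_row[j] = value  # break the link between feature j and the target
                permuted.append(new_row)
            increases.append(mse(y, [model(r) for r in permuted]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy data: feature 0 stands in for batch size; feature 1 is an irrelevant column.
X = [[float(i), float((i * 7) % 5)] for i in range(20)]
y = [3.0 * row[0] for row in X]
imps = permutation_importance(lambda r: 3.0 * r[0], X, y)
```

Because the toy model ignores feature 1, shuffling that column leaves the error unchanged and its importance is zero. Shuffling feature 0 raises the error, so it ranks as the important feature.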
Abstract
Modern machine learning training is increasingly bottlenecked by data I/O rather than compute: GPUs often sit idle at below 50% utilization while waiting for data. This paper presents a machine learning approach to predict I/O performance and recommend optimal storage configurations for ML training pipelines. We collected 141 observations through systematic benchmarking across different storage backends (NVMe SSD, network-attached storage, in-memory filesystems), data formats, and access patterns, covering both low-level I/O operations and full training pipelines. After evaluating seven regression models and three classification approaches, we found that XGBoost performed best, with an R-squared of 0.991 and an average error of 11.8% in predicting I/O throughput. Feature importance analysis revealed that throughput metrics and batch size are the primary performance drivers. This data-driven…
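The two headline numbers can be read through standard regression metrics. The abstract does not define its average error. The sketch below assumes it is a mean absolute percentage error (MAPE) and computes it alongside the usual coefficient of determination, R-squared. The throughput values are invented for illustration only.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent (assumes no zero targets)."""
    return 100.0 * sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


measured = [100.0, 200.0, 300.0, 400.0]   # hypothetical throughputs (MB/s)
predicted = [110.0, 190.0, 310.0, 380.0]  # hypothetical model outputs (MB/s)
r2 = r_squared(measured, predicted)
err = mape(measured, predicted)
```

On these toy values, R-squared is 0.986 and MAPE is about 5.8%. The two metrics capture different things. R-squared measures variance explained across the whole range and is dominated by large values. MAPE averages each observation's relative miss. A model can therefore score a very high R-squared while still missing noticeably, in percentage terms, on low-throughput cases.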
Taxonomy
Topics: Advanced Data Storage Technologies · Parallel Computing and Optimization Techniques · Cloud Computing and Resource Management
