STELLAR: Storage Tuning Engine Leveraging LLM Autonomous Reasoning for High Performance Parallel File Systems
Chris Egersdoerfer, Philip Carns, Shane Snyder, Robert Ross, Dong Dai

TL;DR
STELLAR is an autonomous, LLM-powered system that efficiently tunes high-performance parallel file systems, achieving near-optimal configurations rapidly and reducing the complexity and manpower traditionally required for I/O system optimization.
Contribution
This paper introduces STELLAR, a novel LLM-based autonomous tuning system that significantly reduces tuning iterations and automates the entire process for parallel file systems.
Findings
STELLAR finds near-optimal configurations within five attempts.
It needs far fewer iterations than traditional autotuning methods, which often require hundreds of thousands of iterations to converge.
The system effectively automates complex I/O tuning tasks for scientific computing.
Abstract
I/O performance is crucial to efficiency in data-intensive scientific computing, but tuning large-scale storage systems is complex, costly, and notoriously manpower-intensive, making it inaccessible for most domain scientists. To address this problem, we propose STELLAR, an autonomous tuner for high-performance parallel file systems. Our evaluations show that STELLAR almost always selects near-optimal parameter configurations for parallel file systems within the first five attempts, even for previously unseen applications. STELLAR differs fundamentally from traditional autotuning methods, which often require hundreds of thousands of iterations to converge. Powered by large language models (LLMs), STELLAR enables autonomous end-to-end agentic tuning by (1) accurately extracting tunable parameters from software manuals, (2) analyzing I/O trace logs generated by applications, (3)…
Taxonomy
Topics: Advanced Data Storage Technologies · Parallel Computing and Optimization Techniques · Software System Performance and Reliability
