KML: Using Machine Learning to Improve Storage Systems
Ibrahim Umit Akgun, Ali Selman Aydin, Andrew Burford, Michael McNeill, Michael Arkhangelskiy, Aadil Shaikh, Lukas Velikov, and Erez Zadok

TL;DR
KML introduces a machine learning-based architecture integrated into operating systems to dynamically optimize storage parameters, significantly improving I/O throughput with minimal overhead across diverse workloads.
Contribution
This paper presents KML, a novel architecture that replaces manual OS heuristics with machine learning to optimize storage systems dynamically.
Findings
KML improves I/O throughput by up to 2.3x and 15x in its two case studies, respectively.
KML consumes less than 4KB of kernel memory and has a CPU overhead below 0.2%.
KML effectively adapts to complex, unseen workloads across different storage devices.
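The sub-4KB memory figure is easier to picture with a quick count. The sketch below is a hypothetical illustration, not the model KML actually uses. It computes how many bytes the weights and biases of a small fully connected network occupy. The helper name `mlp_param_bytes`, the layer sizes `[5, 15, 4]`, and the 4-byte (float32) parameters are all assumptions. The count also ignores activations, feature buffers, and other runtime state.

```python
def mlp_param_bytes(layer_sizes, bytes_per_param=4):
    """Hypothetical helper: bytes needed for the weights and biases of an MLP.

    layer_sizes lists the width of each layer, input first, e.g. [5, 15, 4].
    Only parameters are counted; activations and buffers are ignored.
    """
    total_params = 0
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        total_params += fan_in * fan_out  # weight matrix between two layers
        total_params += fan_out  # one bias per output unit
    return total_params * bytes_per_param


# Assumed toy network: 5 input features, 15 hidden units, 4 output classes.
size = mlp_param_bytes([5, 15, 4])
print(size)  # 616
print(size < 4 * 1024)  # True
```

With these assumed sizes the parameters take 616 bytes, well under 4KB. This suggests why small models can be practical to keep resident in kernel memory, although a real deployment also needs space for inputs and intermediate results.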
Abstract
Operating systems include many heuristic algorithms designed to improve overall storage performance and throughput. Because such heuristics cannot work well for all conditions and workloads, system designers have resorted to exposing numerous tunable parameters to users, thus burdening users with continually optimizing their own storage systems and applications. Storage systems are usually responsible for most of the latency in I/O-heavy applications, so even a small latency improvement can be significant. Machine learning (ML) techniques promise to learn patterns, generalize from them, and enable optimal solutions that adapt to changing workloads. We propose that ML solutions become a first-class component in OSs and replace manual heuristics to optimize storage systems dynamically. In this paper, we describe our proposed ML architecture, called KML. We developed a prototype KML architecture and…
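The core idea of replacing a fixed heuristic with a learned policy can be sketched as a small loop: observe a window of I/O requests, summarize it as features, classify the workload with a trained model, and set a tunable parameter accordingly. The sketch below is a user-space Python illustration under stated assumptions, not KML's kernel implementation. The feature definitions, the nearest-centroid classifier (chosen here for brevity), the class names, and the parameter values in `CLASS_TO_VALUE` (read as a readahead size in KB) are all hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical mapping from workload class to a tunable value
# (e.g. a readahead size in KB); a real system would pick these by benchmarking.
CLASS_TO_VALUE = {"sequential": 1024, "random": 16}


def extract_features(offsets_mb):
    """Summarize a window of request offsets (in MB) as a 2-D feature tuple:
    (fraction of small forward jumps, log1p of mean absolute jump size)."""
    deltas = [b - a for a, b in zip(offsets_mb, offsets_mb[1:])]
    if not deltas:
        return (0.0, 0.0)
    seq_frac = sum(1 for d in deltas if 0 <= d <= 0.5) / len(deltas)
    mean_abs_delta = sum(abs(d) for d in deltas) / len(deltas)
    return (seq_frac, math.log1p(mean_abs_delta))  # log keeps scales comparable


def fit_centroids(labeled_windows):
    """Train a nearest-centroid classifier: average the features of each class."""
    sums = defaultdict(lambda: [0.0, 0.0])
    counts = defaultdict(int)
    for offsets_mb, label in labeled_windows:
        f = extract_features(offsets_mb)
        sums[label][0] += f[0]
        sums[label][1] += f[1]
        counts[label] += 1
    return {lab: (s[0] / counts[lab], s[1] / counts[lab]) for lab, s in sums.items()}


def classify(centroids, features):
    """Return the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda c: math.dist(features, centroids[c]))


def tune(centroids, offsets_mb):
    """One step of the feedback loop: observe, classify, choose a parameter value."""
    return CLASS_TO_VALUE[classify(centroids, extract_features(offsets_mb))]


# Hypothetical labeled training windows (request offsets in MB).
training = [
    ([0.0, 0.128, 0.256, 0.384], "sequential"),
    ([5.0, 5.25, 5.5, 5.75], "sequential"),
    ([10.0, 500.0, 3.0, 250.0], "random"),
    ([700.0, 20.0, 410.0, 90.0], "random"),
]
centroids = fit_centroids(training)
print(tune(centroids, [100.0, 100.128, 100.256]))  # 1024
print(tune(centroids, [1.0, 900.0, 45.0]))  # 16
```

Calling `tune` on each new window lets the chosen value follow the workload as it changes, which is the behavior a static heuristic cannot provide. Mapping a predicted class to one of a few pretested values, rather than regressing the parameter directly, is one simple design choice among several; it keeps the model tiny and the outputs within known-safe settings.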
Taxonomy
Topics: Advanced Data Storage Technologies · Parallel Computing and Optimization Techniques · Cloud Computing and Resource Management
