Magpie: Automatically Tuning Static Parameters for Distributed File Systems using Deep Reinforcement Learning
Houkun Zhu, Dominik Scheinert, Lauritz Thamsen, Kordian Gontarska, and Odej Kao

TL;DR
Magpie uses deep reinforcement learning to automatically tune static configuration parameters in distributed file systems, significantly improving performance without requiring expert intervention.
Contribution
This paper introduces Magpie, a novel deep reinforcement learning approach for automatic static parameter tuning in distributed file systems, leveraging system metrics for performance optimization.
Findings
Achieves 91.8% throughput gains on Lustre over the default configuration after tuning.
Achieves 39.7% higher throughput gains than the baseline.
Effectively tunes static parameters, whose changes take effect only after a restart of the system or workloads.
Abstract
Distributed file systems are widely used nowadays, yet using their default configurations is often not optimal. At the same time, tuning configuration parameters is typically challenging and time-consuming. It demands expertise and tuning operations can also be expensive. This is especially the case for static parameters, where changes take effect only after a restart of the system or workloads. We propose a novel approach, Magpie, which utilizes deep reinforcement learning to tune static parameters by strategically exploring and exploiting configuration parameter spaces. To boost the tuning of the static parameters, our method employs both server and client metrics of distributed file systems to understand the relationship between static parameters and performance. Our empirical evaluation results show that Magpie can noticeably improve the performance of the distributed file system…
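The abstract describes the tuning loop only at a high level. The agent changes static parameters, the system or workload is restarted so the change takes effect, and server and client metrics describe the resulting state. The sketch below illustrates that loop under explicit assumptions. It swaps Magpie's deep reinforcement learning agent for plain tabular Q-learning with epsilon-greedy exploration. It uses two hypothetical parameters (`param_a`, `param_b`). It also replaces the real restart-and-benchmark step with a synthetic throughput function. It is not the authors' implementation.

```python
import random

# Hypothetical static parameters and candidate values (assumption, not from the paper).
PARAM_SPACE = {
    "param_a": [1, 2, 4, 8],
    "param_b": [16, 32, 64, 128],
}
# Each action moves one parameter one step up or down its candidate list.
ACTIONS = [("param_a", +1), ("param_a", -1), ("param_b", +1), ("param_b", -1)]


def restart_and_benchmark(config):
    """Synthetic stand-in for: apply the static config, restart the file system
    or workload, run a benchmark, and collect server and client metrics."""
    a = PARAM_SPACE["param_a"].index(config["param_a"])
    b = PARAM_SPACE["param_b"].index(config["param_b"])
    throughput = 100.0 + 20.0 * a + 10.0 * b - 5.0 * abs(a - b)  # made-up response surface
    server_util_pct = 20 + 20 * a  # hypothetical server-side metric
    client_wait_ms = 10 - 2 * b    # hypothetical client-side metric
    # The RL state is built from both server and client metrics.
    return throughput, (server_util_pct, client_wait_ms)


def apply_action(config, action):
    """Return a new config with one parameter shifted, clamped to its value list."""
    name, delta = action
    values = PARAM_SPACE[name]
    idx = min(max(values.index(config[name]) + delta, 0), len(values) - 1)
    new_config = dict(config)
    new_config[name] = values[idx]
    return new_config


def tune(episodes=200, steps=10, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning over metric-derived states (a simplification of deep RL)."""
    rng = random.Random(seed)
    q = {}  # (state, action_index) -> estimated value
    default = {name: values[0] for name, values in PARAM_SPACE.items()}
    base_tp, _ = restart_and_benchmark(default)
    best_cfg, best_tp = dict(default), base_tp

    for _ in range(episodes):
        cfg = dict(default)
        tp, state = restart_and_benchmark(cfg)
        for _ in range(steps):
            # Explore with probability epsilon, otherwise exploit current estimates.
            if rng.random() < epsilon:
                action = rng.randrange(len(ACTIONS))
            else:
                action = max(range(len(ACTIONS)), key=lambda i: q.get((state, i), 0.0))
            new_cfg = apply_action(cfg, ACTIONS[action])
            new_tp, new_state = restart_and_benchmark(new_cfg)  # one costly restart per step
            # Assumed reward: throughput change normalized by default throughput.
            reward = (new_tp - tp) / base_tp
            best_next = max(q.get((new_state, i), 0.0) for i in range(len(ACTIONS)))
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
            cfg, tp, state = new_cfg, new_tp, new_state
            if tp > best_tp:
                best_cfg, best_tp = dict(cfg), tp

    gain = (best_tp - base_tp) / base_tp
    return best_cfg, best_tp, gain
```

Calling `tune()` returns the best configuration seen, its synthetic throughput, and the relative gain over the default configuration. On a real cluster every call to the benchmark step would cost a full restart and workload run, which is why tuning static parameters is expensive and why a sample-efficient search matters.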
Taxonomy
Topics: Advanced Data Storage Technologies · Cloud Computing and Resource Management · Distributed and Parallel Computing Systems
