AdapTBF: Decentralized Bandwidth Control via Adaptive Token Borrowing for HPC Storage
Md Hasanur Rashid, Dong Dai

TL;DR
AdapTBF is a decentralized bandwidth control mechanism for HPC storage that adaptively manages I/O bandwidth through token borrowing, improving efficiency and fairness over traditional proportional limits.
Contribution
The paper introduces an adaptive token borrowing algorithm for decentralized bandwidth control in parallel file systems such as Lustre.
Findings
AdapTBF improves storage utilization under bursty workloads.
It maintains fairness among applications with diverse I/O patterns.
AdapTBF outperforms strict proportional limits in efficiency and fairness.
Abstract
Modern high-performance computing (HPC) applications run on compute resources but share global storage systems. This design can cause problems when applications consume a disproportionate amount of storage bandwidth relative to their allocated compute resources. For example, an application running on a single compute node can issue many small, random writes and consume excessive I/O bandwidth from a storage server. This can hinder larger jobs that write to the same storage server and are allocated many compute nodes, resulting in significant resource waste. A straightforward solution is to limit each application's I/O bandwidth on storage servers in proportion to its allocated compute resources. This approach has been implemented in parallel file systems using Token Bucket Filter (TBF). However, strict proportional limits often reduce overall I/O efficiency because HPC applications…
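The strict proportional baseline described in the abstract can be illustrated with a small sketch. This is a generic token bucket, not Lustre's TBF implementation and not the AdapTBF borrowing algorithm; the names `TokenBucket`, `proportional_rates`, the job labels, and the 1000 requests/s server rate are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Generic token bucket: one token is assumed to pay for one I/O request."""
    rate: float         # tokens added per second
    capacity: float     # maximum tokens held (bounds the burst size)
    tokens: float = 0.0
    last: float = 0.0   # time of the last refill, in seconds

    def try_consume(self, now: float, cost: float = 1.0) -> bool:
        # Refill for the elapsed time, never exceeding capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        # Admit the request only if enough tokens are available.
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


def proportional_rates(server_rate: float, nodes_per_job: dict[str, int]) -> dict[str, float]:
    """Split a server's request rate across jobs by their compute-node counts."""
    total = sum(nodes_per_job.values())
    return {job: server_rate * n / total for job, n in nodes_per_job.items()}


# Hypothetical example: a 1-node job and a 9-node job share one storage server.
rates = proportional_rates(1000.0, {"small_job": 1, "large_job": 9})
buckets = {job: TokenBucket(rate=r, capacity=r) for job, r in rates.items()}
```

Under this scheme each job's limit is fixed by its node count, so a bucket's rate never changes even when another job is idle. That rigidity is the property of strict proportional limits that the abstract contrasts with adaptive borrowing.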
Taxonomy
Topics: Advanced Data Storage Technologies · Distributed and Parallel Computing Systems · Cloud Computing and Resource Management
