BandPilot: Towards Performance- and Contention-Aware GPU Dispatching in AI Clusters
Kunming Zhang, Hanlong Liao, Junyu Xue, Deke Guo, Guoming Tang

TL;DR
BandPilot is a novel GPU dispatching method for AI clusters that learns bandwidth models and predicts contention, significantly improving communication efficiency over traditional topology-based heuristics.
Contribution
It introduces a data-efficient bandwidth modeling and contention-aware dispatching approach that outperforms existing static heuristics in multi-tenant AI clusters.
Findings
Achieves 92-97% bandwidth efficiency in experiments.
Improves average efficiency by 20-40% over topology-compactness heuristics.
Effective in heterogeneous and simulated environments.
Abstract
Modern multi-tenant AI clusters are increasingly communication-bound, driven by high-volume and multi-round GPU-to-GPU collective communication. Consequently, the GPU dispatcher's choice of a physical GPU subset for each tenant largely determines the job's effective collective bandwidth and thus its performance ceiling. Existing dispatchers predominantly rely on static, topology-aware heuristics that prioritize GPU resource compactness, assuming that minimizing physical distance maximizes communication bandwidth. However, we reveal that this assumption often fails due to complex system-level bottlenecks, such as non-linear NIC saturation and inter-node link heterogeneity. This paper presents BandPilot, a performance- and contention-aware GPU dispatching primitive that optimizes effective collective bandwidth for multi-tenant AI clusters. Specifically, BandPilot learns a data-efficient…
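To make the abstract's central claim concrete, the following toy sketch contrasts a compactness-first choice with a bandwidth-first choice on a small, invented cluster. It is not BandPilot's learned model or its search procedure: the cluster layout, every bandwidth figure, the `toy_bandwidth` bottleneck rule, and the brute-force enumeration are assumptions made purely for illustration.

```python
from itertools import combinations

# Hypothetical toy cluster; every number here is invented for illustration.
FREE_GPUS = {"n0": [0, 1, 2, 3], "n1": [0, 1], "n2": [0, 1]}
INTRA_NODE_GBPS = {"n0": 20.0, "n1": 300.0, "n2": 300.0}  # n0: slow intra-node links
NIC_GBPS = {"n0": 25.0, "n1": 50.0, "n2": 50.0}


def toy_bandwidth(subset):
    """Stand-in bandwidth model: the slowest link the subset has to use."""
    nodes = {node for node, _ in subset}
    limits = []
    for node in nodes:
        gpus_here = sum(1 for n, _ in subset if n == node)
        if gpus_here > 1:       # GPUs on the same node talk over intra-node links
            limits.append(INTRA_NODE_GBPS[node])
        if len(nodes) > 1:      # traffic leaving the node goes through its NIC
            limits.append(NIC_GBPS[node])
    return min(limits) if limits else float("inf")


def nodes_spanned(subset):
    return len({node for node, _ in subset})


def candidates(k):
    """All k-GPU subsets of the free GPUs, as tuples of (node, gpu_index)."""
    gpus = [(node, g) for node in sorted(FREE_GPUS) for g in FREE_GPUS[node]]
    return list(combinations(gpus, k))


def compact_pick(k):
    """Topology-compactness heuristic: span as few nodes as possible."""
    return min(candidates(k), key=nodes_spanned)


def bandwidth_pick(k):
    """Bandwidth-first choice: maximize the (toy) predicted bandwidth."""
    return max(candidates(k), key=toy_bandwidth)


compact = compact_pick(4)
best = bandwidth_pick(4)
print("compact:", compact, toy_bandwidth(compact), "GB/s")
print("bandwidth-aware:", best, toy_bandwidth(best), "GB/s")
```

In this invented setup the compact choice packs all four GPUs onto `n0` and is limited by that node's slow intra-node links, while the bandwidth-first choice spreads across `n1` and `n2` and is limited only by their NICs. Exhaustive enumeration is used here only because the example is tiny; it would not scale to a real cluster.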
