BoPF: Mitigating the Burstiness-Fairness Tradeoff in Multi-Resource Clusters
Tan N. Le, Xiao Sun, Mosharaf Chowdhury, Zhenhua Liu

TL;DR
BoPF is a novel scheduler that balances short-term latency guarantees with long-term fairness in multi-resource clusters, achieving high resource utilization and improved performance for diverse workloads.
Contribution
Introduces Bounded Priority Fairness (BoPF), the first scheduler to simultaneously ensure long-term fairness, burst guarantees, and Pareto efficiency in multi-resource scheduling.
Findings
BoPF speeds up latency-sensitive jobs by 5.38x compared to DRF.
BoPF improves throughput-sensitive job completion times by up to 3.05x.
BoPF closely matches the performance of Strict Priority and the fairness of DRF.
Abstract
Simultaneously supporting latency- and throughput-sensitive workloads in a shared environment is an increasingly common challenge in big data clusters. Despite many advances, existing cluster schedulers force the same performance goal - fairness in most cases - on all jobs. Latency-sensitive jobs suffer, while throughput-sensitive ones thrive. Using prioritization does the opposite: it opens up a path for latency-sensitive jobs to dominate. In this paper, we tackle the challenges in supporting both short-term performance and long-term fairness simultaneously with high resource utilization by proposing Bounded Priority Fairness (BoPF). BoPF provides short-term resource guarantees to latency-sensitive jobs and maintains long-term fairness for throughput-sensitive jobs. BoPF is the first scheduler that can provide long-term fairness, burst guarantee, and Pareto efficiency in a…
Taxonomy
Topics: Cloud Computing and Resource Management · IoT and Edge/Fog Computing · Distributed and Parallel Computing Systems
