Power- and Fragmentation-aware Online Scheduling for GPU Datacenters
Francesco Lettich, Emanuele Carlini, Franco Maria Nardini, Raffaele Perego, Salvatore Trani

TL;DR
This paper introduces PWR, a new online scheduling policy for GPU datacenters that balances power efficiency and GPU fragmentation reduction, validated through extensive simulation experiments.
Contribution
The paper proposes PWR, a novel online scheduling policy that targets lower power consumption and works alongside Fragmentation Gradient Descent (FGD) to reduce GPU fragmentation in datacenters.
Findings
PWR effectively reduces power usage in GPU datacenters.
Combining PWR with FGD balances power efficiency and resource fragmentation.
Simulation experiments indicate improved operational efficiency.
Abstract
The rise of Artificial Intelligence and Large Language Models is driving increased GPU usage in data centers for complex training and inference tasks, impacting operational costs, energy demands, and the environmental footprint of large-scale computing infrastructures. This work addresses the online scheduling problem in GPU datacenters, which involves scheduling tasks without knowledge of their future arrivals. We focus on two objectives: minimizing GPU fragmentation and reducing power consumption. GPU fragmentation occurs when partial GPU allocations hinder the efficient use of remaining resources, especially as the datacenter nears full capacity. A recent scheduling policy, Fragmentation Gradient Descent (FGD), leverages a fragmentation metric to address this issue. Reducing power consumption is also crucial due to the significant power demands of GPUs. To this end, we propose PWR, a…
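To make the fragmentation problem described in the abstract concrete, the following is a minimal illustrative sketch, not the paper's fragmentation metric or FGD itself. It assumes a simplified model in which each task must fit on a single GPU and each GPU's free capacity is a fraction between 0 and 1; the function names `fits_on_some_gpu` and `stranded_capacity` are hypothetical.

```python
from typing import List

EPS = 1e-9  # tolerance for floating-point comparisons


def fits_on_some_gpu(free_per_gpu: List[float], demand: float) -> bool:
    """Return True if at least one GPU has enough free capacity for the task.

    Assumption (not from the paper): a task cannot be split across GPUs.
    """
    return any(free >= demand - EPS for free in free_per_gpu)


def stranded_capacity(free_per_gpu: List[float], demand: float) -> float:
    """Sum the free capacity on GPUs that are too small to host `demand`.

    This is a toy proxy for fragmentation, not the metric used by FGD.
    """
    return sum(free for free in free_per_gpu if free < demand - EPS)


# Free fraction left on each of four GPUs after earlier partial allocations.
node = [0.5, 0.5, 0.25, 0.25]

total_free = sum(node)                       # 1.5 GPUs' worth of capacity
whole_gpu_fits = fits_on_some_gpu(node, 1.0)  # no single GPU is fully free
half_gpu_fits = fits_on_some_gpu(node, 0.5)   # a half-GPU task still fits

print(total_free, whole_gpu_fits, half_gpu_fits)
print(stranded_capacity(node, 1.0), stranded_capacity(node, 0.5))
```

In this toy node, 1.5 GPUs' worth of capacity is free in total, yet a task that needs one whole GPU cannot be placed. This is the situation the abstract describes: partial allocations leave capacity that later tasks cannot use.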
Taxonomy
Topics: Cloud Computing and Resource Management · Distributed and Parallel Computing Systems · Parallel Computing and Optimization Techniques
Methods: Focus · Fragmentation
