Coordinated Cooling and Compute Management for AI Datacenters
Nardos Belay Abera, Yize Chen

TL;DR
This paper introduces a joint cooling and compute management framework for AI datacenters that co-optimizes GPU performance and cooling under thermal constraints to improve energy efficiency during large-scale AI inference.
Contribution
It develops a novel hierarchical control framework that co-optimizes GPU compute parameters and cooling strategies based on workload and thermal models.
Findings
Improves the energy efficiency of AI datacenters during inference.
Balances latency and thermal constraints effectively.
Demonstrates benefits using real Azure inference traces.
Abstract
AI datacenters are being deployed at large scale to support the training and deployment of power-intensive large language models (LLMs). The extensive computation and cooling these datacenters require raise concerns about their energy use and carbon emissions. Although recent work has examined the energy efficiency of LLM inference, most prior research has focused on optimizing compute-side scheduling without considering thermal objectives or constraints. Because GPU-intensive inference generates substantial heat that can degrade datacenter performance, ignoring thermal effects can increase total energy consumption and reduce the efficiency of LLM serving. To fill this gap, we profile the characteristics of GPU servers under varying cooling conditions and AI jobs, and develop a joint cooling and computing modeling approach for AI datacenters. Built…
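To make the trade-off described in the abstract concrete, here is a minimal toy sketch, not the authors' model. It assumes a hypothetical single GPU whose static (leakage) power rises with die temperature, a chiller whose coefficient of performance (COP) improves with warmer coolant supply, and a fixed job whose latency scales inversely with clock frequency. Every constant and helper name (`gpu_power`, `cooling_power`, `evaluate`, `best`) is made up for illustration. The sketch compares a compute-only baseline, which tunes frequency with cooling fixed at the coldest setpoint, against a joint grid search over frequency and coolant setpoint.

```python
"""Toy joint cooling/compute selection by exhaustive grid search.

All models and constants below are hypothetical placeholders,
not the paper's profiled models.
"""
from itertools import product

# Hypothetical parameters (illustrative only)
WORK = 1.0            # normalized job size (GHz * s of work)
P_IDLE = 100.0        # W, nominal static power at 25 °C
LEAK_COEFF = 0.01     # fractional static-power increase per °C above 25 °C
K_DYN = 150.0         # W per GHz^3, dynamic-power coefficient
R_TH = 0.08           # °C per W, thermal resistance from coolant to die
T_MAX = 85.0          # °C, die temperature limit
LATENCY_MAX = 1.0     # s, latency bound for the job

FREQS = [0.9, 1.2, 1.5, 1.8]        # GHz, candidate GPU clocks
SUPPLY_TEMPS = [18.0, 24.0, 30.0]   # °C, candidate coolant supply setpoints


def gpu_power(freq, t_supply):
    """Return (GPU power in W, die temperature in °C).

    Simplification: die temperature uses dynamic power plus the nominal
    static power, so there is no leakage/temperature feedback loop.
    """
    p_dyn = K_DYN * freq ** 3
    t_die = t_supply + R_TH * (p_dyn + P_IDLE)
    p_static = P_IDLE * (1 + LEAK_COEFF * (t_die - 25.0))
    return p_dyn + p_static, t_die


def cooling_power(heat_w, t_supply):
    """Chiller power: heat load divided by a COP that rises with supply temp."""
    cop = 2.0 + 0.15 * (t_supply - 15.0)
    return heat_w / cop


def evaluate(freq, t_supply):
    """Return total job energy in J, or None if a constraint is violated."""
    latency = WORK / freq
    p_gpu, t_die = gpu_power(freq, t_supply)
    if latency > LATENCY_MAX or t_die > T_MAX:
        return None
    return (p_gpu + cooling_power(p_gpu, t_supply)) * latency


def best(freqs, temps):
    """Search the grid; return (energy, freq, t_supply) of the best feasible point."""
    feasible = []
    for f, t in product(freqs, temps):
        energy = evaluate(f, t)
        if energy is not None:
            feasible.append((energy, f, t))
    return min(feasible) if feasible else None


# Compute-only baseline: tune frequency with cooling fixed at the coldest setpoint.
compute_only = best(FREQS, [min(SUPPLY_TEMPS)])
# Joint: tune frequency and cooling setpoint together.
joint = best(FREQS, SUPPLY_TEMPS)
print("compute-only:", compute_only)
print("joint:       ", joint)
```

With these made-up constants, 0.9 GHz violates the latency bound and 1.8 GHz violates the thermal limit at every setpoint; the joint search picks 1.2 GHz at a 30 °C supply temperature and uses about 9.5% less energy than the cold-coolant baseline, because the chiller's efficiency gain outweighs the extra leakage. The grid search only stands in for the paper's hierarchical controller to show why cooling setpoints belong in the optimization; the percentage says nothing about the paper's actual results.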
Taxonomy
Topics: Cloud Computing and Resource Management · Parallel Computing and Optimization Techniques · IoT and Edge/Fog Computing
