DCoPilot: Generative AI-Empowered Policy Adaptation for Dynamic Data Center Operations
Minghao Li, Ruihang Wang, Rui Tan, Yonggang Wen

TL;DR
DCoPilot introduces a hybrid generative framework combining large language models and hypernetworks to enable real-time, adaptive control policies for energy-efficient and safe data center operations amid rapid workload changes.
Contribution
The paper presents a novel hybrid approach that integrates LLMs and hypernetworks for dynamic policy generation, addressing the limitations of manual DRL design in data centers.
Findings
Achieves near-zero constraint violations in diverse control tasks.
Outperforms baseline methods across various specification changes.
Validates the effectiveness of LLM-based reward generation for stable policy adaptation.
Abstract
Modern data centers (DCs) hosting artificial intelligence (AI)-dedicated devices operate at high power densities with rapidly varying workloads, making minute-level adaptation essential for safe and energy-efficient operation. However, manually designing piecewise deep reinforcement learning (DRL) agents cannot keep pace with frequent dynamics shifts and service-level agreement (SLA) changes of an evolving DC. This specification-to-policy lag causes a lack of timely, effective control policies, which may lead to service outages. To bridge the gap, we present DCoPilot, a hybrid framework for generative control policies in dynamic DC operation. DCoPilot synergizes two distinct generative paradigms, i.e., a large language model (LLM) that performs symbolic generation of structured reward forms, and a hypernetwork that conducts parametric generation of policy weights. DCoPilot operates…
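The parametric half of the design, a hypernetwork that generates policy weights, can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `HyperNetwork`, the layer sizes, the two-element specification vector, and the single-layer policy are all assumptions chosen for illustration, and the hypernetwork here is untrained. The sketch shows only the core idea that a specification vector goes in and a full set of policy weights comes out, so a new specification yields a new policy without retraining the policy itself.

```python
import numpy as np


class HyperNetwork:
    """Hypothetical hypernetwork: maps a specification vector to policy weights."""

    def __init__(self, spec_dim, obs_dim, act_dim, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.obs_dim = obs_dim
        self.act_dim = act_dim
        # The target policy is one linear layer: obs_dim*act_dim weights plus act_dim biases.
        n_policy_params = obs_dim * act_dim + act_dim
        # Randomly initialised hypernetwork parameters (assumed sizes, no training here).
        self.W1 = rng.normal(0.0, 0.1, (spec_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.1, (hidden, n_policy_params))
        self.b2 = np.zeros(n_policy_params)

    def generate(self, spec):
        """Return (W, b) for the policy conditioned on the specification vector."""
        h = np.tanh(spec @ self.W1 + self.b1)
        flat = h @ self.W2 + self.b2
        split = self.obs_dim * self.act_dim
        W = flat[:split].reshape(self.obs_dim, self.act_dim)
        b = flat[split:]
        return W, b


def policy_action(obs, W, b):
    """Apply the generated linear policy; tanh bounds each action to [-1, 1]."""
    return np.tanh(obs @ W + b)


# Usage: two hypothetical, normalised specifications (e.g. a temperature limit and a load level).
hypernet = HyperNetwork(spec_dim=2, obs_dim=3, act_dim=1)
W_a, b_a = hypernet.generate(np.array([0.5, 0.8]))
W_b, b_b = hypernet.generate(np.array([0.9, 0.2]))
obs = np.array([0.1, 0.2, 0.3])
action_a = policy_action(obs, W_a, b_a)
action_b = policy_action(obs, W_b, b_b)
```

In a trained system the hypernetwork parameters would be learned so that the generated policies perform well under each specification; here they are random, so only the shapes and the data flow are meaningful.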
Taxonomy
Topics: Cloud Computing and Resource Management · Software System Performance and Reliability · Software-Defined Networks and 5G
