LC-Opt: Benchmarking Reinforcement Learning and Agentic AI for End-to-End Liquid Cooling Optimization in Data Centers
Avisek Naug, Antonio Guillen, Vineet Kumar, Scott Greenwood, Wesley Brewer, Sahand Ghorbanpour, Ashwin Ramesh Babu, Vineet Gundecha, Ricardo Luna Gutierrez, Soumyendu Sarkar

TL;DR
LC-Opt is a benchmark environment for reinforcement learning-based control of liquid cooling in data centers. It targets energy-efficient thermal management using high-fidelity modeling and multi-agent control strategies.
Contribution
LC-Opt provides a detailed digital twin platform for RL control of liquid cooling, with support for multi-objective optimization, interpretable policies, and LLM-based explanation methods.
Findings
Benchmarking of centralized and decentralized RL approaches.
Policy distillation into interpretable decision trees.
Exploration of LLM-based natural language explanations.
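As a rough illustration of what distilling a policy into a decision tree involves, the sketch below samples states, labels them with a "teacher" policy, and fits a small tree that imitates it. Everything specific here is an assumption made for illustration and is not taken from LC-Opt: the teacher is a hand-written stand-in for a trained RL agent, the two state features (cabinet temperature and load), the thresholds, and the three-level valve action are hypothetical, and the tiny CART-style learner is a generic technique rather than the authors' procedure.

```python
import random
from collections import Counter

def teacher_policy(temp_c, load):
    """Hypothetical stand-in for a trained RL policy (not from LC-Opt)."""
    if temp_c > 45.0:
        return 2  # open valve fully
    if load > 0.6:
        return 1  # open valve partially
    return 0      # keep valve mostly closed

def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Return (weighted_gini, feature, threshold) of the best split, or None."""
    best = None
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[f] <= thr]
            right = [lab for row, lab in zip(X, y) if row[f] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, thr)
    return best

def majority(y):
    return Counter(y).most_common(1)[0][0]

def fit_tree(X, y, depth=0, max_depth=3):
    """Greedy tree: a leaf is a label, a node is (feature, thr, left, right)."""
    if depth == max_depth or len(set(y)) == 1:
        return majority(y)
    split = best_split(X, y)
    if split is None:
        return majority(y)
    _, f, thr = split
    left = [(r, lab) for r, lab in zip(X, y) if r[f] <= thr]
    right = [(r, lab) for r, lab in zip(X, y) if r[f] > thr]
    return (f, thr,
            fit_tree([r for r, _ in left], [lab for _, lab in left], depth + 1, max_depth),
            fit_tree([r for r, _ in right], [lab for _, lab in right], depth + 1, max_depth))

def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's action."""
    while isinstance(tree, tuple):
        f, thr, left, right = tree
        tree = left if row[f] <= thr else right
    return tree

rng = random.Random(0)  # fixed seed so the sketch is reproducible

def sample_states(n):
    # Hypothetical state ranges: temperature in deg C, load as a fraction.
    return [(rng.uniform(20.0, 60.0), rng.uniform(0.0, 1.0)) for _ in range(n)]

# Distillation: label sampled states with the teacher, then fit the tree.
train_X = sample_states(300)
train_y = [teacher_policy(t, l) for t, l in train_X]
tree = fit_tree(train_X, train_y)

# Fidelity: how often the tree agrees with the teacher on unseen states.
test_X = sample_states(200)
agree = sum(predict(tree, s) == teacher_policy(*s) for s in test_X)
fidelity = agree / len(test_X)
print(f"fidelity to teacher on held-out states: {fidelity:.2f}")
```

The fitted tree is a nested tuple of (feature, threshold) tests, so each action can be traced to a short chain of comparisons, which is the sense in which a distilled tree is easier to inspect than a neural policy. In practice, a library learner such as scikit-learn's DecisionTreeClassifier would likely replace the hand-written one.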
Abstract
Liquid cooling is critical for thermal management in high-density data centers as AI workloads rise. Machine learning-based controllers are needed to unlock greater energy efficiency and reliability, which in turn promotes sustainability. We present LC-Opt, a Sustainable Liquid Cooling (LC) benchmark environment for reinforcement learning (RL) control strategies in energy-efficient liquid cooling of high-performance computing (HPC) systems. Built on a high-fidelity digital twin of the cooling system of Oak Ridge National Lab's Frontier Supercomputer, LC-Opt provides detailed Modelica-based end-to-end models spanning site-level cooling towers to data center cabinets and server blade groups. RL agents optimize critical thermal controls such as liquid supply temperature, flow rate, and granular valve actuation at the IT cabinet level, as well as cooling tower (CT) setpoints…
Taxonomy
Topics: Cloud Computing and Resource Management · Heat Transfer and Optimization · Modeling and Simulation Systems
