A Digital Twin Framework for Liquid-cooled Supercomputers as Demonstrated at Exascale
Wesley Brewer, Matthias Maiterth, Vineet Kumar, Rafal Wojda, Sedrick Bouknight, Jesse Hines, Woong Shin, Scott Greenwood, David Grant, Wesley Williams, and Feiyi Wang

TL;DR
This paper introduces ExaDigiT, an open-source digital twin framework for liquid-cooled supercomputers, enabling system simulation, optimization, and virtual prototyping, demonstrated through a case study of the Frontier supercomputer.
Contribution
It presents the first comprehensive digital twin framework for liquid-cooled exascale supercomputers, integrating multiple modules for detailed system analysis and validation.
Findings
Validated with six months of telemetry data from Frontier
Revealed complex transient cooling dynamics and energy loss mechanisms
Enabled virtual testing of system scenarios and optimizations
Abstract
We present ExaDigiT, an open-source framework for developing comprehensive digital twins of liquid-cooled supercomputers. It integrates three main modules: (1) a resource allocator and power simulator, (2) a transient thermo-fluidic cooling model, and (3) an augmented reality model of the supercomputer and central energy plant. The framework enables the study of "what-if" scenarios, system optimizations, and virtual prototyping of future systems. Using Frontier as a case study, we demonstrate the framework's capabilities by replaying six months of system telemetry for systematic verification and validation. Such a comprehensive analysis of a liquid-cooled exascale supercomputer is the first of its kind. ExaDigiT elucidates complex transient cooling system dynamics, runs synthetic or real workloads, and predicts energy losses due to rectification and voltage conversion. Throughout our…
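As a rough illustration of what predicting energy losses due to rectification and voltage conversion can involve, the sketch below chains two efficiency stages between the facility input and the compute load. It is not ExaDigiT's model: the function name, the two-stage structure, and the default efficiency values are all assumptions chosen for illustration, and real conversion efficiencies generally vary with load.

```python
import math


def conversion_losses(load_power_w, rectifier_eff=0.96, converter_eff=0.98):
    """Estimate input power and per-stage losses for a given load power.

    Hypothetical two-stage chain: AC input -> rectifier -> DC bus ->
    voltage converter -> load. The default efficiencies are placeholder
    assumptions, not values taken from the paper.
    """
    if load_power_w < 0:
        raise ValueError("load power must be non-negative")
    for eff in (rectifier_eff, converter_eff):
        if not 0.0 < eff <= 1.0:
            raise ValueError("efficiencies must be in (0, 1]")

    # Work backwards from the load: each stage must be fed more power
    # than it delivers, by a factor of 1 / efficiency.
    dc_bus_power_w = load_power_w / converter_eff
    input_power_w = dc_bus_power_w / rectifier_eff

    return {
        "input_power_w": input_power_w,
        "rectification_loss_w": input_power_w - dc_bus_power_w,
        "conversion_loss_w": dc_bus_power_w - load_power_w,
    }


if __name__ == "__main__":
    result = conversion_losses(1_000_000.0)  # a 1 MW load, purely illustrative
    for name, value in result.items():
        print(f"{name}: {value:,.0f}")
```

By construction, the two losses always sum to the difference between input power and load power, which makes the sketch easy to sanity-check against measured telemetry.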
