Spatiotemporal Modeling of Node Temperatures in Supercomputers
Curtis B Storlie, Brian J Reich, William N Rust, Lawrence O Ticknor, Amanda M Bonnie, Andrew J Montoya, Sarah E Michalak

TL;DR
This paper develops a statistical spatiotemporal model using Gaussian processes and Gaussian Markov random fields (GMRFs) to analyze and optimize node temperature management in supercomputing clusters, with the aim of reducing cooling costs and preventing overheating.
Contribution
It introduces a novel combination of a distributional model chosen to capture extreme node temperatures with Gaussian Markov random fields, enabling detailed spatiotemporal temperature analysis in supercomputers.
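To make the GMRF idea concrete, here is a minimal sketch. It is not the authors' model: it assumes, purely for illustration, that nodes sit on a regular rectangular grid (a hypothetical stand-in for rack positions), uses a proper conditional autoregressive (CAR) precision matrix Q = tau * (D - rho * W) as the spatial GMRF, and adds a hypothetical AR(1) dependence in time to make the field spatiotemporal. The grid size, tau, rho, phi, and the 25 °C baseline are arbitrary placeholders.

```python
import numpy as np


def grid_adjacency(n_rows: int, n_cols: int) -> np.ndarray:
    """Binary rook-neighbour adjacency matrix W for a hypothetical n_rows x n_cols node grid."""
    n = n_rows * n_cols
    W = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            if r + 1 < n_rows:  # neighbour in the next row
                j = (r + 1) * n_cols + c
                W[i, j] = W[j, i] = 1.0
            if c + 1 < n_cols:  # neighbour in the next column
                j = r * n_cols + c + 1
                W[i, j] = W[j, i] = 1.0
    return W


def car_precision(W: np.ndarray, tau: float, rho: float) -> np.ndarray:
    """Proper CAR precision Q = tau * (D - rho * W).

    Q is positive definite when |rho| < 1 and every node has at least one
    neighbour, so it defines a valid (sparse) GMRF over the nodes.
    """
    D = np.diag(W.sum(axis=1))
    return tau * (D - rho * W)


def simulate_spatiotemporal(Q: np.ndarray, phi: float, n_times: int,
                            rng: np.random.Generator) -> np.ndarray:
    """Hypothetical separable field: x_t = phi * x_{t-1} + e_t with e_t ~ N(0, Q^{-1}).

    Each innovation is drawn from the GMRF using the precision directly:
    with Q = L L^T, solving L^T e = z for standard normal z gives Cov(e) = Q^{-1}.
    """
    if not abs(phi) < 1.0:
        raise ValueError("phi must satisfy |phi| < 1 for a stationary AR(1)")
    L = np.linalg.cholesky(Q)  # lower-triangular L with Q = L @ L.T
    n = Q.shape[0]

    def innovation() -> np.ndarray:
        return np.linalg.solve(L.T, rng.standard_normal(n))

    x = np.empty((n_times, n))
    # Start from the stationary distribution, whose covariance is Q^{-1} / (1 - phi^2).
    x[0] = innovation() / np.sqrt(1.0 - phi**2)
    for t in range(1, n_times):
        x[t] = phi * x[t - 1] + innovation()
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = grid_adjacency(8, 10)                   # 80 hypothetical node positions
    Q = car_precision(W, tau=2.0, rho=0.95)     # placeholder spatial parameters
    temps = 25.0 + simulate_spatiotemporal(Q, phi=0.8, n_times=50, rng=rng)
    print(temps.shape)                          # one row per time step, one column per node
```

Sampling through the precision matrix means the covariance Q^{-1} is never formed explicitly. The dense Cholesky factorization is used only for brevity; a model over all 1600 nodes of the largest cluster would more naturally exploit the sparsity of Q.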
Findings
Identified causes of overheating episodes.
Provided a framework for predicting temperature trends.
Enabled assessment of cooling efficiency improvements.
Abstract
Los Alamos National Laboratory (LANL) is home to many large supercomputing clusters. These clusters require an enormous amount of power (~500-2000 kW each), and most of this energy is converted into heat. Cooling the components of these supercomputers is therefore a critical and expensive endeavor. Recently, a project was initiated to optimize the cooling system for one of the rooms housing three of these large clusters and to develop a general good-practice procedure for reducing cooling costs and monitoring other machine rooms. This work focuses on the statistical approach used to quantify the effect that several cooling changes to the room had on the temperatures of the individual nodes of the computers. The largest cluster in the room has 1600 nodes that run a variety of jobs during general use. Since extreme temperatures are important, a Normal distribution plus generalized…
Taxonomy
Topics: Advanced Multi-Objective Optimization Algorithms · Gaussian Processes and Bayesian Inference
