Operational Data Analytics in Practice: Experiences from Design to Deployment in Production HPC Environments
Alessio Netti, Michael Ott, Carla Guillen, Daniele Tafani, Martin Schulz

TL;DR
This paper shares practical experiences of implementing Operational Data Analytics (ODA) in production HPC environments, focusing on control and visualization, and highlights lessons learned to advance the community's understanding.
Contribution
It provides a comprehensive account of deploying open-source ODA solutions in real HPC systems, bridging research and practical application.
Findings
Successful control of cooling infrastructures using ODA
Effective visualization of job data in production environments
Open-source tools enable generic and adaptable ODA frameworks
Abstract
As HPC systems grow in complexity, efficient and manageable operation is increasingly critical. Many centers are thus starting to explore the use of Operational Data Analytics (ODA) techniques, which extract knowledge from massive amounts of monitoring data and use it for control and visualization purposes. As ODA is a multi-faceted problem, much effort has gone into researching its separate aspects; however, accounts of production ODA experiences are still hard to come by. In this work we aim to bridge the gap between ODA research and production use by presenting our experiences with ODA in production, involving in particular the control of cooling infrastructures and visualization of job data on two HPC systems. We cover the entire development process, from design to deployment, highlighting our insights in an effort to drive the community forward. We rely on open-source tools,…
Taxonomy
Topics: Advanced Database Systems and Queries · Cloud Computing and Resource Management · Graph Theory and Algorithms
