TL;DR
This paper introduces an incremental multiresolution dynamic mode decomposition method for analyzing large-scale HPC system data, enabling real-time insights into system behavior across multiple fidelity levels.
Contribution
It presents a novel incremental implementation of mrDMD (I-mrDMD) that efficiently processes massive HPC monitoring data and visualizes system patterns in real time.
Findings
Effective analysis of terabyte-scale HPC logs
Real-time visualization of system behavior
Application demonstrated on a Cray XC40 supercomputer
Abstract
With the growing complexity in architecture and the size of large-scale computing systems, monitoring and analyzing system behavior and events has become daunting. Monitoring data amounting to terabytes per day are collected by sensors housed in these massive systems at multiple fidelity levels and varying temporal resolutions. In this work, we develop an incremental version of multiresolution dynamic mode decomposition (mrDMD), which converts high-dimensional data to spatial-temporal patterns at varied frequency ranges. Our incremental implementation of the mrDMD algorithm (I-mrDMD) promptly reveals valuable information in the massive environment log dataset, which is then visually aligned with the processed hardware and job log datasets through our generalizable rack visualization using D3 visualization integrated into the Jupyter Notebook interface. We demonstrate the efficacy of our…
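To make concrete the idea of converting high-dimensional data into spatial-temporal patterns at different frequency ranges, the following is a minimal sketch of a single level of standard (exact) DMD in NumPy. It is not the authors' I-mrDMD implementation. The function name `dmd`, the synthetic sensor matrix, and the 2 Hz slow/fast cutoff are illustrative assumptions. Multiresolution DMD applies a step like this recursively: it keeps the slow modes, subtracts them from the data, splits the time window in half, and repeats on each half.

```python
import numpy as np

def dmd(X, rank, dt):
    """Exact DMD of a snapshot matrix X (sensors x time steps).

    Returns spatial modes, discrete-time eigenvalues, and the
    oscillation frequency (Hz) associated with each mode.
    """
    X1, X2 = X[:, :-1], X[:, 1:]              # time-shifted snapshot pairs
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank, :]
    B = (X2 @ Vh.conj().T) / s                # X2 V S^{-1}
    A_tilde = U.conj().T @ B                  # reduced linear operator
    eigvals, W = np.linalg.eig(A_tilde)
    modes = B @ W                             # spatial patterns
    freqs = np.abs(np.angle(eigvals)) / (2 * np.pi * dt)
    return modes, eigvals, freqs

# Synthetic stand-in for sensor readings (an assumption, not real log data):
# 8 sensors observing a 1 Hz and a 3 Hz oscillation.
rng = np.random.default_rng(0)
dt = 0.01
t = np.arange(200) * dt
n_sensors = 8
X = np.zeros((n_sensors, t.size))
for f in (1.0, 3.0):
    a, b = rng.standard_normal((2, n_sensors))
    X += np.outer(a, np.cos(2 * np.pi * f * t))
    X += np.outer(b, np.sin(2 * np.pi * f * t))

modes, eigvals, freqs = dmd(X, rank=4, dt=dt)

# Hypothetical cutoff separating "slow" from "fast" modes at this level.
slow = freqs < 2.0
```

Each real oscillation appears as a complex-conjugate pair of modes, so the rank is set to 4 for two frequencies. In an mrDMD-style recursion, only the modes selected by `slow` would be kept at this level before descending to shorter time windows.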
