TX-Digital Twin: Visualizing Supercomputer GPU Performance Data Stream
Elena Baskakova, William Bergeron, Matthew Hubbell, Hayden Jananthan, and Jeremy Kepner

TL;DR
This paper introduces an enhanced 3D visualization tool, TX-Digital Twin, for supercomputer GPU performance data, integrating new GPU metrics with minimal performance impact.
Contribution
We extend the TX-Digital Twin by integrating GPU metrics visualization, improving supercomputer monitoring with optimized rendering techniques.
Findings
Integrated GPU metrics such as memory, temperature, and power into the visualization.
Maintained minimal performance overhead despite added visualization complexity.
Enhanced monitoring capabilities for GPU-accelerated supercomputers.
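The summary does not say how the GPU metrics are collected. The sketch below assumes, purely for illustration, that per-GPU samples come from NVIDIA's `nvidia-smi` query interface (`--query-gpu=index,memory.used,temperature.gpu,power.draw --format=csv,noheader,nounits`). LLSC's actual pipeline may work differently. The code parses those readings and scales a value into [0, 1], the way a renderer might before mapping it to a color. `GpuSample`, `parse_nvidia_smi_csv`, and `to_unit` are hypothetical names and are not part of TX-Digital Twin.

```python
import math
from dataclasses import dataclass


@dataclass
class GpuSample:
    """One reading for a single GPU (hypothetical structure, not from the paper)."""
    index: int
    memory_used_mib: float  # memory.used, MiB when "nounits" is set
    temperature_c: float    # temperature.gpu, degrees Celsius
    power_w: float          # power.draw, watts (NaN if the GPU reports [N/A])


def _to_float(field: str) -> float:
    """Convert a CSV field to float, mapping unsupported values like '[N/A]' to NaN."""
    try:
        return float(field)
    except ValueError:
        return math.nan


def parse_nvidia_smi_csv(text: str) -> list:
    """Parse nvidia-smi CSV output with columns: index, memory.used, temperature.gpu, power.draw."""
    samples = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue  # skip blank lines
        idx, mem, temp, power = (f.strip() for f in line.split(","))
        samples.append(GpuSample(int(idx), _to_float(mem), _to_float(temp), _to_float(power)))
    return samples


def to_unit(value: float, lo: float, hi: float) -> float:
    """Linearly scale value from [lo, hi] to [0, 1], clamping out-of-range readings.

    NaN (missing data) is passed through so the renderer can show it distinctly.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    if math.isnan(value):
        return value
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))


if __name__ == "__main__":
    raw = "0, 1024, 45, 250.50\n1, 2048, 70, [N/A]\n"
    for s in parse_nvidia_smi_csv(raw):
        # Assumed display range of 30-90 C for the temperature color scale.
        print(s.index, s.temperature_c, to_unit(s.temperature_c, 30.0, 90.0))
```

Passing NaN through, rather than clamping it to 0, keeps a missing power reading from looking like an idle GPU in the visualization.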
Abstract
Supercomputers are complex, dynamic systems that serve thousands of users and are built with thousands of compute nodes. Because accurately capturing their status requires vast amounts of system and performance data, supercomputers need complex methods to monitor, maintain, and optimize them. Data visualization is a powerful technique for overseeing these large data streams in an easily interpretable way. The MIT Lincoln Laboratory Supercomputing Center (LLSC) enables effective monitoring by combining 3D gaming technology with compound data streams in the TX-Digital Twin, a 3D simulation of the supercomputer. The TX-Digital Twin offers both live and historical data, in visual and text formats, and tracks a multitude of revealing performance metrics. Growing interest in GPU-accelerated computing has driven a need to monitor and maintain GPU-accelerated…
