Cross-layer Visualization and Profiling of Network and I/O Communication for HPC Clusters
Pouya Kousha, Quentin Anthony, Hari Subramoni, Dhabaleswar K. Panda

TL;DR
This paper introduces a holistic, real-time visualization and profiling tool for HPC communication stacks, enabling better understanding of cross-layer interactions and I/O behavior to identify bottlenecks.
Contribution
The paper presents a novel cross-layer visualization method and a low-overhead I/O profiling approach integrated into HPC communication libraries.
Findings
Holistic visualization provides real-time insights into HPC communication.
Cross-stack analysis reveals correlations between I/O traffic and MPI communication.
The approach effectively detects communication bottlenecks in HPC systems.
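To make the correlation finding concrete, here is a minimal sketch of how per-interval I/O and MPI byte counts on one link might be compared. It is not the paper's tool or method: the sample values, the names `mpi_bytes` and `io_bytes`, and the helper functions are hypothetical, and Pearson correlation is one simple choice of measure.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)

def traffic_share(mpi, io):
    """Fraction of each interval's bytes that belongs to I/O traffic."""
    return [i / (m + i) if (m + i) else 0.0 for m, i in zip(mpi, io)]

# Hypothetical per-interval byte counts on a single link (assumed data).
mpi_bytes = [120, 340, 560, 300, 90, 40]
io_bytes = [10, 15, 30, 200, 420, 380]

r = pearson(mpi_bytes, io_bytes)            # sign shows whether the two rise together
shares = traffic_share(mpi_bytes, io_bytes)  # per-interval breakdown of the link
print(f"correlation: {r:.2f}")
print("I/O share per interval:", [round(s, 2) for s in shares])
```

A negative coefficient on data like this would suggest that I/O and MPI phases alternate on the link, while a positive one would suggest they compete for it at the same time.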
Abstract
Understanding and visualizing the full-stack performance trade-offs and interplay between HPC applications, MPI libraries, the communication fabric, and the file system is a challenging endeavor. Designing a holistic profiling and visualization method for HPC communication networks is difficult because different levels of communication coexist and interact on the communication fabric. A breakdown of traffic is essential to understand the interplay of different layers, along with the application's communication behavior, without losing a general view of network traffic. Unfortunately, existing profiling tools are disjoint: they either profile and visualize only a few levels of the HPC stack, which limits the insights they can provide, or provide extremely detailed information that demands a steep learning curve. We target our profiling…
Taxonomy
Topics: Advanced Data Storage Technologies · Peer-to-Peer Network Technologies · Cloud Computing and Resource Management
