PCLVis: Visual Analytics of Process Communication Latency in Large-Scale Simulation
Chongke Bi, Xin Gao, Baofeng Fu, Yuheng Zhao, Siming Chen, Ying Zhao, Lu Yang

TL;DR
PCLVis is a visual analytics framework that helps users analyze process communication latency in large-scale simulations from MPI data, so they can understand and optimize communication events without access to physical link layer information.
Contribution
The paper introduces PCLVis, a novel framework that analyzes process communication latency using MPI data, with new methods for event localization, propagation analysis, and interactive visualization.
Findings
PCLVis supports effective analysis of PCL events on supercomputers.
Insights from PCL events help improve simulation efficiency.
The framework was demonstrated on the TH-1A supercomputer.
Abstract
Large-scale simulations on supercomputers have become important tools for users. However, their scalability remains a problem due to the huge communication cost among parallel processes. Most existing communication latency analysis methods rely on physical link layer information, which is only available to administrators. In this paper, a framework called PCLVis is proposed to help general users analyze process communication latency (PCL) events. Instead of physical link layer information, PCLVis uses MPI process communication data for the analysis. First, a spatial PCL event locating method is developed. All processes with high correlation are classified into a single cluster by constructing a process-correlation tree. Second, the propagation path of PCL events is analyzed by constructing a communication-dependency-based directed acyclic graph (DAG), which can…
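The second step of the abstract, tracing how latency propagates along a communication-dependency DAG, can be illustrated with a minimal sketch. This is not the paper's algorithm: the record format `(sender_rank, receiver_rank, latency_seconds)`, the sample values, and the helper names `build_dag`, `topological_order`, and `slowest_path` are all assumptions made for illustration. The sketch simply treats each MPI message as a directed edge and reports the dependency chain with the largest accumulated latency.

```python
from collections import defaultdict, deque

# Hypothetical MPI message records: (sender_rank, receiver_rank, latency_seconds).
# The values are made up for illustration; they are not from the paper.
events = [
    (0, 1, 0.002),
    (1, 2, 0.150),
    (1, 3, 0.003),
    (2, 4, 0.140),
    (3, 4, 0.002),
]


def build_dag(events):
    """Build an adjacency map: sender -> list of (receiver, latency)."""
    graph = defaultdict(list)
    for src, dst, latency in events:
        graph[src].append((dst, latency))
    return graph


def topological_order(graph):
    """Order nodes with Kahn's algorithm; raise if the graph has a cycle."""
    indegree = defaultdict(int)
    nodes = set(graph)
    for src in list(graph):
        for dst, _ in graph[src]:
            indegree[dst] += 1
            nodes.add(dst)
    queue = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for dst, _ in graph.get(node, []):
            indegree[dst] -= 1
            if indegree[dst] == 0:
                queue.append(dst)
    if len(order) != len(nodes):
        raise ValueError("communication graph contains a cycle")
    return order


def slowest_path(graph):
    """Return (total_latency, path) for the chain with the largest summed latency."""
    order = topological_order(graph)
    best = {n: 0.0 for n in order}
    prev = {n: None for n in order}
    for node in order:
        for dst, latency in graph.get(node, []):
            candidate = best[node] + latency
            if candidate > best[dst]:
                best[dst] = candidate
                prev[dst] = node
    end = max(order, key=lambda n: best[n])
    path = []
    node = end
    while node is not None:
        path.append(node)
        node = prev[node]
    return best[end], path[::-1]


total, path = slowest_path(build_dag(events))
print(path, round(total, 3))
```

With the sample records, the chain through ranks 0, 1, 2, and 4 dominates, which suggests that the slow message from rank 1 to rank 2 is where the delay enters. A real analysis would also need timestamps to order repeated messages between the same pair of processes, which this sketch omits.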
