Preparing for Performance Analysis at Exascale
Jonathon Anderson, Yumeng Liu, John Mellor-Crummey

TL;DR
This paper introduces a streaming aggregation method for analyzing large-scale, sparse performance data from exascale heterogeneous systems, significantly improving analysis speed and data compactness.
Contribution
It presents a novel parallel postmortem analysis approach that efficiently handles sparse, heterogeneous performance measurements at exascale, running faster than existing tools and producing smaller analysis results.
Findings
Analyzes large-scale GPU-accelerated applications faster than HPCToolkit.
Produces sparse performance profiles that are much smaller than dense representations.
Achieves over an order of magnitude speedup in performance analysis.
Abstract
Performance tools for emerging heterogeneous exascale platforms must address two principal challenges when analyzing execution measurements. First, measurement of large-scale executions may record mountains of performance data. Second, performance measurements for parallel programs are sparse in two ways: the set of metrics present for any context and the set of contexts present in different threads. For GPU-accelerated applications, an important source of sparsity is that none of the myriad of GPU metrics apply to any of the many CPU contexts. To address these challenges, we developed a novel streaming aggregation approach to postmortem analysis that employs both shared and distributed memory parallelism to aggregate sparse performance measurements from every rank, thread, and GPU stream of an application, and attributes heterogeneous call path profiles and traces to source code. Using…
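To make the two kinds of sparsity described in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation: the profile layout (a dict mapping a calling context to its non-zero metrics), the function names, and the sample values are all assumptions made for illustration. It folds profiles into running totals one at a time, in the spirit of streaming aggregation, and compares the number of stored values against a dense context-by-metric table.

```python
from collections import defaultdict


def aggregate(profiles):
    """Fold profiles one at a time into running per-(context, metric) sums."""
    totals = defaultdict(lambda: defaultdict(float))
    for profile in profiles:  # each profile is visited exactly once
        for context, metrics in profile.items():
            for metric, value in metrics.items():
                totals[context][metric] += value
    return {context: dict(metrics) for context, metrics in totals.items()}


def sparse_size(profile):
    """Number of values actually stored in one sparse profile."""
    return sum(len(metrics) for metrics in profile.values())


def dense_size(profiles):
    """Cells needed if every profile held every context and every metric."""
    contexts = {c for p in profiles for c in p}
    metrics = {m for p in profiles for ms in p.values() for m in ms}
    return len(profiles) * len(contexts) * len(metrics)


# Hypothetical measurements: CPU threads record no GPU metrics, and the
# GPU stream records no CPU contexts.
cpu_thread = {"main": {"cpu_time": 2.0}, "main>solve": {"cpu_time": 5.0}}
gpu_stream = {"main>solve>kernel": {"gpu_time": 3.0, "gpu_inst": 1000.0}}
other_cpu = {"main": {"cpu_time": 1.0}}

profiles = [cpu_thread, gpu_stream, other_cpu]
totals = aggregate(profiles)
stored = sum(sparse_size(p) for p in profiles)
dense = dense_size(profiles)

print(totals["main"]["cpu_time"])  # summed across both CPU threads
print(stored, dense)               # stored values vs. dense cell count
```

Even in this toy case most cells of the dense table would be empty, because GPU metrics never appear in CPU contexts and not every context appears in every thread.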
Taxonomy
Topics: Cloud Computing and Resource Management · Software System Performance and Reliability · Parallel Computing and Optimization Techniques
