Traveler: Navigating Task Parallel Traces for Performance Analysis
Sayef Azad Sakin, Alex Bigelow, R. Tohid, Connor Scully-Allison, Carlos Scheidegger, Steven R. Brandt, Christopher Taylor, Kevin A. Huck, Hartmut Kaiser, Katherine E. Isaacs

TL;DR
Traveler is a visualization platform that helps high performance computing developers explore complex task parallel execution traces, enabling the discovery of performance issues and unknown behaviors through multi-faceted navigation tools.
Contribution
The paper introduces Traveler, an integrated visualization system with hierarchical navigation tailored for analyzing large, complex task parallel traces in high performance computing.
Findings
Supported performance analysis tasks effectively.
Enabled discovery of previously unknown behaviors.
Received positive user feedback and case study validation.
Abstract
Understanding the behavior of software in execution is a key step in identifying and fixing performance issues. This is especially important in high performance computing contexts where even minor performance tweaks can translate into large savings in terms of computational resource use. To aid performance analysis, developers may collect an execution trace: a chronological log of program activity during execution. As traces represent the full history, developers can discover a wide array of possibly previously unknown performance issues, making them an important artifact for exploratory performance analysis. However, interactive trace visualization is difficult due to issues of data size and complexity of meaning. Traces represent nanosecond-level events across many parallel processes, meaning the collected data is often large and difficult to explore. The rise of asynchronous task…
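To make "a chronological log of program activity" concrete, here is a minimal sketch of a trace as timestamped enter/leave events, each tagged with the parallel location (process or thread) that emitted it, and of pairing those events into timed intervals, which is the kind of unit a timeline view draws. The record layout and the names Event, Interval, and build_intervals are hypothetical illustrations; they are not Traveler's data model or any particular trace format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One trace record (hypothetical layout, for illustration only)."""
    timestamp_ns: int  # nanosecond-resolution timestamp
    location: int      # id of the process/thread that emitted the event
    kind: str          # "enter" or "leave"
    name: str          # function or task name


@dataclass(frozen=True)
class Interval:
    """A matched enter/leave pair on one location."""
    location: int
    name: str
    start_ns: int
    end_ns: int

    @property
    def duration_ns(self) -> int:
        return self.end_ns - self.start_ns


def build_intervals(events):
    """Pair enter/leave events into intervals using one call stack per location.

    Events are processed in chronological order; sorting is stable, so events
    with equal timestamps keep their original relative order.
    """
    stacks = {}     # location -> stack of open "enter" events
    intervals = []  # emitted in the order their "leave" events occur
    for ev in sorted(events, key=lambda e: e.timestamp_ns):
        stack = stacks.setdefault(ev.location, [])
        if ev.kind == "enter":
            stack.append(ev)
        elif ev.kind == "leave":
            # A leave must close the most recently entered region on this location.
            if not stack or stack[-1].name != ev.name:
                raise ValueError(f"unmatched leave for {ev.name!r} at {ev.timestamp_ns} ns")
            start = stack.pop()
            intervals.append(Interval(ev.location, ev.name, start.timestamp_ns, ev.timestamp_ns))
        else:
            raise ValueError(f"unknown event kind {ev.kind!r}")
    return intervals
```

Even this toy version hints at the size problem the abstract describes: every function or task instance on every location contributes two records, so traces of long runs across many processes quickly reach millions of intervals that a visualization must summarize rather than draw one by one.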
Taxonomy
Topics: Software System Performance and Reliability · Cloud Computing and Resource Management · Peer-to-Peer Network Technologies
