Pipit: Scripting the analysis of parallel execution traces
Abhinav Bhatele, Rakrish Dhakal, Alexander Movsesyan, Aditya K. Ranjan, Onur Cankur

TL;DR
Pipit is a Python library that simplifies the analysis of parallel execution traces by providing a unified, extensible framework for data manipulation and performance issue detection across multiple trace formats.
Contribution
The paper introduces Pipit, a Python-based tool that unifies trace analysis across multiple formats, enabling automated, scalable, and customizable performance analysis of parallel programs.
Findings
Supports multiple trace formats, such as OTF2, HPCToolkit, and Nsight.
Provides operations for data aggregation, filtering, and transformation.
Facilitates automated detection of performance issues.
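To give a feel for the kind of filtering, aggregation, and transformation listed above, here is a minimal sketch written in plain pandas. It does not use Pipit's own API: the toy trace, the column names (`timestamp_ns`, `event`, `name`, `process`), and the helper `call_durations` are all hypothetical, chosen only for illustration, and the pairing logic assumes no recursive calls.

```python
import pandas as pd

# Hypothetical toy trace: one row per event. Column names are illustrative
# assumptions, not Pipit's actual schema.
trace = pd.DataFrame(
    {
        "timestamp_ns": [0, 40, 50, 90, 0, 70, 80, 100],
        "event": ["Enter", "Leave", "Enter", "Leave"] * 2,
        "name": ["compute", "compute", "MPI_Send", "MPI_Send",
                 "compute", "compute", "MPI_Recv", "MPI_Recv"],
        "process": [0, 0, 0, 0, 1, 1, 1, 1],
    }
)


def call_durations(events: pd.DataFrame) -> pd.DataFrame:
    """Transformation: turn Enter/Leave event rows into one row per call.

    Assumes no recursion, so the k-th Enter of a function on a process
    matches the k-th Leave of that function on the same process.
    """
    events = events.sort_values(["process", "timestamp_ns"])
    keys = ["process", "name"]
    enters = events[events["event"] == "Enter"].copy()
    leaves = events[events["event"] == "Leave"].copy()
    # Number each call of a function within its process.
    enters["call"] = enters.groupby(keys).cumcount()
    leaves["call"] = leaves.groupby(keys).cumcount()
    calls = enters.merge(leaves, on=keys + ["call"],
                         suffixes=("_enter", "_leave"))
    calls["duration_ns"] = (calls["timestamp_ns_leave"]
                            - calls["timestamp_ns_enter"])
    return calls[keys + ["duration_ns"]]


calls = call_durations(trace)

# Filtering: keep only the MPI calls.
mpi_calls = calls[calls["name"].str.startswith("MPI_")]

# Aggregation: total time spent in each function across all processes.
time_per_function = calls.groupby("name")["duration_ns"].sum()
print(time_per_function)
```

Because the result is an ordinary DataFrame, the same pattern extends to other questions (for example, grouping by `process` to look for load imbalance) without a dedicated graphical view.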
Abstract
Performance analysis is a critical step in the oft-repeated, iterative process of performance tuning of parallel programs. Per-process, per-thread traces (detailed logs of events with timestamps) enable in-depth analysis of parallel program execution to identify different kinds of performance issues. Oftentimes, trace collection tools provide a graphical tool to analyze the trace output. However, these GUI-based tools only support specific file formats, are challenging to scale to large trace sizes, limit data exploration to the implemented graphical views, and do not support automated comparisons of two or more datasets. In this paper, we present a programmatic approach to analyzing parallel execution traces by leveraging pandas, a powerful Python-based data analysis library. We have developed a Python library, Pipit, on top of pandas that can read traces in different file formats…
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Software System Performance and Reliability · Cloud Computing and Resource Management
