LEO: Tracing GPU Stall Root Causes via Cross-Vendor Backward Slicing
Yuning Xia, John Mellor-Crummey

TL;DR
LEO is a cross-vendor GPU stall root-cause analyzer that uses backward slicing to attribute stalls to source instructions, aiding developers in optimizing GPU performance.
Contribution
The paper introduces LEO, a tool that diagnoses GPU stalls on NVIDIA, AMD, and Intel GPUs by backward slicing from stalled instructions through register and vendor-specific synchronization dependencies, attributing each stall to the source instructions that cause it.
Findings
LEO-guided optimizations deliver geometric-mean speedups of 1.73×–1.82× across 21 workloads on three GPU platforms.
Different GPUs require different optimizations for the same kernel.
LEO's diagnostics enhance code optimization with large language models.
Abstract
More than half of the Top 500 supercomputers employ GPUs as accelerators. On GPU-accelerated platforms, developers face a key diagnostic gap: profilers show source lines where stalls occur, but not why they occur. Furthermore, the same kernel may have different stalls and underlying causes on different GPUs. This paper presents LEO, a root-cause analyzer for NVIDIA, AMD, and Intel GPUs that performs backward slicing from stalled instructions, considering dependencies arising from registers as well as vendor-specific synchronization mechanisms. LEO attributes GPU stalls to source instructions with the goal of explaining root causes of these inefficiencies. Across 21 workloads on three GPU platforms, LEO-guided optimizations deliver geometric-mean speedups of 1.73×–1.82×. Our case studies show that (1) the same kernel may require different optimizations for different GPU…
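To make the idea of backward slicing over register dependencies concrete, here is a minimal sketch in Python. It is not LEO's implementation: the `Instr` record, the `backward_slice` function, and the four-instruction `PROGRAM` are hypothetical and exist only for illustration. The sketch assumes straight-line code and tracks register definitions and uses only; it ignores control flow, predication, memory dependencies, and the vendor-specific synchronization mechanisms that the abstract says LEO also considers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical toy instruction: registers written (defs) and read (uses)."""
    pc: int
    opcode: str
    defs: tuple
    uses: tuple


def backward_slice(program, stalled_pc):
    """Return the pcs of earlier instructions that the stalled instruction
    transitively depends on through registers, nearest producer first.

    Assumes straight-line code, so the most recent earlier writer of a
    register is its unique reaching definition.
    """
    position = {ins.pc: i for i, ins in enumerate(program)}
    start = position[stalled_pc]
    needed = set(program[start].uses)   # registers whose producers we still seek
    slice_pcs = []
    for ins in reversed(program[:start]):
        produced = needed.intersection(ins.defs)
        if produced:
            slice_pcs.append(ins.pc)
            needed -= produced          # these registers are now explained
            needed |= set(ins.uses)     # follow the producer's own inputs
    return slice_pcs


# Illustrative program (assumed, not from the paper): the store at pc 3 stalls.
PROGRAM = [
    Instr(0, "load",  defs=("r1",), uses=("r0",)),
    Instr(1, "mul",   defs=("r2",), uses=("r5", "r6")),
    Instr(2, "add",   defs=("r3",), uses=("r1", "r4")),
    Instr(3, "store", defs=(),      uses=("r3", "r7")),
]

if __name__ == "__main__":
    print(backward_slice(PROGRAM, stalled_pc=3))
```

In this toy program the slice from the stalled store reaches the `add` that produces `r3` and then the `load` that produces `r1`, while the unrelated `mul` is skipped. That mirrors the distinction the abstract draws between the line where a stall is reported and the instruction that explains it.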
