Chopper: A Multi-Level GPU Characterization Tool & Derived Insights Into LLM Training Inefficiency
Marco Kurzynski, Shaizeen Aga, Di Wu

TL;DR
Chopper is a comprehensive GPU profiling framework that reveals key bottlenecks and inefficiencies in large language model training on AMD GPUs, providing insights for optimization and future system design.
Contribution
We introduce Chopper, a novel multi-granularity profiling tool for GPU-based LLM training, and provide the first detailed characterization of Llama 3 8B training on AMD Instinct GPUs.
Findings
Memory determinism enables higher GPU and memory frequencies.
Frequency overhead (DVFS effects) is the largest source of inefficiency.
Identified bottlenecks inform optimization and hardware design improvements.
Abstract
Training large language models (LLMs) efficiently requires a deep understanding of how modern GPU systems behave under real-world distributed training workloads. While prior work has focused primarily on kernel-level performance or single-GPU microbenchmarks, the complex interaction between communication, computation, memory behavior, and power management in multi-GPU LLM training remains poorly characterized. In this work, we introduce Chopper, a profiling and analysis framework that collects, aligns, and visualizes GPU kernel traces and hardware performance counters across multiple granularities (i.e., from individual kernels to operations, layers, phases, iterations, and GPUs). Using Chopper, we perform a comprehensive end-to-end characterization of Llama 3 8B training under fully sharded data parallelism (FSDP) on an eight-GPU AMD Instinct™ MI300X node. Our analysis reveals several…
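The multi-granularity idea in the abstract (rolling kernel-level measurements up to operations, layers, phases, iterations, and GPUs) can be illustrated with a small sketch. This is not Chopper's implementation or API: the record fields, the `rollup` helper, and every value in `KERNELS` are hypothetical, chosen only to show how a single flat kernel trace can be summarized at different levels.

```python
from collections import defaultdict

# Hypothetical kernel records. Field names and durations are illustrative
# assumptions, not Chopper's trace schema and not measured data.
KERNELS = [
    {"gpu": 0, "iteration": 0, "phase": "forward", "layer": 0,
     "op": "attention", "kernel": "gemm_qkv", "dur_us": 120.0},
    {"gpu": 0, "iteration": 0, "phase": "forward", "layer": 0,
     "op": "mlp", "kernel": "gemm_up", "dur_us": 200.0},
    {"gpu": 0, "iteration": 0, "phase": "backward", "layer": 0,
     "op": "mlp", "kernel": "gemm_up_grad", "dur_us": 380.0},
    {"gpu": 1, "iteration": 0, "phase": "forward", "layer": 0,
     "op": "attention", "kernel": "gemm_qkv", "dur_us": 130.0},
    {"gpu": 1, "iteration": 0, "phase": "backward", "layer": 0,
     "op": "mlp", "kernel": "gemm_up_grad", "dur_us": 400.0},
]


def rollup(records, levels):
    """Sum kernel durations grouped by the given granularity levels.

    `levels` is a tuple of field names, e.g. ("gpu",) or ("gpu", "phase").
    Returns a dict mapping each group key (a tuple) to total microseconds.
    """
    totals = defaultdict(float)
    for record in records:
        key = tuple(record[level] for level in levels)
        totals[key] += record["dur_us"]
    return dict(totals)


if __name__ == "__main__":
    # The same trace viewed at three different granularities.
    for levels in [("gpu",), ("phase",), ("gpu", "phase")]:
        print(levels, rollup(KERNELS, levels))
```

Keeping the trace flat and choosing the grouping keys at query time means one collection pass can answer questions at any level, from a single kernel up to a whole GPU.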
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Neural Network Applications · Big Data and Digital Economy
