TL;DR
PipeWeave is a unified GPU performance prediction framework that feeds analytically derived per-pipeline features into a machine learning model, aiming to generalize across diverse hardware and kernels. It reports a 6.1% average kernel-level prediction error.
Contribution
PipeWeave introduces a hybrid modeling approach: an analytical model quantifies a kernel's demands on the GPU's heterogeneous instruction pipelines, and an ML model uses those features to capture cross-pipeline interactions and resource dependencies.
Findings
Achieves a 6.1% average kernel-level prediction error.
Reduces prediction error by 6.7x compared with state-of-the-art methods.
Enables optimization of production kernels, with up to 1.7x speedup.
Abstract
The rapid expansion of Transformer-based large language models has dramatically increased the need for high-performance GPUs. As a result, there is growing demand for fast, accurate, and widely generalizable GPU performance models to support next-generation hardware selection and system-level exploration. However, current data-driven methods are limited, exhibiting poor generalization across hardware and inadequate modeling of complex production-level kernels common in modern inference stacks. To address these issues, we present PipeWeave, a unified GPU modeling framework. This approach first employs an analytical model to quantify a given kernel's demands on the GPU's heterogeneous instruction pipelines. These analytical features are then fed into a machine learning (ML) model to capture complex cross-pipeline interactions and resource dependencies, enabling high-fidelity performance…
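The two-stage idea in the abstract (analytical per-pipeline features, then a learned model on top) can be illustrated with a minimal sketch. This is not PipeWeave's implementation: the pipeline names, the peak rates in `Gpu`, the kernel workloads, and the "measured" latencies are all hypothetical, and ordinary linear least squares stands in for whatever ML model the paper actually uses. A linear fit cannot capture complex cross-pipeline interactions; it only shows how the two stages connect.

```python
from dataclasses import dataclass


@dataclass
class Gpu:
    # Hypothetical peak rates, not taken from any real datasheet.
    tensor_flops_per_s: float
    fp32_flops_per_s: float
    dram_bytes_per_s: float


@dataclass
class Kernel:
    # Hypothetical per-kernel work counts for three pipelines.
    tensor_flops: float
    fp32_flops: float
    dram_bytes: float


def analytical_features(kernel, gpu):
    """Stage 1: time in ms each pipeline would need if it ran alone."""
    return [
        1e3 * kernel.tensor_flops / gpu.tensor_flops_per_s,
        1e3 * kernel.fp32_flops / gpu.fp32_flops_per_s,
        1e3 * kernel.dram_bytes / gpu.dram_bytes_per_s,
    ]


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_linear(xs, ys):
    """Stage 2 stand-in: least-squares weights via the normal equations."""
    n = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(n)] for i in range(n)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(n)]
    return solve_linear_system(xtx, xty)


def predict(weights, features):
    """Predicted latency in ms as a weighted sum of per-pipeline times."""
    return sum(w * f for w, f in zip(weights, features))


# Synthetic demonstration data: every number below is made up.
gpu = Gpu(tensor_flops_per_s=100e12, fp32_flops_per_s=10e12, dram_bytes_per_s=1e12)
kernels = [
    Kernel(2e11, 1e10, 3e9),
    Kernel(5e11, 2e9, 1e9),
    Kernel(1e11, 4e10, 2e9),
    Kernel(3e11, 3e10, 5e8),
]
features = [analytical_features(k, gpu) for k in kernels]
true_weights = [0.9, 0.3, 1.1]  # invented, used only to generate labels
latencies_ms = [predict(true_weights, f) for f in features]  # not measurements
weights = fit_linear(features, latencies_ms)
```

Because the labels here are generated from a linear rule, the fit recovers the invented weights. Real kernel latencies would not be linear in these features, which is the motivation for a more expressive learned model in the second stage.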
