TritonForge: Profiling-Guided Framework for Automated Triton Kernel Optimization
Haonan Li, Keyu Man, Partha Kanuparthy, Hanning Chen, Wei Sun, Sreen Tallam, Chenguang Zhu, Kevin Zhu, Zhiyun Qian

TL;DR
TritonForge is an automated framework that uses profiling and iterative code transformation to optimize GPU kernels written in Triton, significantly improving performance with minimal manual effort.
Contribution
TritonForge introduces a profiling-guided, automated optimization system for Triton GPU kernels that reduces manual tuning and achieves up to 5x speedups over baseline kernels.
Findings
Up to 5x performance improvement over baseline kernels
Average speedup of 1.76x across diverse kernel types
Effective identification and mitigation of performance bottlenecks
Abstract
High-performance GPU kernel optimization remains a critical yet labor-intensive task in modern machine learning workloads. Although Triton, a domain-specific language for GPU programming, enables developers to write efficient kernels with concise code, achieving expert-level performance still requires deep understanding of GPU architectures and low-level performance trade-offs. We present TritonForge, a profiling-guided framework for automated Triton kernel optimization. TritonForge integrates kernel analysis, runtime profiling, and iterative code transformation to streamline the optimization process. By incorporating feedback from profiling results, the system identifies performance bottlenecks, proposes targeted code modifications, and evaluates their impact automatically. Across diverse kernel types, TritonForge achieves up to 5x performance improvement over baseline implementations…
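The profile, propose, and evaluate cycle described in the abstract can be illustrated with a small, generic feedback loop. This is only a sketch under stated assumptions, not TritonForge's implementation: the names `optimize`, `propose`, and `profile` are hypothetical, the "kernels" are plain Python functions rather than Triton kernels, and the profiler is a lookup table of made-up costs so that the example is deterministic.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# A "kernel" here is just a Python function on a list of floats.
# This stands in for a Triton kernel; nothing below touches a GPU.
Kernel = Callable[[List[float]], List[float]]


def optimize(
    baseline: Kernel,
    propose: Callable[[Kernel, float], Iterable[Kernel]],
    profile: Callable[[Kernel, List[float]], float],
    data: List[float],
    max_rounds: int = 3,
) -> Tuple[Kernel, float]:
    """Hypothetical profiling-guided loop: keep a candidate only if it
    reproduces the baseline output and has a lower profiled cost."""
    reference = baseline(data)
    best, best_cost = baseline, profile(baseline, data)
    for _ in range(max_rounds):
        improved = False
        # Profiling feedback (the current best cost) is passed to the proposer.
        for candidate in propose(best, best_cost):
            if candidate(data) != reference:
                continue  # reject variants that change the results
            cost = profile(candidate, data)
            if cost < best_cost:
                best, best_cost = candidate, cost
                improved = True
        if not improved:
            break  # no candidate helped in this round; stop iterating
    return best, best_cost


# Toy variants (assumptions for illustration only).
def slow(xs: List[float]) -> List[float]:
    return [x * 2 for x in xs]


def fast(xs: List[float]) -> List[float]:
    return [x + x for x in xs]


def wrong(xs: List[float]) -> List[float]:
    return [x * 3 for x in xs]  # cheapest, but not equivalent to the baseline


# Made-up costs standing in for real profiler measurements.
COSTS: Dict[Kernel, float] = {slow: 10.0, fast: 4.0, wrong: 1.0}

best, cost = optimize(
    baseline=slow,
    propose=lambda kernel, current_cost: [fast, wrong],
    profile=lambda kernel, data: COSTS[kernel],
    data=[1.0, 2.0],
)
```

In this toy run the loop keeps `fast` and rejects `wrong` despite its lower cost, because a candidate is only accepted if its output matches the baseline. A real system would replace the cost table with measured kernel timings and the fixed candidate list with transformations chosen from profiling data.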
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Neural Network Applications · Machine Learning and Data Classification
