PRAGMA: A Profiling-Reasoned Multi-Agent Framework for Automatic Kernel Optimization
Kelun Lei, Hailong Yang, Huaitao Zhang, Xin You, Kaige Zhang, Zhongzhi Luan, Yi Liu, Depei Qian

TL;DR
PRAGMA is an LLM-driven framework that feeds hardware profiling and execution feedback into the kernel-generation loop to optimize kernels automatically, yielding higher speedups on CPU and GPU than AIKG without profiling.
Contribution
It introduces a novel profile-guided AI kernel generation framework that enables reasoning about performance bottlenecks and iterative refinement, advancing automated kernel optimization.
Findings
Achieves averaged speedups of 2.81× on CPU and 2.30× on GPU against Torch.
Consistently outperforms the AIKG baseline run without profiling enabled.
Effectively identifies and addresses low-level performance bottlenecks.
Abstract
Designing high-performance kernels requires expert-level tuning and a deep understanding of hardware characteristics. Recent advances in large language models (LLMs) have enabled automated kernel generation, yet most existing systems rely solely on correctness or execution time feedback, lacking the ability to reason about low-level performance bottlenecks. In this paper, we introduce PRAGMA, a profile-guided AI kernel generation framework that integrates execution feedback and fine-grained hardware profiling into the reasoning loop. PRAGMA enables LLMs to identify performance bottlenecks, preserve historical best versions, and iteratively refine code quality. We evaluate PRAGMA on KernelBench, covering GPU and CPU backends. Results show that PRAGMA consistently outperforms baseline AIKG without profiling enabled and achieves 2.81× and 2.30× averaged speedups against Torch…
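The loop the abstract describes (run a candidate, collect execution and profiling feedback, keep the historical best version, refine again) can be illustrated with a minimal Python sketch. This is not PRAGMA's implementation: the `refine` function, the `Candidate` record, the `propose`/`run`/`profile` callables, and the toy runtimes in `demo` are all hypothetical names and values chosen only to show the control flow.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Candidate:
    """A correct kernel version with its measured runtime and profile (hypothetical record)."""
    source: str
    runtime_ms: float
    profile: dict


def refine(
    initial_source: str,
    propose: Callable[[str, dict], str],       # stand-in for the LLM: (source, feedback) -> new source
    run: Callable[[str], Tuple[bool, float]],  # stand-in for execution: source -> (is_correct, runtime_ms)
    profile: Callable[[str], dict],            # stand-in for a hardware profiler: source -> metrics
    rounds: int = 4,
) -> Optional[Candidate]:
    best: Optional[Candidate] = None
    current = initial_source
    feedback: dict = {}
    for round_idx in range(rounds):
        if round_idx > 0:
            # Always refine from the historical best, so a regression
            # or an incorrect kernel never becomes the new starting point.
            base = best.source if best is not None else current
            current = propose(base, feedback)
        is_correct, runtime_ms = run(current)
        if is_correct and (best is None or runtime_ms < best.runtime_ms):
            best = Candidate(current, runtime_ms, profile(current))
        # Feedback combines the last attempt's outcome with the best version's profile.
        feedback = {
            "last_correct": is_correct,
            "last_runtime_ms": runtime_ms if is_correct else None,
            "best_runtime_ms": best.runtime_ms if best else None,
            "best_profile": best.profile if best else None,
        }
    return best


def demo() -> Optional[Candidate]:
    # Toy lookup table standing in for real compilation and measurement (made-up numbers).
    table = {
        "base": (True, 10.0),
        "base+tile": (True, 6.0),
        "base+tile+vec": (False, float("inf")),  # an incorrect kernel
        "base+tile+unroll": (True, 7.5),         # correct but slower than the best
    }
    steps = iter(["tile", "vec", "unroll"])
    return refine(
        "base",
        propose=lambda source, feedback: f"{source}+{next(steps)}",
        run=lambda source: table[source],
        profile=lambda source: {"note": "placeholder metrics"},
        rounds=4,
    )
```

Calling `demo()` walks through one accepted improvement, one incorrect candidate, and one regression; in each of the last two cases the next proposal restarts from the retained best version rather than from the failed attempt.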
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Embedded Systems Design Techniques · Advanced Neural Network Applications
