cuPilot: A Strategy-Coordinated Multi-agent Framework for CUDA Kernel Evolution

Jinwu Chen; Qidie Wu; Bin Li; Lin Ma; Xin Si; Yang Hu; Shouyi Yin; Jun Yang

arXiv:2512.16465 · cs.AI · December 24, 2025

TL;DR

cuPilot is a multi-agent framework that evolves CUDA kernels through an intermediate, strategy-level representation rather than evolving kernel code directly. Its generated kernels average a 3.09× speedup over PyTorch on 100 kernels and reach high hardware utilization on GEMM tasks.

Contribution

The paper introduces a strategy-coordinated evolution algorithm, roofline-guided prompting, and strategy-level population initialization for kernel optimization.
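To make the idea of evolving at the strategy level concrete, here is a minimal toy sketch. It is not the paper's algorithm: the strategy names, the `toy_score` function, and the crossover and mutation rules are all hypothetical, chosen only to show a candidate represented as a set of named optimization strategies instead of raw kernel source. In a real system the score would presumably come from compiling and timing a generated kernel.

```python
import random
from typing import Callable, FrozenSet

Candidate = FrozenSet[str]

# Hypothetical strategy vocabulary; the paper's actual strategy set is not given here.
STRATEGY_POOL = (
    "shared_memory_tiling",
    "vectorized_loads",
    "loop_unrolling",
    "warp_shuffle_reduction",
    "double_buffering",
)


def crossover(a: Candidate, b: Candidate, rng: random.Random) -> Candidate:
    """Keep strategies shared by both parents; keep each other one with p=0.5."""
    union = sorted(a | b)  # sorted so the RNG is consumed in a stable order
    return frozenset(s for s in union if (s in a and s in b) or rng.random() < 0.5)


def mutate(c: Candidate, rng: random.Random) -> Candidate:
    """Toggle one randomly chosen strategy on or off."""
    return c ^ {rng.choice(STRATEGY_POOL)}


def toy_score(c: Candidate) -> float:
    """Stand-in fitness: rewards two particular strategies, penalizes set size."""
    return len(c & {"shared_memory_tiling", "vectorized_loads"}) - 0.1 * len(c)


def evolve(score: Callable[[Candidate], float], generations: int = 5,
           pop_size: int = 6, seed: int = 0) -> Candidate:
    rng = random.Random(seed)
    population = [frozenset(rng.sample(STRATEGY_POOL, 2)) for _ in range(pop_size)]
    for _ in range(generations):
        # Elitist selection: the top half survives unchanged.
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]
        children = [
            mutate(crossover(*rng.sample(parents, 2), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=score)


best = evolve(toy_score)
```

Because recombination operates on strategy names, two parents can be combined without merging their source code line by line, which is the property the strategy-level representation is meant to provide.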

Findings

01. Achieves an average 3.09× speedup over PyTorch on 100 kernels.

02. Demonstrates high hardware utilization on GEMM tasks.

03. Produces open-source optimized kernels.

Abstract

Optimizing CUDA kernels is a challenging and labor-intensive task, given the need for hardware-software co-design expertise and the proprietary nature of high-performance kernel libraries. While recent large language models (LLMs) combined with evolutionary algorithms show promise in automatic kernel optimization, existing approaches often fall short in performance due to their suboptimal agent designs and mismatched evolution representations. This work identifies these mismatches and proposes cuPilot, a strategy-coordinated multi-agent framework that introduces strategy as an intermediate semantic representation for kernel evolution. Key contributions include a strategy-coordinated evolution algorithm, roofline-guided prompting, and strategy-level population initialization. Experimental results show that kernels generated by cuPilot achieve an average speedup of 3.09$\times$ over…
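The abstract names roofline-guided prompting but does not describe it here. As background, the standard roofline model bounds attainable throughput by min(peak compute, peak bandwidth × arithmetic intensity). The sketch below applies that model to decide whether a kernel is memory-bound or compute-bound and turns the result into a text hint. The `Device` numbers, the `prompt_hint` wording, and all function names are assumptions for illustration, not taken from the paper.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Device:
    peak_flops: float      # FLOP/s (hypothetical value supplied by the caller)
    peak_bandwidth: float  # bytes/s (hypothetical value supplied by the caller)

    @property
    def ridge_point(self) -> float:
        """Arithmetic intensity (FLOP/byte) where the two roofs meet."""
        return self.peak_flops / self.peak_bandwidth


def attainable_flops(device: Device, intensity: float) -> float:
    """Roofline bound: min(compute roof, bandwidth roof * intensity)."""
    return min(device.peak_flops, device.peak_bandwidth * intensity)


def classify(device: Device, flops: float, bytes_moved: float) -> Tuple[float, str]:
    """Return (arithmetic intensity, 'memory-bound' or 'compute-bound')."""
    intensity = flops / bytes_moved
    bound = "memory-bound" if intensity < device.ridge_point else "compute-bound"
    return intensity, bound


def gemm_cost(m: int, n: int, k: int, bytes_per_elem: int = 4) -> Tuple[int, int]:
    """Ideal GEMM cost: 2*m*n*k FLOPs; each of A, B, C touched once in memory."""
    flops = 2 * m * n * k
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops, bytes_moved


def prompt_hint(device: Device, flops: float, bytes_moved: float) -> str:
    """Hypothetical hint text that an optimizer prompt could include."""
    intensity, bound = classify(device, flops, bytes_moved)
    focus = ("reducing memory traffic" if bound == "memory-bound"
             else "raising arithmetic throughput")
    return f"Kernel is {bound} at {intensity:.2f} FLOP/byte; focus on {focus}."


# Example device with made-up round numbers: 100 TFLOP/s and 1 TB/s.
device = Device(peak_flops=100e12, peak_bandwidth=1e12)
gemm_flops, gemm_bytes = gemm_cost(4096, 4096, 4096)
hint = prompt_hint(device, gemm_flops, gemm_bytes)
```

With these made-up device numbers the ridge point is 100 FLOP/byte, so a large square GEMM lands on the compute roof, while an elementwise kernel that moves 12 bytes per FLOP lands far down the bandwidth slope.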

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Advanced Neural Network Applications · Metaheuristic Optimization Algorithms Research