KernelSkill: A Multi-Agent Framework for GPU Kernel Optimization
Qitong Sun, Jun Han, Tianlin Li, Zhe Tang, Sheng Chen, Fei Yang, Aishan Liu, Xianglong Liu, Yang Liu

TL;DR
KernelSkill is a multi-agent framework for GPU kernel optimization that replaces the implicit heuristics of LLM-based pipelines with knowledge-driven expert optimization skills. It reports a 100% success rate on KernelBench Levels 1-3 and average speedups of up to 5.44x over Torch Eager.
Contribution
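Introduces KernelSkill, a multi-agent system with a dual-level memory architecture: a long-term memory of reusable expert skills and a short-term memory that prevents repetitive backtracking. The authors report that it improves GPU kernel optimization over existing LLM-based methods.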
Findings
Achieves a 100% success rate on KernelBench Levels 1-3.
Attains average speedups of 5.44x, 2.82x, and 1.92x over Torch Eager on Levels 1, 2, and 3, respectively.
Outperforms prior baseline methods in GPU kernel optimization.
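To make the two reported metrics concrete, here is a minimal sketch of how a success rate and an average speedup over a baseline could be computed from per-task timings. The function names, the input layout, and the definitions used (success as "the generated kernel is correct", average as the arithmetic mean over successful tasks) are assumptions for illustration; the paper may define and aggregate these metrics differently.

```python
from statistics import mean


def speedup(baseline_ms: float, kernel_ms: float) -> float:
    """Speedup of an optimized kernel over the baseline (e.g. Torch Eager)."""
    if kernel_ms <= 0:
        raise ValueError("kernel_ms must be positive")
    return baseline_ms / kernel_ms


def summarize(results):
    """Summarize a list of (correct, baseline_ms, kernel_ms) tuples.

    Assumed definitions (not taken from the paper):
      success rate    = fraction of tasks whose kernel is correct
      average speedup = arithmetic mean of speedups over correct tasks
    """
    if not results:
        raise ValueError("results must not be empty")
    success_rate = sum(1 for ok, _, _ in results if ok) / len(results)
    speedups = [speedup(b, k) for ok, b, k in results if ok]
    avg_speedup = mean(speedups) if speedups else 0.0
    return success_rate, avg_speedup
```

For example, two correct tasks with speedups of 5.0x and 3.0x would give a success rate of 1.0 and an average speedup of 4.0x under these assumed definitions.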
Abstract
Improving GPU kernel efficiency is crucial for advancing AI systems. Recent work has explored leveraging large language models (LLMs) for GPU kernel generation and optimization. However, existing LLM-based kernel optimization pipelines typically rely on opaque, implicitly learned heuristics within the LLMs to determine optimization strategies. This leads to inefficient trial-and-error and weakly interpretable optimizations. Our key insight is to replace implicit heuristics with expert optimization skills that are knowledge-driven and aware of task trajectories. Specifically, we present KernelSkill, a multi-agent framework with a dual-level memory architecture. KernelSkill operates by coordinating agents with long-term memory of reusable expert skills and short-term memory to prevent repetitive backtracking. On KernelBench Levels 1-3, KernelSkill achieves a 100% success rate and average…
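The dual-level memory idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the class names, the bottleneck tags, and the selection rule are all hypothetical, chosen only to show how a long-term store of reusable skills and a short-term record of attempted strategies might interact within one optimization trajectory.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Skill:
    """A reusable expert optimization skill (hypothetical structure)."""
    name: str
    applies_to: frozenset  # bottleneck tags this skill targets
    hint: str              # guidance that would be passed to an agent


class LongTermMemory:
    """Persistent store of expert skills, shared across tasks."""

    def __init__(self, skills):
        self.skills = list(skills)

    def retrieve(self, bottleneck: str):
        return [s for s in self.skills if bottleneck in s.applies_to]


@dataclass
class ShortTermMemory:
    """Per-task record of skills already attempted in this trajectory."""
    tried: set = field(default_factory=set)

    def record(self, skill_name: str) -> None:
        self.tried.add(skill_name)

    def seen(self, skill_name: str) -> bool:
        return skill_name in self.tried


def next_skill(bottleneck: str, ltm: LongTermMemory, stm: ShortTermMemory):
    """Pick the first matching skill not yet tried; None if exhausted."""
    for skill in ltm.retrieve(bottleneck):
        if not stm.seen(skill.name):
            stm.record(skill.name)
            return skill
    return None


# Example skill library (illustrative entries only).
ltm = LongTermMemory([
    Skill("shared_memory_tiling", frozenset({"memory_bound"}),
          "Stage tiles in shared memory to cut global loads."),
    Skill("vectorized_loads", frozenset({"memory_bound"}),
          "Use wider loads where alignment allows."),
    Skill("kernel_fusion", frozenset({"launch_bound"}),
          "Fuse adjacent elementwise ops into one kernel."),
])
stm = ShortTermMemory()
first = next_skill("memory_bound", ltm, stm)
second = next_skill("memory_bound", ltm, stm)
third = next_skill("memory_bound", ltm, stm)
```

In this sketch the short-term memory is what keeps the loop from proposing the same strategy twice for the same task, while the long-term memory makes the choice of strategy explicit and inspectable rather than left to the LLM's implicit heuristics.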
Taxonomy
Topics: Advanced Neural Network Applications · Parallel Computing and Optimization Techniques · Stochastic Gradient Optimization Techniques
