AutoKernel: Autonomous GPU Kernel Optimization via Iterative Agent-Driven Search

Jaber Jaber; Osama Jaber

arXiv:2603.21331·cs.LG·March 24, 2026

Open Access

TL;DR

AutoKernel is an autonomous framework that optimizes GPU kernels for machine learning models through iterative, agent-driven search, significantly improving performance without human intervention.

Contribution

It introduces a fully automated GPU kernel optimization system that profiles, tests, and refines kernels for PyTorch models using an iterative agent loop.
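The loop described here can be illustrated with a toy sketch. This is not AutoKernel's code: the names `search`, `is_correct`, and `benchmark` are hypothetical, the "kernels" are plain integers standing in for block sizes, and the cost model is invented for illustration. The only property taken from the paper's abstract is that a candidate must pass correctness checks before its timing can count as a speedup.

```python
def search(candidates, is_correct, benchmark, baseline_ms):
    """Keep the fastest candidate that passes the correctness check.

    Hypothetical sketch: candidates that fail validation are never
    benchmarked, so they can never be recorded as a speedup.
    """
    best, best_ms = None, baseline_ms
    for cand in candidates:
        if not is_correct(cand):
            continue  # rejected before any timing is recorded
        ms = benchmark(cand)
        if ms < best_ms:
            best, best_ms = cand, ms
    return best, baseline_ms / best_ms


# Toy stand-ins (assumptions): a candidate is a block size, "correct"
# means it is a power of two, and the cost model is made up.
def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0


def toy_cost_ms(n):
    return abs(n - 96) / 32 + 1.0


best, speedup = search([32, 64, 96, 128, 256], is_power_of_two, toy_cost_ms, baseline_ms=4.0)
print(best, speedup)
```

In this toy run the block size 96 has the lowest cost but fails the check, so the search settles on 64. A real harness would replace the single predicate with the multi-stage validation the abstract lists.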

Findings

01 AutoKernel's Triton kernels outperform PyTorch eager and torch.compile.

02 Achieved up to 5.29x speedup on RMSNorm.

03 Won first place on vectorsum_v2 B200 leaderboard.

Abstract

Writing high-performance GPU kernels is among the most labor-intensive tasks in machine learning systems engineering. We present AutoKernel, an open-source framework that applies an autonomous agent loop to GPU kernel optimization for arbitrary PyTorch models. Given a model, AutoKernel profiles it to identify computational bottlenecks, ranks them by Amdahl's law impact, and iteratively refines Triton or CUDA C++ kernel implementations through hundreds of experiments without human intervention. A five-stage correctness harness covering smoke tests, shape sweeps, numerical stability, determinism verification, and edge-case coverage ensures every candidate kernel is validated before any speedup is recorded. The system comprises over 9,000 lines of Python, 18 starter kernel implementations across two backends, a six-tier optimization playbook, and integration with the KernelBench benchmark…
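Ranking bottlenecks by Amdahl's law impact can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the `KernelProfile` type, the `rank_by_impact` helper, the kernel names, the timings, and the assumed 2x per-kernel speedup are all hypothetical. Amdahl's law itself is standard: if a fraction f of runtime is accelerated by a factor s, the overall speedup is 1 / ((1 - f) + f / s).

```python
from dataclasses import dataclass


@dataclass
class KernelProfile:
    name: str       # hypothetical kernel label
    time_ms: float  # time attributed to this kernel in a profile


def amdahl_speedup(fraction, local_speedup):
    """Overall speedup when `fraction` of runtime gets `local_speedup` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


def rank_by_impact(profiles, assumed_local_speedup=2.0):
    """Sort kernels by the end-to-end speedup an assumed local speedup would give."""
    total = sum(p.time_ms for p in profiles)
    scored = [
        (p.name, amdahl_speedup(p.time_ms / total, assumed_local_speedup))
        for p in profiles
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up profile: timings are illustrative, not measurements from the paper.
profiles = [
    KernelProfile("softmax", 10.0),
    KernelProfile("matmul", 60.0),
    KernelProfile("rmsnorm", 30.0),
]
ranking = rank_by_impact(profiles)
for name, overall in ranking:
    print(f"{name}: {overall:.3f}x end-to-end")
```

The ordering makes the trade-off concrete: a kernel that accounts for a small share of runtime yields little end-to-end gain even if it is optimized heavily, which is why the largest contributors are worth attacking first.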

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Advanced Neural Network Applications · Cloud Computing and Resource Management