AccelOpt: A Self-Improving LLM Agentic System for AI Accelerator Kernel Optimization

Genghan Zhang; Shaowei Zhu; Anjiang Wei; Zhenyu Song; Allen Nie; Zhen Jia; Nandita Vijaykumar; Yida Wang; Kunle Olukotun

arXiv:2511.15915·cs.LG·April 17, 2026

TL;DR

AccelOpt is a self-improving LLM-based agentic system that autonomously optimizes AI accelerator kernels. On AWS Trainium hardware it raises the average percentage of peak throughput achieved by benchmark kernels while remaining cost-effective.

Contribution

The paper introduces a self-improving LLM agentic system for kernel optimization that needs no expert-provided, hardware-specific optimization knowledge, together with NKIBench, a new benchmark suite of AWS Trainium kernels, and open-source code.

Findings

01

Improves the average percentage of peak throughput on NKIBench kernels from 49% to 61% on Trainium 1 and from 45% to 59% on Trainium 2.

02

Achieves kernel optimization comparable to Claude Sonnet 4 at 26x lower cost.

03

Demonstrates continuous improvement over time with open-source models.

Abstract

We present AccelOpt, a self-improving large language model (LLM) agentic system that autonomously optimizes kernels for emerging AI accelerators, eliminating the need for expert-provided hardware-specific optimization knowledge. AccelOpt explores the kernel optimization space through iterative generation, informed by an optimization memory that curates experiences and insights from previously encountered slow-fast kernel pairs. We build NKIBench, a new benchmark suite of AWS Trainium accelerator kernels with varying complexity extracted from real-world LLM workloads to evaluate the effectiveness of AccelOpt. Our evaluation confirms that AccelOpt's capability improves over time, boosting the average percentage of peak throughput from $49\%$ to $61\%$ on Trainium 1 and from $45\%$ to $59\%$ on Trainium 2 for NKIBench kernels. Moreover, AccelOpt is highly cost-effective: using open-source…
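
The loop the abstract describes (iterative generation guided by a memory of slow-fast kernel pairs) can be illustrated with a toy sketch. This is not the authors' implementation: `Experience`, `OptimizationMemory`, `propose`, and `optimize` are hypothetical names, kernels are reduced to a single latency number, and the LLM is replaced by a random stand-in whose odds improve as the memory fills. A real system would generate kernel code and profile it on hardware.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Experience:
    """One slow-fast pair plus a short note (all fields are illustrative)."""
    slow_latency: float
    fast_latency: float
    insight: str


@dataclass
class OptimizationMemory:
    """Keeps only the pairs with the largest speedups (assumed curation rule)."""
    capacity: int = 4
    items: list = field(default_factory=list)

    def add(self, exp: Experience) -> None:
        self.items.append(exp)
        self.items.sort(key=lambda e: e.slow_latency / e.fast_latency, reverse=True)
        del self.items[self.capacity:]


def propose(latency: float, memory: OptimizationMemory, rng: random.Random) -> float:
    """Stand-in for an LLM proposal: returns the latency of a candidate kernel.

    Toy assumption: each remembered experience slightly widens the chance
    of sampling a faster candidate.
    """
    bias = 0.05 * len(memory.items)
    return latency * rng.uniform(0.8 - bias, 1.2)


def optimize(baseline_latency: float, iterations: int = 10,
             candidates: int = 4, seed: int = 0):
    rng = random.Random(seed)
    memory = OptimizationMemory()
    best = baseline_latency
    history = [best]
    for _ in range(iterations):
        # Sample several candidates and keep the fastest one.
        trial = min(propose(best, memory, rng) for _ in range(candidates))
        if trial < best:
            # Record the slow-fast pair so later proposals can draw on it.
            memory.add(Experience(best, trial, f"{best / trial:.2f}x speedup"))
            best = trial
        history.append(best)
    return best, history, memory
```

Calling `optimize(100.0)` returns the best latency found, the per-iteration history, and the curated memory. The design choice worth noting is that only improvements enter the memory, so the best latency never regresses between iterations.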

Code & Models

Repositories

zhang677/AccelOpt (GitHub)
