AscendKernelGen: A Systematic Study of LLM-Based Kernel Generation for Neural Processing Units

Xinzi Cao; Jianyang Zhai; Pengfei Li; Zhiheng Hu; Cen Yan; Bingxu Mu; Guanghuan Fang; Bin She; Jiayu Li; Yihan Su; Dongyang Tao; Xiansong Huang; Fan Xu; Feidiao Yang; Yao Lu; Chang-Dong Wang; Yutong Lu; Weicheng Xue; Bin Zhou; Yonghong Tian

arXiv:2601.07160 · cs.AI · April 20, 2026

TL;DR

This paper introduces AscendKernelGen, a framework that combines a domain-adapted LLM with specialized datasets to automate high-performance kernel generation for NPUs, substantially improving compilation success and functional correctness over general-purpose LLMs.

Contribution

The paper contributes a domain-specific dataset (Ascend-CoT), a fine-tuned LLM, and a benchmark for evaluating LLM-generated NPU kernels.

Findings

01 Compilation success rate on complex kernels improved from 0% to 95.5%.

02 Functional correctness reached 64.3%, where baseline models had failed.

03 The results underscore the importance of domain-specific reasoning in code generation.
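
The two headline metrics above can be illustrated with a minimal sketch. It assumes that both rates are computed over all attempted kernels and that a kernel counts as functionally correct only if it also compiled; the paper may define them differently. The `KernelResult` record, the `summarize` helper, and the toy data are hypothetical and are not taken from the paper or its benchmark.

```python
from dataclasses import dataclass


@dataclass
class KernelResult:
    """Outcome of one generated kernel (hypothetical record layout)."""
    name: str
    compiled: bool
    passed_tests: bool  # only meaningful when compiled is True


def summarize(results):
    """Return (compilation_rate, correctness_rate) as percentages of all attempts."""
    total = len(results)
    if total == 0:
        return 0.0, 0.0
    compiled = sum(r.compiled for r in results)
    # Assumption: a kernel that fails to compile cannot be functionally correct.
    correct = sum(r.compiled and r.passed_tests for r in results)
    return 100.0 * compiled / total, 100.0 * correct / total


# Toy data for illustration only; these are not results from the paper.
toy = [
    KernelResult("add", True, True),
    KernelResult("softmax", True, False),
    KernelResult("matmul", False, False),
    KernelResult("layernorm", True, True),
]

compile_rate, correct_rate = summarize(toy)
print(f"compiled: {compile_rate:.1f}%  correct: {correct_rate:.1f}%")
```

Under this convention the correctness rate can never exceed the compilation rate, which matches the ordering of the reported figures (64.3% versus 95.5%).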

Abstract

To meet the ever-increasing demand for computational efficiency, Neural Processing Units (NPUs) have become critical in modern AI infrastructure. However, unlocking their full potential requires developing high-performance compute kernels using vendor-specific Domain-Specific Languages (DSLs), a task that demands deep hardware expertise and is labor-intensive. While Large Language Models (LLMs) have shown promise in general code generation, they struggle with the strict constraints and scarcity of training data in the NPU domain. Our preliminary study reveals that state-of-the-art general-purpose LLMs fail to generate functional complex kernels for Ascend NPUs, yielding a near-zero success rate. To address these challenges, we propose AscendKernelGen, a generation-evaluation integrated framework for NPU kernel development. We introduce Ascend-CoT, a high-quality dataset incorporating…

Code & Models

Repositories

weich97/NPUKernelBench (GitHub)
