AscendKernelGen: A Systematic Study of LLM-Based Kernel Generation for Neural Processing Units
Xinzi Cao, Jianyang Zhai, Pengfei Li, Zhiheng Hu, Cen Yan, Bingxu Mu, Guanghuan Fang, Bin She, Jiayu Li, Yihan Su, Dongyang Tao, Xiansong Huang, Fan Xu, Feidiao Yang, Yao Lu, Chang-Dong Wang, Yutong Lu, Weicheng Xue, Bin Zhou, Yonghong Tian

TL;DR
This paper introduces AscendKernelGen, a framework that uses domain-adapted LLMs and specialized datasets to automate high-performance kernel generation for NPUs, substantially improving compilation success rates and functional correctness.
Contribution
It presents a novel domain-specific dataset, a fine-tuned LLM, and a comprehensive benchmark to enhance NPU kernel generation using LLMs.
Findings
Compilation success rate on complex kernels improved from 0% to 95.5%.
Functional correctness reached 64.3%, where baseline models failed.
The approach demonstrates the importance of domain-specific reasoning in code generation.
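The two headline metrics above (compilation success rate and functional correctness) can be illustrated with a minimal Python sketch. It is not the paper's evaluation harness: the `KernelResult` record, the `summarize` helper, and the choice to divide both rates by the total number of attempted kernels are assumptions made for illustration, and the sample data is made up.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class KernelResult:
    """Hypothetical per-kernel evaluation record (not from the paper)."""
    name: str
    compiled: bool  # did the generated kernel compile?
    passed: bool    # did its output match the reference? (only meaningful if compiled)


def summarize(results: List[KernelResult]) -> Dict[str, float]:
    """Return compile and correctness rates over all attempted kernels.

    Assumption: both rates use the total number of kernels as denominator.
    """
    total = len(results)
    if total == 0:
        return {"compile_rate": 0.0, "correct_rate": 0.0}
    compiled = sum(1 for r in results if r.compiled)
    # A kernel counts as correct only if it both compiled and passed.
    correct = sum(1 for r in results if r.compiled and r.passed)
    return {"compile_rate": compiled / total, "correct_rate": correct / total}


# Made-up example data: four kernels, three compile, two are correct.
example = [
    KernelResult("add", compiled=True, passed=True),
    KernelResult("softmax", compiled=True, passed=True),
    KernelResult("layernorm", compiled=True, passed=False),
    KernelResult("matmul", compiled=False, passed=False),
]
stats = summarize(example)
print(f"compile: {stats['compile_rate']:.1%}, correct: {stats['correct_rate']:.1%}")
```

Reporting the two rates separately matters because a kernel can compile and still produce wrong results, which is why the compilation figure is higher than the correctness figure.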
Abstract
To meet the ever-increasing demand for computational efficiency, Neural Processing Units (NPUs) have become critical in modern AI infrastructure. However, unlocking their full potential requires developing high-performance compute kernels using vendor-specific Domain-Specific Languages (DSLs), a task that demands deep hardware expertise and is labor-intensive. While Large Language Models (LLMs) have shown promise in general code generation, they struggle with the strict constraints and scarcity of training data in the NPU domain. Our preliminary study reveals that state-of-the-art general-purpose LLMs fail to generate functional complex kernels for Ascend NPUs, yielding a near-zero success rate. To address these challenges, we propose AscendKernelGen, a generation-evaluation integrated framework for NPU kernel development. We introduce Ascend-CoT, a high-quality dataset incorporating…
Code & Models
- 🤗 AscendKernelGen/KernelGen-LM-1.7B (model) · 750 downloads · ♡ 1
- 🤗 AscendKernelGen/KernelGen-LM-4B (model) · 718 downloads · ♡ 1
- 🤗 AscendKernelGen/KernelGen-LM-8B (model) · 740 downloads · ♡ 1
- 🤗 AscendKernelGen/KernelGen-LM-14B (model) · 727 downloads · ♡ 1
- 🤗 AscendKernelGen/KernelGen-LM-32B (model) · 764 downloads · ♡ 2
- 🤗 AscendKernelGen/KernelGen-LM-32B-RL (model) · 749 downloads · ♡ 1
- 🤗 AscendKernelGen/KernelGen-LM-MoE-30B (model) · 1.4k downloads · ♡ 1
