NPUEval: Optimizing NPU Kernels with LLMs and Open Source Compilers

Sarunas Kalade; Graham Schelle

arXiv:2507.14403·cs.PL·July 22, 2025

TL;DR

NPUEval is a benchmark for NPU kernel optimization that evaluates LLM-generated code on real hardware. The results show that automating efficient kernel development remains difficult, although some models already produce well-vectorized code for certain kernels.

Contribution

The paper presents NPUEval, a new benchmark dataset and evaluation framework for NPU kernel code generation using LLMs, addressing the lack of specialized benchmarks in this domain.

Findings

01  DeepSeek R1 achieves over 50% vectorization on some kernels.

02  The average vectorization score across the dataset is about 10%.

03  Open source tools enable evaluation of both functional correctness and efficiency.
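
To make the vectorization figures above concrete, here is a minimal Python sketch. This summary does not define the metric, so the sketch assumes that a kernel's vectorization score is the fraction of its executed cycles spent on vector instructions. The function names and all cycle counts are hypothetical and are not taken from the paper.

```python
from statistics import mean


def vectorization_score(vector_cycles: int, total_cycles: int) -> float:
    """Fraction of cycles spent on vector instructions (assumed definition)."""
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    if not 0 <= vector_cycles <= total_cycles:
        raise ValueError("vector_cycles must lie between 0 and total_cycles")
    return vector_cycles / total_cycles


# Hypothetical (vector_cycles, total_cycles) pairs, NOT measurements from the paper.
hypothetical_kernels = {
    "kernel_a": (550, 1000),  # mostly vectorized
    "kernel_b": (50, 1000),   # a few vector instructions
    "kernel_c": (0, 800),     # purely scalar
    "kernel_d": (20, 1000),
    "kernel_e": (0, 500),
}

# Score each kernel, then average over the whole set.
scores = {
    name: vectorization_score(vec, total)
    for name, (vec, total) in hypothetical_kernels.items()
}
dataset_average = mean(scores.values())

if __name__ == "__main__":
    for name, score in scores.items():
        print(f"{name}: {score:.1%}")
    print(f"dataset average: {dataset_average:.1%}")
```

With numbers shaped like these, a single kernel above 50% can coexist with a much lower dataset average, because purely scalar kernels pull the mean down. That is one way to read the first two findings together.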

Abstract

Neural processing units (NPUs) are gaining prominence in power-sensitive client devices, and AI PCs are defined by their inclusion of these specialized processors. Running AI workloads efficiently on these devices requires libraries of optimized kernels. Creating efficient kernels demands expertise in domain-specific C++ with vector intrinsics and in-depth knowledge of the target architecture. Unlike GPU programming, which has had years to mature, NPU programming is new, with smaller and more fragmented developer communities across hardware platforms. This fragmentation poses a challenge when using LLMs to assist in writing NPU kernels, because domain-specific optimized code examples are underrepresented in LLM pre-training data. In this paper we introduce NPUEval, a benchmark for writing and evaluating NPU kernels, consisting of 102 common operators for machine…
