Towards Cold-Start Drafting and Continual Refining: A Value-Driven Memory Approach with Application to NPU Kernel Synthesis

Yujie Zheng; Zhuo Li; Shengtao Zhang; Hanjing Wang; Junjie Sheng; Jiaqian Wang; Junchi Yan; Weinan Zhang; Ying Wen; Bo Tang; Muning Wen

arXiv:2603.10846 · cs.LG · March 12, 2026

Open Access · 1 dataset

TL;DR

EvoKernel is a memory-based reinforcement learning framework that automates hardware-specific kernel synthesis in data-scarce environments, from initial drafting to continual refining, and improves both kernel correctness and speed without fine-tuning.

Contribution

The paper introduces EvoKernel, a value-driven memory reinforcement learning approach for cold-start kernel synthesis that generalizes across tasks and improves performance without fine-tuning.

Findings

01  Correctness improved from 11.0% to 83.0%.

02  Median speedup of 3.60x over initial drafts.

03  Effective cross-task memory sharing enables generalization.

Abstract

Deploying Large Language Models to data-scarce programming domains poses significant challenges, particularly for kernel synthesis on emerging Domain-Specific Architectures where a "Data Wall" limits available training data. While models excel on data-rich platforms like CUDA, they suffer catastrophic performance drops on data-scarce ecosystems such as NPU programming. To overcome this cold-start barrier without expensive fine-tuning, we introduce EvoKernel, a self-evolving agentic framework that automates the lifecycle of kernel synthesis from initial drafting to continual refining. EvoKernel addresses this by formulating the synthesis process as a memory-based reinforcement learning task. Through a novel value-driven retrieval mechanism, it learns stage-specific Q-values that prioritize experiences based on their contribution to the current objective, whether bootstrapping a feasible…
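
The abstract describes retrieval driven by stage-specific Q-values but is cut off before giving details, so the following is only a minimal sketch of the general idea, not the paper's method. Everything in it is an assumption made for illustration: the stage names `draft` and `refine`, the `MemoryItem` and `ValueMemory` classes, the scalar reward, and the incremental update rule `Q <- Q + alpha * (reward - Q)`.

```python
from dataclasses import dataclass, field

# Hypothetical stage names; the abstract only says the Q-values are stage-specific.
STAGES = ("draft", "refine")


@dataclass
class MemoryItem:
    """One stored experience with a separate value estimate per stage."""
    text: str
    q: dict = field(default_factory=lambda: {s: 0.0 for s in STAGES})


class ValueMemory:
    """Toy memory that ranks experiences by their Q-value for a given stage."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha  # step size of the assumed incremental update
        self.items = []

    def add(self, text):
        item = MemoryItem(text)
        self.items.append(item)
        return item

    def retrieve(self, stage, k=2):
        # Return the k items with the highest Q-value for this stage.
        ranked = sorted(self.items, key=lambda it: it.q[stage], reverse=True)
        return ranked[:k]

    def update(self, item, stage, reward):
        # Assumed rule: move the stage's Q-value toward the observed reward.
        item.q[stage] += self.alpha * (reward - item.q[stage])
```

In this sketch, an experience that helped while drafting gains value only under the `draft` key, so it rises in `retrieve("draft")` without affecting what `retrieve("refine")` returns.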

Code & Models

Datasets

noahli/EvoKernel
dataset · 1.8k downloads

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Machine Learning in Materials Science · Ferroelectric and Negative Capacitance Devices