QiMeng-Kernel: Macro-Thinking Micro-Coding Paradigm for LLM-Based High-Performance GPU Kernel Generation

Xinguo Zhu; Shaohui Peng; Jiaming Guo; Yunji Chen; Qi Guo; Yuanbo Wen; Hang Qin; Ruizhi Chen; Qirui Zhou; Ke Gao; Yanjun Wu; Chen Zhao; Ling Li

arXiv:2511.20100 · cs.DC · November 26, 2025

TL;DR

This paper introduces Macro Thinking Micro Coding (MTMC), a hierarchical framework that separates macro-level optimization strategy from micro-level implementation so that LLMs can generate GPU kernels that are both correct and fast. It outperforms existing LLM-based methods on KernelBench and TritonBench.

Contribution

The paper proposes the Macro Thinking Micro Coding (MTMC) framework, which decouples optimization strategy from implementation and uses reinforcement learning and incremental coding to improve GPU kernel generation.

Findings

01 Achieves near 100% accuracy on KernelBench Levels 1-2.

02 Attains up to 7.3x speedup over kernels generated by baseline LLMs.

03 Reaches up to 59.64% accuracy and 34x speedup on TritonBench.

Abstract

Developing high-performance GPU kernels is critical for AI and scientific computing, but remains challenging due to its reliance on expert crafting and poor portability. While LLMs offer promise for automation, both general-purpose and finetuned LLMs suffer from two fundamental and conflicting limitations: correctness and efficiency. The key reason is that existing LLM-based approaches directly generate the entire optimized low-level programs, requiring exploration of an extremely vast space encompassing both optimization policies and implementation codes. To address the challenge of exploring an intractable space, we propose Macro Thinking Micro Coding (MTMC), a hierarchical framework inspired by the staged optimization strategy of human experts. It decouples optimization strategy from implementation details, ensuring efficiency through high-level strategy and correctness through…
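The decoupling described in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: every name below is hypothetical, the macro level is a fixed plan standing in for a learned policy, the micro level is a hard-coded rewrite standing in for an LLM, and the per-step correctness check is an assumption of this sketch. The "kernel" is an ordinary Python function, not GPU code.

```python
from typing import Callable, List, Optional, Tuple

Kernel = Callable[[List[float]], float]

def reference(xs: List[float]) -> float:
    """Ground-truth result used to check every candidate."""
    return sum(x * x for x in xs)

def naive_kernel(xs: List[float]) -> float:
    """Starting point: a correct but unoptimized implementation."""
    total = 0.0
    for x in xs:
        total += x * x
    return total

def macro_think(history: List[Tuple[str, bool]]) -> Optional[str]:
    """Hypothetical macro level: choose the next optimization action.
    A fixed plan stands in for a learned policy."""
    plan = ["drop_last_element", "use_builtin_sum"]
    return plan[len(history)] if len(history) < len(plan) else None

def micro_code(kernel: Kernel, action: str) -> Kernel:
    """Hypothetical micro level: apply exactly one step to the current kernel.
    Hard-coded rewrites stand in for LLM-generated code."""
    if action == "use_builtin_sum":
        return lambda xs: sum(x * x for x in xs)
    if action == "drop_last_element":  # deliberately incorrect rewrite
        return lambda xs: sum(x * x for x in xs[:-1])
    raise ValueError(f"unknown action: {action}")

def is_correct(kernel: Kernel, tests: List[List[float]]) -> bool:
    """Compare a candidate against the reference on every test input."""
    return all(abs(kernel(xs) - reference(xs)) < 1e-9 for xs in tests)

def optimize(kernel: Kernel, tests: List[List[float]]):
    """Alternate strategy selection and single-step coding.
    A candidate replaces the current kernel only if it is still correct."""
    history: List[Tuple[str, bool]] = []
    while (action := macro_think(history)) is not None:
        candidate = micro_code(kernel, action)
        accepted = is_correct(candidate, tests)
        if accepted:
            kernel = candidate
        history.append((action, accepted))
    return kernel, history

TESTS = [[1.0, 2.0, 3.0], [0.5], []]
final_kernel, history = optimize(naive_kernel, TESTS)
```

In this sketch the incorrect rewrite is rejected and the correct one is accepted, so the search never has to reason about strategy and code in the same step.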

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Advanced Neural Network Applications · Ferroelectric and Negative Capacitance Devices