AscendOptimizer: Episodic Agent for Ascend NPU Operator Optimization
Jiehao Wu, Zixiao Huang, Wenhao Li, Chuyun Shen, Junjie Sheng, Xiangfeng Wang

TL;DR
AscendOptimizer is an episodic agent that autonomously improves Ascend NPU operator performance by learning from execution feedback, combining experience-guided kernel rewrites with a search over host-side tiling and data-movement configurations.
Contribution
It introduces an episodic agent that builds optimization knowledge directly from hardware execution: it rewinds strong kernels to extract reusable optimization experience and runs profiling-in-the-loop evolutionary search over host-side tiling.
Findings
Achieves a 1.21x speedup over the open-source baseline on 101 operators.
53.47% of operators run faster than their references.
Outperforms Best-of-N sampling and OpenEvolve under the same evaluation budget.
Abstract
Optimizing AscendC (Ascend C) operators for Ascend NPUs is difficult for two reasons. First, unlike CUDA, the ecosystem offers few public kernels to learn from. Second, performance depends on a coupled two-part implementation: a host-side tiling program that controls data movement and a kernel program that schedules and pipelines computation. We present AscendOptimizer, an episodic agent that builds missing optimization knowledge from execution itself. For kernel optimization, AscendOptimizer rewinds strong implementations by removing optimizations in a controlled way, then keeps the changes whose removal measurably hurts performance as reusable experience for later rewriting. For host-side optimization, it runs profiling-in-the-loop evolutionary search to find valid, fast tiling and data-movement configurations directly from hardware feedback. This combination lets the agent improve…
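The two loops described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: every name in it (`rewind_experience`, `evolve_tiling`, the toy latency models, the optimization labels, and the 5% slowdown threshold) is a hypothetical stand-in, and the toy functions replace what would be real compile-and-profile runs on an Ascend NPU.

```python
import random

def rewind_experience(optimizations, measure, min_slowdown=1.05):
    """Remove one optimization at a time from a strong implementation and keep
    those whose removal measurably hurts (threshold is an assumed value)."""
    full = frozenset(optimizations)
    baseline = measure(full)
    kept = []
    for opt in optimizations:
        slowdown = measure(full - {opt}) / baseline
        if slowdown >= min_slowdown:
            kept.append((opt, slowdown))
    # Most impactful optimizations first.
    return sorted(kept, key=lambda item: item[1], reverse=True)

def evolve_tiling(evaluate, is_valid, choices, generations=20, population=8, seed=0):
    """Simple elitist evolutionary search over tiling configurations.
    `evaluate` returns a latency (lower is better); a real system would cache
    measurements instead of re-profiling on every sort."""
    rng = random.Random(seed)

    def random_config():
        return {key: rng.choice(values) for key, values in choices.items()}

    def mutate(config):
        child = dict(config)
        key = rng.choice(sorted(choices))
        child[key] = rng.choice(choices[key])
        return child

    pool = []
    while len(pool) < population:
        candidate = random_config()
        if is_valid(candidate):
            pool.append(candidate)

    for _ in range(generations):
        pool.sort(key=evaluate)
        parents = pool[: population // 2]  # keep the fastest half
        children = []
        while len(children) < population - len(parents):
            child = mutate(rng.choice(parents))
            if is_valid(child):  # discard configurations that would not run
                children.append(child)
        pool = parents + children
    return min(pool, key=evaluate)

# --- Toy stand-ins for hardware feedback (all numbers are made up) ---
TOY_GAINS = {"double_buffering": 0.30, "vectorized_copy": 0.20, "loop_unroll": 0.01}

def toy_kernel_latency(enabled):
    latency = 100.0
    for name, gain in TOY_GAINS.items():
        if name not in enabled:
            latency *= 1.0 + gain  # removing an optimization slows the kernel
    return latency

TOY_CHOICES = {"tile_len": [64, 128, 256, 512, 1024], "buffers": [1, 2]}

def toy_is_valid(config):
    # Pretend the on-chip buffer holds at most 1024 elements in total.
    return config["tile_len"] * config["buffers"] <= 1024

def toy_host_latency(config):
    # Larger tiles mean fewer iterations; a single buffer cannot overlap copy and compute.
    return 4096 / config["tile_len"] + (8 if config["buffers"] == 1 else 0)

if __name__ == "__main__":
    print(rewind_experience(list(TOY_GAINS), toy_kernel_latency))
    print(evolve_tiling(toy_host_latency, toy_is_valid, TOY_CHOICES))
```

In this toy setup the rewind step drops `loop_unroll`, because removing it costs only about 1%, and keeps the other two optimizations as reusable experience. The validity check in the search mirrors the abstract's requirement that tiling configurations be both valid and fast.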
Taxonomy
Topics: Ferroelectric and Negative Capacitance Devices · Advanced Memory and Neural Computing · Neural Networks and Reservoir Computing
