MultiKernelBench: A Multi-Platform Benchmark for Kernel Generation

Zhongzhen Wen; Yinghui Zhang; Zhong Li; Zhongxin Liu; Linna Xie; Tian Zhang

arXiv:2507.17773 · cs.DC · July 29, 2025

TL;DR

MultiKernelBench is a multi-platform benchmark for evaluating how well large language models generate deep learning kernels. It covers a broad set of kernel tasks across several hardware platforms and pairs the benchmark with a category-aware prompting method intended to improve the quality of generated kernels.

Contribution

The paper introduces what the authors describe as the first multi-platform benchmark for LLM-based DL kernel generation, with extensive task coverage, a modular design, and a category-aware prompting strategy.

Findings

01. Significant variation in task difficulty across LLMs.

02. Poor generalization to less-exposed hardware platforms.

03. Targeted prompting improves kernel generation quality.

Abstract

The automatic generation of deep learning (DL) kernels using large language models (LLMs) has emerged as a promising approach to reduce the manual effort and hardware-specific expertise required for writing high-performance operator implementations. However, existing benchmarks for evaluating LLMs in this domain suffer from limited hardware support, coarse-grained kernel categorization, and imbalanced task coverage. To address these limitations, we introduce MultiKernelBench, the first comprehensive, multi-platform benchmark for LLM-based DL kernel generation. MultiKernelBench spans 285 tasks across 14 well-defined kernel categories and supports three major hardware platforms: Nvidia GPUs, Huawei NPUs, and Google TPUs. To enable future extensibility, we design a modular backend abstraction layer that decouples platform-specific logic from the core benchmarking infrastructure, allowing…
