KernelBench: Can LLMs Write Efficient GPU Kernels?

Anne Ouyang; Simon Guo; Simran Arora; Alex L. Zhang; William Hu; Christopher Ré; Azalia Mirhoseini

arXiv:2502.10517·cs.LG·February 18, 2025

TL;DR

KernelBench evaluates the ability of language models to generate efficient, correct GPU kernels for machine learning workloads, highlighting current limitations and potential improvements through iterative refinement.

Contribution

The paper introduces KernelBench, a benchmark of 250 PyTorch ML workloads for assessing how well LMs generate GPU kernels, and proposes fast_p, a metric that combines functional correctness with speedup over a baseline.

Findings

01. State-of-the-art models perform poorly, matching the baseline in fewer than 20% of cases.

02. Iterative refinement with profiling feedback improves kernel quality.

03. The benchmark becomes harder as the speedup threshold p is raised.

Abstract

Efficient GPU kernels are crucial for building performant machine learning architectures, but writing them is a time-consuming challenge that requires significant expertise; therefore, we explore using language models (LMs) to automate kernel generation. We introduce KernelBench, an open-source framework for evaluating LMs' ability to write fast and correct kernels on a suite of 250 carefully selected PyTorch ML workloads. KernelBench represents a real-world engineering environment and making progress on the introduced benchmark directly translates to faster practical kernels. We introduce a new evaluation metric fast_p, which measures the percentage of generated kernels that are functionally correct and offer a speedup greater than an adjustable threshold p over baseline. Our experiments across various state-of-the-art models and test-time methods show that frontier reasoning models…
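The fast_p metric described in the abstract can be illustrated with a minimal sketch. This is not the KernelBench implementation: the `KernelResult` record, its field names, and the sample timings are hypothetical, and speedup is assumed to be baseline time divided by generated-kernel time. The function returns a fraction between 0 and 1 (multiply by 100 for a percentage).

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class KernelResult:
    """Hypothetical record for one generated kernel (not a KernelBench type)."""
    correct: bool        # did the kernel pass the functional-correctness check?
    baseline_ms: float   # runtime of the reference implementation
    generated_ms: float  # runtime of the generated kernel

    @property
    def speedup(self) -> float:
        # Assumed definition: baseline time over generated-kernel time.
        return self.baseline_ms / self.generated_ms


def fast_p(results: Sequence[KernelResult], p: float) -> float:
    """Fraction of kernels that are correct AND strictly faster than p x baseline."""
    if not results:
        raise ValueError("fast_p is undefined for an empty result set")
    hits = sum(1 for r in results if r.correct and r.speedup > p)
    return hits / len(results)


# Made-up timings, for illustration only.
example_results = [
    KernelResult(correct=True, baseline_ms=10.0, generated_ms=5.0),   # speedup 2.0
    KernelResult(correct=True, baseline_ms=10.0, generated_ms=10.0),  # speedup 1.0
    KernelResult(correct=True, baseline_ms=10.0, generated_ms=20.0),  # speedup 0.5
    KernelResult(correct=False, baseline_ms=10.0, generated_ms=2.0),  # fast but wrong
]

for threshold in (0.0, 1.0, 2.0):
    print(f"fast_{threshold:g} = {fast_p(example_results, threshold):.2f}")
```

With p = 0 the metric reduces to the plain correctness rate, and raising p keeps only kernels that beat the baseline by a larger margin. An incorrect kernel never counts, however fast it runs.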

Code & Models

Repositories

scalingintelligence/kernelbench
pytorch

Datasets

ScalingIntelligence/kernelbench-samples
dataset · 94 dl

Videos

KernelBench: Can LLMs Write Efficient GPU Kernels? · SlidesLive

Taxonomy

Topics: Machine Learning and Data Classification · Advanced Neural Network Applications · Image Processing and 3D Reconstruction