KernelBench: Can LLMs Write Efficient GPU Kernels?
Anne Ouyang, Simon Guo, Simran Arora, Alex L. Zhang, William Hu, Christopher Ré, Azalia Mirhoseini

TL;DR
KernelBench evaluates the ability of language models to generate efficient, correct GPU kernels for machine learning workloads, highlighting current limitations and potential improvements through iterative refinement.
Contribution
The paper introduces KernelBench, a benchmark of 250 PyTorch ML workloads for assessing LMs' ability to generate GPU kernels, and proposes a new metric, fast_p, that jointly captures functional correctness and speedup over a baseline.
Findings
State-of-the-art models perform poorly, matching the PyTorch baseline in fewer than 20% of cases.
Iterative refinement with profiling feedback improves kernel quality.
KernelBench is a challenging benchmark with increasing difficulty at higher speedup thresholds.
Abstract
Efficient GPU kernels are crucial for building performant machine learning architectures, but writing them is a time-consuming challenge that requires significant expertise; therefore, we explore using language models (LMs) to automate kernel generation. We introduce KernelBench, an open-source framework for evaluating LMs' ability to write fast and correct kernels on a suite of 250 carefully selected PyTorch ML workloads. KernelBench represents a real-world engineering environment and making progress on the introduced benchmark directly translates to faster practical kernels. We introduce a new evaluation metric fast_p, which measures the percentage of generated kernels that are functionally correct and offer a speedup greater than an adjustable threshold p over baseline. Our experiments across various state-of-the-art models and test-time methods show that frontier reasoning models…
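The fast_p metric described in the abstract can be illustrated with a minimal sketch. This is not the benchmark's own implementation: the `KernelResult` record, its field names, and the choice to return a fraction in [0, 1] rather than a percentage are assumptions made here for illustration. Speedup is taken to be baseline time divided by generated-kernel time.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class KernelResult:
    """Hypothetical record for one generated kernel (names are illustrative)."""
    correct: bool        # passed the functional-correctness check
    baseline_ms: float   # runtime of the baseline implementation
    kernel_ms: float     # runtime of the generated kernel


def fast_p(results: Sequence[KernelResult], p: float) -> float:
    """Fraction of kernels that are correct AND have speedup strictly greater than p."""
    if not results:
        return 0.0
    hits = 0
    for r in results:
        # An incorrect kernel never counts, no matter how fast it runs.
        if not r.correct or r.kernel_ms <= 0:
            continue
        speedup = r.baseline_ms / r.kernel_ms
        if speedup > p:
            hits += 1
    return hits / len(results)


results = [
    KernelResult(correct=True, baseline_ms=10.0, kernel_ms=5.0),    # 2.0x speedup
    KernelResult(correct=True, baseline_ms=10.0, kernel_ms=10.0),   # 1.0x speedup
    KernelResult(correct=False, baseline_ms=10.0, kernel_ms=1.0),   # fast but wrong
    KernelResult(correct=True, baseline_ms=10.0, kernel_ms=20.0),   # 0.5x speedup
]

print(fast_p(results, p=0.0))  # correctness only
print(fast_p(results, p=1.0))  # correct and faster than baseline
```

Under this sketch, setting p = 0 reduces the metric to a plain correctness rate, while raising p demands progressively larger speedups, which is why the benchmark becomes harder at higher thresholds.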
Taxonomy
Topics: Machine Learning and Data Classification · Advanced Neural Network Applications · Image Processing and 3D Reconstruction
