CUDABench: Benchmarking LLMs for Text-to-CUDA Generation

Jiace Zhu; Wentao Chen; Qi Fan; Zhixing Ren; Junying Wu; Xing Zhe Chai; Chotiwit Rungrueangwutthinon; Yehan Ma; An Zou

arXiv:2603.02236 · cs.LG · March 4, 2026

Open Access

TL;DR

CUDABench is a comprehensive benchmark for evaluating how well large language models generate CUDA code from text, covering compilation correctness, functional accuracy, and performance across diverse application domains.

Contribution

This work introduces CUDABench, a novel benchmark with evaluation metrics and verification pipelines specifically designed for assessing text-to-CUDA generation by LLMs.

Findings

01 High compilation success does not guarantee functional correctness

02 LLMs lack domain-specific algorithmic knowledge

03 Generated code often underutilizes GPU hardware resources
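The gap described in the first finding can be made concrete by scoring compilation and functional correctness as separate stages. The sketch below is illustrative only and is not the paper's CUDABench-Score or its verification pipeline: the `Outcome` record, the `stage_rates` helper, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    """Hypothetical record for one generated program (not from the paper)."""
    compiled: bool        # did the generated code compile?
    outputs_match: bool   # did its outputs match a reference? (only meaningful if compiled)


def stage_rates(outcomes):
    """Return (compile_rate, functional_rate) as fractions of all outcomes."""
    total = len(outcomes)
    if total == 0:
        return 0.0, 0.0
    compiled = sum(1 for o in outcomes if o.compiled)
    # A program counts as functionally correct only if it also compiled.
    functional = sum(1 for o in outcomes if o.compiled and o.outputs_match)
    return compiled / total, functional / total


# Made-up data for illustration: three of four programs compile,
# but only one of them produces correct outputs.
sample = [
    Outcome(compiled=True, outputs_match=True),
    Outcome(compiled=True, outputs_match=False),
    Outcome(compiled=True, outputs_match=False),
    Outcome(compiled=False, outputs_match=False),
]

compile_rate, functional_rate = stage_rates(sample)
print(f"compile rate: {compile_rate:.2f}, functional rate: {functional_rate:.2f}")
```

Reporting the two rates side by side, rather than a single pass rate, is what exposes the case where most programs compile yet few behave correctly.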

Abstract

Recent studies have demonstrated the potential of Large Language Models (LLMs) in generating GPU kernels. Current benchmarks focus on translating high-level languages into CUDA, overlooking the more general and challenging task of text-to-CUDA generation. Furthermore, given the hardware-specific and performance-critical nature of GPU programming, accurately assessing the performance of LLM-generated GPU programs is nontrivial. In this work, we introduce CUDABench, a comprehensive benchmark designed to evaluate the text-to-CUDA capabilities of LLMs. First, we construct CUDABench-Set, which covers a Breadth-Depth-Difficulty evaluation space across diverse application domains such as artificial intelligence, scientific computing, and data analytics. Furthermore, we propose CUDABench-Score and the Generative Verification Pipeline, which assess (1) compilation correctness, (2)…

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Advanced Neural Network Applications