SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GPU Kernels Against Hardware Limits

Edward Lin; Sahil Modi; Siva Kumar Sastry Hari; Qijing Huang; Zhifan Ye; Nestor Qin; Fengzhe Zhou; Yuan Zhang; Jingquan Wang; Sana Damani; Dheeraj Peri; Ouye Xie; Aditya Kane; Moshe Maor; Michael Behar; Triston Cao; Rishabh Mehta; Vartika Singh; Vikram Sharma Mailthody; Terry Chen; Zihao Ye; Hanfeng Chen; Tianqi Chen; Vinod Grover; Wei Chen; Wei Liu; Eric Chung; Luis Ceze; Roger Bringmann; Cyril Zeller; Michael Lightstone; Christos Kozyrakis; Humphrey Shi

arXiv:2603.19173·cs.LG·March 20, 2026

TL;DR

SOL-ExecBench introduces a hardware-grounded benchmarking framework for GPU kernels, measuring their proximity to hardware efficiency limits across diverse AI workloads, thus enabling more meaningful optimization assessments.

Contribution

The paper presents SOL-ExecBench, a novel benchmark that evaluates GPU kernels against analytically derived hardware efficiency bounds, shifting focus from software speedups to hardware-aware performance.

Findings

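1. The benchmark covers 235 CUDA kernel optimization problems extracted from 124 AI models across language, diffusion, vision, audio, video, and hybrid architectures.
2. It provides a fixed, hardware-based performance target (SOL bounds) instead of a moving software baseline.
3. It includes a sandboxed environment for robust evaluation.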

Abstract

As agentic AI systems become increasingly capable of generating and optimizing GPU kernels, progress is constrained by benchmarks that reward speedup over software baselines rather than proximity to hardware-efficient execution. We present SOL-ExecBench, a benchmark of 235 CUDA kernel optimization problems extracted from 124 production and emerging AI models spanning language, diffusion, vision, audio, video, and hybrid architectures, targeting NVIDIA Blackwell GPUs. The benchmark covers forward and backward workloads across BF16, FP8, and NVFP4, including kernels whose best performance is expected to rely on Blackwell-specific capabilities. Unlike prior benchmarks that evaluate kernels primarily relative to software implementations, SOL-ExecBench measures performance against analytically derived Speed-of-Light (SOL) bounds computed by SOLAR, our pipeline for deriving hardware-grounded…
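The abstract is cut off before it explains how SOLAR derives its bounds. To illustrate the general idea of a Speed-of-Light bound, the sketch below assumes a simple roofline model, which is not necessarily what SOLAR does: a kernel cannot finish faster than the larger of (a) its FLOPs divided by peak compute throughput and (b) its minimum memory traffic divided by peak bandwidth. The names `Hardware`, `gemm_sol_seconds`, `sol_fraction`, and `BYTES_PER_ELEMENT`, the NVFP4 size approximation, and the peak numbers used below are all hypothetical and only for illustration.

```python
from dataclasses import dataclass

# Hypothetical bytes-per-element table. NVFP4 is approximated as 4-bit values
# plus one 8-bit scale per 16-element block (0.5 + 1/16 bytes); any
# per-tensor scale is ignored. This is an assumption for illustration.
BYTES_PER_ELEMENT = {
    "bf16": 2.0,
    "fp8": 1.0,
    "nvfp4": 0.5 + 1.0 / 16,
}


@dataclass
class Hardware:
    # Peak numbers are supplied by the caller; nothing here is a published spec.
    peak_flops: float      # FLOP/s at the chosen precision
    peak_bandwidth: float  # bytes/s of DRAM bandwidth


def gemm_sol_seconds(m: int, n: int, k: int, dtype: str, hw: Hardware) -> float:
    """Roofline lower bound on runtime for C[m, n] = A[m, k] @ B[k, n].

    Assumes A, B, and C all use the same dtype and that each is moved
    to or from DRAM exactly once (the minimum possible traffic).
    """
    flops = 2.0 * m * n * k  # one multiply and one add per (i, j, l) triple
    bytes_moved = BYTES_PER_ELEMENT[dtype] * (m * k + k * n + m * n)
    compute_time = flops / hw.peak_flops
    memory_time = bytes_moved / hw.peak_bandwidth
    # The kernel is limited by whichever resource it saturates first.
    return max(compute_time, memory_time)


def sol_fraction(measured_seconds: float, sol_seconds: float) -> float:
    """Fraction of speed-of-light achieved; 1.0 means the kernel hits the bound."""
    return sol_seconds / measured_seconds


if __name__ == "__main__":
    hw = Hardware(peak_flops=1e15, peak_bandwidth=8e12)  # illustrative round numbers
    t_gemm = gemm_sol_seconds(4096, 4096, 4096, "bf16", hw)
    t_gemv = gemm_sol_seconds(1, 4096, 4096, "bf16", hw)
    print(f"GEMM SOL: {t_gemm:.3e} s, GEMV SOL: {t_gemv:.3e} s")
```

With these illustrative numbers, the 4096×4096×4096 BF16 GEMM is compute-bound, while the m = 1 case (a matrix-vector product) is memory-bound. Scoring a kernel as `sol_fraction(measured, bound)` gives a target that does not move when a faster software baseline appears, which is the contrast the abstract draws with speedup-based benchmarks.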

Code & Models

Datasets

nvidia/SOL-ExecBench (dataset, 428 downloads)

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Advanced Neural Network Applications · Embedded Systems Design Techniques