FlashInfer-Bench: Building the Virtuous Cycle for AI-driven LLM Systems
Shanli Xing, Yiyan Zhai, Alexander Jiang, Yixin Dong, Yong Wu, Zihao Ye, Charlie Ruan, Yingyi Huang, Yineng Zhang, Liangsheng Yin, Aksara Bayyapu, Luis Ceze, Tianqi Chen

TL;DR
FlashInfer-Bench creates a standardized, closed-loop framework that connects kernel generation, benchmarking, and deployment, enabling continuous improvement of AI-generated GPU kernels for large language model inference systems.
Contribution
It introduces a unified schema, curated dataset, benchmarking framework, and deployment mechanism to enhance AI-generated GPU kernels in LLM inference systems.
Findings
Evaluates performance and limitations of LLM agents in GPU kernel generation
Provides insights into trade-offs among GPU programming languages
Establishes a reproducible pathway for deploying AI-generated kernels
Abstract
Recent advances show that large language models (LLMs) can act as autonomous agents capable of generating GPU kernels, but integrating these AI-generated kernels into real-world inference systems remains challenging. FlashInfer-Bench addresses this gap by establishing a standardized, closed-loop framework that connects kernel generation, benchmarking, and deployment. At its core, FlashInfer Trace provides a unified schema describing kernel definitions, workloads, implementations, and evaluations, enabling consistent communication between agents and systems. Built on real serving traces, FlashInfer-Bench includes a curated dataset, a robust correctness- and performance-aware benchmarking framework, a public leaderboard to track LLM agents' GPU programming capabilities, and a dynamic substitution mechanism (apply()) that seamlessly injects the best-performing kernels into production LLM…
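The abstract names four record types (kernel definitions, workloads, implementations, evaluations) and a substitution step that picks the best-performing kernel. The sketch below illustrates that idea only; it is not the FlashInfer-Bench API. Every class, field, and function name here (Definition, Workload, Solution, Evaluation, select_best, apply_best) is hypothetical, and the latency figures are made-up illustrative values, not results from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


# All names below are hypothetical stand-ins, not the real FlashInfer Trace schema.
@dataclass(frozen=True)
class Definition:
    """What a kernel computes, with a trusted reference implementation."""
    name: str
    reference: Callable[[List[float]], List[float]]


@dataclass(frozen=True)
class Workload:
    """One concrete input for a definition (e.g. captured from serving traffic)."""
    definition: str
    inputs: Tuple[float, ...]


@dataclass(frozen=True)
class Solution:
    """A candidate implementation submitted for a definition."""
    definition: str
    author: str
    run: Callable[[List[float]], List[float]]


@dataclass(frozen=True)
class Evaluation:
    """The outcome of benchmarking one solution on a definition's workloads."""
    definition: str
    author: str
    correct: bool
    latency_ms: float


def select_best(definition: str,
                solutions: List[Solution],
                evaluations: List[Evaluation]) -> Optional[Solution]:
    """Return the fastest solution that passed correctness, or None."""
    passing = [e for e in evaluations
               if e.definition == definition and e.correct]
    if not passing:
        return None
    best = min(passing, key=lambda e: e.latency_ms)
    by_author = {s.author: s for s in solutions if s.definition == definition}
    return by_author.get(best.author)


def apply_best(definition: Definition,
               solutions: List[Solution],
               evaluations: List[Evaluation]) -> Callable:
    """Pick the callable to dispatch to, falling back to the reference."""
    chosen = select_best(definition.name, solutions, evaluations)
    return chosen.run if chosen is not None else definition.reference


# Illustrative data: a toy "scale by 2" kernel with three candidates.
scale2 = Definition("scale2", lambda xs: [2.0 * x for x in xs])
workload = Workload("scale2", (1.0, 2.0, 3.0))
solutions = [
    Solution("scale2", "agent_a", lambda xs: [x + x for x in xs]),
    Solution("scale2", "agent_b", lambda xs: [x + x + 1.0 for x in xs]),  # wrong
    Solution("scale2", "agent_c", lambda xs: [x * 2.0 for x in xs]),
]
evaluations = [
    Evaluation("scale2", "agent_a", correct=True, latency_ms=0.8),
    Evaluation("scale2", "agent_b", correct=False, latency_ms=0.2),
    Evaluation("scale2", "agent_c", correct=True, latency_ms=0.5),
]

kernel = apply_best(scale2, solutions, evaluations)
result = kernel(list(workload.inputs))
```

In this toy setup the incorrect candidate is excluded even though it has the lowest latency, which reflects the correctness- and performance-aware selection the abstract describes. Falling back to the reference when nothing passes is a design choice of this sketch, not a documented behavior of the real system.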
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Topic Modeling
