CuBridge: An LLM-Based Framework for Understanding and Reconstructing High-Performance Attention Kernels

Xing Ma; Yangjie Zhou; Wu Sun; Zihan Liu; Jingwen Leng; Yun Lin; Shixuan Sun; Minyi Guo; Jin Song Dong

arXiv:2605.05023·cs.LG·May 7, 2026


TL;DR

CuBridge is an LLM-based framework that adapts expert-written CUDA attention kernels to attention variants specified in PyTorch, generating correct, optimized GPU code that outperforms existing methods across variants and platforms.

Contribution

CuBridge introduces a structured lift-transfer-lower workflow that adapts expert-written kernels to attention variants given as high-level specifications, yielding correct, high-performance CUDA implementations.

Findings

01. CuBridge consistently produces correct kernels across diverse attention variants.

02. It substantially outperforms general frameworks, compiler-based approaches, and prior LLM methods.

03. The framework works effectively across multiple GPU platforms.

Abstract

Efficient CUDA implementations of attention mechanisms are critical to modern deep learning systems, yet supporting diverse and evolving attention variants remains challenging. Existing frameworks and compilers trade performance for flexibility, while expert-written kernels achieve high efficiency but are difficult to adapt. Recent work explores large language models (LLMs) for GPU kernel generation, but prior studies report unstable correctness and significant performance gaps for complex operators such as attention. We present CuBridge, an LLM-based framework that adapts expert-written attention kernels through a structured lift-transfer-lower workflow. CuBridge starts from expert-written CUDA attention kernels and lifts them into an executable intermediate representation that makes execution orchestration explicit while abstracting low-level CUDA syntax. Given a user-provided…
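
To make the idea of a high-level attention specification concrete, the sketch below gives a naive reference implementation of scaled dot-product attention with an optional causal mask. It is illustrative only: it uses plain Python instead of PyTorch so that it has no dependencies, and the function name `reference_attention` and the `causal` flag are assumptions made for this example, not part of CuBridge or the paper. A specification of this kind defines what a variant must compute, and a generated kernel's output could be checked against it.

```python
import math


def reference_attention(q, k, v, causal=False):
    """Naive scaled dot-product attention (hypothetical reference spec).

    q: list of n_q query rows, each of length d
    k: list of n_k key rows, each of length d
    v: list of n_k value rows, each of length d_v
    Returns a list of n_q output rows, each of length d_v.
    """
    d = len(q[0])
    d_v = len(v[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for i, q_row in enumerate(q):
        # Scaled dot product of query i with every key.
        # Under the causal mask, keys at positions j > i are excluded.
        scores = []
        for j, k_row in enumerate(k):
            if causal and j > i:
                scores.append(float("-inf"))
            else:
                scores.append(scale * sum(a * b for a, b in zip(q_row, k_row)))
        # Numerically stable softmax: subtract the row maximum first.
        row_max = max(scores)
        weights = [math.exp(s - row_max) for s in scores]
        total = sum(weights)
        # Output row is the softmax-weighted average of the value rows.
        out.append([
            sum(w * v_row[c] for w, v_row in zip(weights, v)) / total
            for c in range(d_v)
        ])
    return out
```

Swapping the masking rule (for example, a sliding window instead of the causal condition) is the kind of variant change such a specification expresses in a few lines, whereas the same change in a hand-tuned CUDA kernel typically touches tiling and synchronization logic.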
