TL;DR
CASS introduces a dataset of 60k verified host-device code pairs and a suite of models for cross-architecture GPU code translation, targeting accurate, performance-preserving transpilation between NVIDIA and AMD GPU languages at both the source and assembly level.
Contribution
The paper presents the first dataset and model suite for source- and assembly-level GPU translation, outperforming existing commercial tools and supporting open-source evaluation.
Findings
Achieves 88.2% accuracy on CUDA -> HIP translation and 69.1% on SASS -> RDNA3.
Generated code matches native performance in 85% of cases, preserving both runtime and memory behavior.
Outperforms GPT-5.1, Claude-4.5, and Hipify in accuracy.
Abstract
Cross-architecture GPU code transpilation is essential for unlocking low-level hardware portability, yet no scalable solution exists. We introduce CASS, the first dataset and model suite for source- and assembly-level GPU translation (CUDA <--> HIP, SASS <--> RDNA3). CASS contains 60k verified host-device code pairs, enabling learning-based translation across both ISA and runtime boundaries. We generate each sample using our automated pipeline that scrapes, translates, compiles, and aligns GPU programs across vendor stacks. Leveraging CASS, we train a suite of domain-specific translation models that achieve 88.2% accuracy on CUDA -> HIP and 69.1% on SASS -> RDNA3, outperforming commercial baselines including GPT-5.1, Claude-4.5, and Hipify by wide margins. Generated code matches native performance in 85% of cases, preserving both runtime and memory behavior. To support rigorous…
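To make the CUDA -> HIP source-level task concrete, here is a minimal sketch of the rule-based renaming that a Hipify-style baseline relies on. It is not the CASS method, which is learning-based. The function name `toy_cuda_to_hip` and the mapping table are hypothetical and chosen for illustration, though the API identifiers in the table are real CUDA and HIP runtime names.

```python
import re

# Illustrative subset of CUDA -> HIP runtime renames (assumption: a real
# tool covers far more of the API surface than this table does).
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaMemcpyDeviceToHost": "hipMemcpyDeviceToHost",
    "cudaMemcpy": "hipMemcpy",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}


def toy_cuda_to_hip(source: str) -> str:
    """Rename known CUDA identifiers to their HIP counterparts."""
    # Try longer names first so that cudaMemcpyHostToDevice is matched
    # as a whole rather than as cudaMemcpy plus a leftover suffix.
    names = sorted(CUDA_TO_HIP, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in names))
    return pattern.sub(lambda match: CUDA_TO_HIP[match.group(0)], source)


if __name__ == "__main__":
    cuda_src = (
        "#include <cuda_runtime.h>\n"
        "cudaMalloc(&d_x, bytes);\n"
        "cudaMemcpy(d_x, h_x, bytes, cudaMemcpyHostToDevice);\n"
        "cudaFree(d_x);\n"
    )
    print(toy_cuda_to_hip(cuda_src))
```

Textual renaming of this kind only applies where the two APIs correspond one-to-one at the source level. It offers nothing for the SASS -> RDNA3 assembly-level direction, where the instruction sets differ and no such name mapping exists.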
