ARGUS: Agentic GPU Optimization Guided by Data-Flow Invariants
Haohui Mai, Xiaoyan Guo, Xiangyun Ding, Daifeng Li, Qiuchu Yu, Chenzhun Guo, Cong Wang, Jiacheng Zhao, Christos Kozyrakis, Binhang Yuan

TL;DR
Argus is an agentic GPU kernel optimization framework that combines data-flow invariants with a reinforcement-learning planner to generate kernels that approach hand-optimized performance.
Contribution
Argus introduces an optimization approach guided by data-flow invariants, pairing a tile-based, Pythonic DSL with an RL planner to enable targeted fixes and high-performance GPU kernels.
Findings
Generated kernels reach 99-104% of hand-optimized throughput.
Argus achieves speedups of 2-1543x over existing agentic systems.
Argus solves 90% of Level 2 KernelBench problems.
Abstract
LLM-based coding agents can generate functionally correct GPU kernels, yet their performance remains far below hand-optimized libraries on critical computations such as matrix multiplication, attention, and Mixture-of-Experts (MoE). Peak GPU performance requires coordinated reasoning over tightly coupled optimizations, including tiling, shared-memory staging, software pipelining, and instruction scheduling, while existing agents rely on sparse pass/fail feedback, leaving them unable to diagnose global constraint violations. We present Argus, an agentic framework that addresses this through data-flow invariants: compile-time specifications encoding how data must be choreographed throughout kernel execution. Argus introduces a tile-based, Pythonic DSL exposing hardware instructions and compiler policies while hiding low-level representations. The DSL provides tag functions to propagate…
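To make the idea of a data-flow invariant concrete, here is a minimal Python sketch. It is not the Argus DSL: the names `Tile`, `tag`, `require`, `stage_to_shared`, and `matmul_tile` are hypothetical and chosen only for illustration, and the checks run on symbolic tile handles in plain Python rather than inside a compiler. The sketch assumes an invariant of the form "an operand must be staged in shared memory before it is multiplied" and shows how a violated invariant can name the offending step, in contrast to a bare pass/fail result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tile:
    """Hypothetical symbolic tile: a shape plus a set of tags, no real data."""
    shape: tuple
    tags: frozenset = frozenset()


class InvariantViolation(Exception):
    """Raised when a tile reaches an operation without a required property."""


def tag(tile, name):
    # Record a property that has been established for this tile.
    return Tile(tile.shape, tile.tags | {name})


def require(tile, name, where):
    # Check an invariant and report which step and which tile broke it.
    if name not in tile.tags:
        raise InvariantViolation(
            f"{where}: tile {tile.shape} is missing tag '{name}'"
        )


def stage_to_shared(tile):
    # Assumed rule: staging through shared memory establishes 'in_shared'.
    return tag(tile, "in_shared")


def matmul_tile(lhs, rhs):
    # Assumed invariant: both operands must be staged before the multiply.
    require(lhs, "in_shared", "matmul_tile(lhs)")
    require(rhs, "in_shared", "matmul_tile(rhs)")
    if lhs.shape[1] != rhs.shape[0]:
        raise InvariantViolation(
            f"matmul_tile: inner dimensions differ, {lhs.shape} vs {rhs.shape}"
        )
    # The result is a fresh tile; operand tags are not carried over.
    return Tile((lhs.shape[0], rhs.shape[1]), frozenset({"in_registers"}))


def check_kernel(stage_rhs):
    """Trace a tiny two-operand kernel and return a diagnostic string."""
    a = stage_to_shared(Tile((64, 32)))
    b = Tile((32, 64))
    if stage_rhs:
        b = stage_to_shared(b)
    try:
        out = matmul_tile(a, b)
        return f"ok: output tile {out.shape}"
    except InvariantViolation as err:
        return f"violation: {err}"
```

Calling `check_kernel(False)` returns a message that points at the right-hand operand of `matmul_tile`, while `check_kernel(True)` traces cleanly. A diagnostic of this kind gives an agent something specific to repair, which is the role the abstract assigns to invariants in place of sparse pass/fail feedback.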
