ARGUS: Agentic GPU Optimization Guided by Data-Flow Invariants

Haohui Mai; Xiaoyan Guo; Xiangyun Ding; Daifeng Li; Qiuchu Yu; Chenzhun Guo; Cong Wang; Jiacheng Zhao; Christos Kozyrakis; Binhang Yuan

arXiv:2604.18616 · cs.DC · April 22, 2026

TL;DR

Argus is a GPU kernel optimization framework that uses data-flow invariants and a reinforcement-learning planner to bring agent-generated kernels close to hand-optimized performance on workloads such as matrix multiplication, attention, and Mixture-of-Experts (MoE).

Contribution

The paper introduces an approach guided by data-flow invariants, pairing a tile-based Pythonic DSL with an RL planner to enable targeted fixes and high-performance GPU kernels.

Findings

01 Generated kernels reach 99-104% of hand-optimized throughput.

02 Argus achieves a 2-1543x speedup over existing agentic systems.

03 Argus solves 90% of Level 2 KernelBench problems.

Abstract

LLM-based coding agents can generate functionally correct GPU kernels, yet their performance remains far below hand-optimized libraries on critical computations such as matrix multiplication, attention, and Mixture-of-Experts (MoE). Peak GPU performance requires coordinated reasoning over tightly coupled optimizations, including tiling, shared-memory staging, software pipelining, and instruction scheduling, while existing agents rely on sparse pass/fail feedback, leaving them unable to diagnose global constraint violations. We present Argus, an agentic framework that addresses this through data-flow invariants: compile-time specifications encoding how data must be choreographed throughout kernel execution. Argus introduces a tile-based, Pythonic DSL exposing hardware instructions and compiler policies while hiding low-level representations. The DSL provides tag functions to propagate…
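To make the contrast between sparse pass/fail feedback and an invariant-based diagnosis concrete, here is a minimal sketch in plain Python. It is not the Argus DSL, and it checks the invariant at run time rather than at compile time. Every name in it (`Tile`, `load_tile`, `check_mma_invariant`, `tiled_matmul`, `b_k_offset`) is hypothetical and chosen only for illustration. Each tile carries a tag recording which block of which operand it came from, and the multiply step checks that the two tiles agree on the reduction (K) block. The sketch assumes square matrices whose size is a multiple of the tile size.

```python
from dataclasses import dataclass


class InvariantViolation(Exception):
    """Raised when two tiles are combined in a way the invariant forbids."""


@dataclass(frozen=True)
class Tile:
    # Hypothetical tag fields: which operand and which block this tile holds.
    data: tuple
    operand: str
    row_block: int
    col_block: int


def load_tile(matrix, operand, row_block, col_block, size):
    """Copy one size x size block out of `matrix` and tag it with its origin."""
    r0, c0 = row_block * size, col_block * size
    data = tuple(tuple(matrix[r][c0:c0 + size]) for r in range(r0, r0 + size))
    return Tile(data, operand, row_block, col_block)


def check_mma_invariant(a, b):
    """Invariant: the A tile and the B tile must share the same K block."""
    if a.operand != "A" or b.operand != "B":
        raise InvariantViolation(f"operand mismatch: got {a.operand}, {b.operand}")
    if a.col_block != b.row_block:
        raise InvariantViolation(
            f"K-block mismatch: A tile has k={a.col_block}, "
            f"B tile has k={b.row_block}"
        )


def tile_matmul(a, b):
    """Multiply two tagged tiles after checking the invariant."""
    check_mma_invariant(a, b)
    n = len(a.data)
    return [
        [sum(a.data[i][k] * b.data[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def tiled_matmul(A, B, size, b_k_offset=0):
    """Blocked matmul; a nonzero b_k_offset simulates an indexing bug."""
    n = len(A)
    blocks = n // size
    C = [[0] * n for _ in range(n)]
    for bi in range(blocks):
        for bj in range(blocks):
            for bk in range(blocks):
                a = load_tile(A, "A", bi, bk, size)
                b = load_tile(B, "B", (bk + b_k_offset) % blocks, bj, size)
                partial = tile_matmul(a, b)
                for i in range(size):
                    for j in range(size):
                        C[bi * size + i][bj * size + j] += partial[i][j]
    return C
```

With `b_k_offset=0` the function returns the ordinary matrix product. With `b_k_offset=1` a plain output comparison would only report that the result is wrong, whereas the tag check stops at the first mismatched tile pair and names the mismatched K blocks.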
