Nautilus: An Auto-Scheduling Tensor Compiler for Efficient Tiled GPU Kernels

Yifan Zhao; Yuchen Yang; Matei Budiu; Sasa Misailovic

arXiv:2604.14825 · cs.PL · April 17, 2026

TL;DR

Nautilus is an automated tensor compiler that turns high-level algebraic tensor specifications into tiled GPU kernels, reaching up to 23% higher throughput on NVIDIA GH200 GPUs and up to 42% higher throughput on RTX 5090 GPUs than existing compilers.

Contribution

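Nautilus introduces an auto-scheduler that discovers sequences of high-level optimizations while preserving the regular program structure that tile optimizers need. Starting from a math-like description of attention, it automatically discovers FlashAttention-3-like kernels.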

Findings

01. Achieves up to 23% higher throughput than existing compilers on NVIDIA GH200 GPUs.

02. Achieves up to 42% higher throughput than existing compilers on RTX 5090 GPUs.

03. Matches or exceeds manually optimized cuDNN kernels in many cases.

Abstract

We present Nautilus, a novel tensor compiler that moves toward fully automated math-to-kernel optimization. Nautilus compiles a high-level algebraic specification of tensor operators into efficient tiled GPU kernels. Nautilus's successive lowering design allows high-level optimizations, expression rewrites, and tile optimizations to be jointly applied in a single end-to-end system. Nautilus presents a novel auto-scheduler that discovers sequences of high-level optimizations, while preserving the regular program structure needed by tile optimizers. Nautilus's auto-scheduler captures complex interactions and trade-offs in the high-level optimizations, including aggressive global transformations like advanced reduction fusion. Nautilus is the first end-to-end tensor compiler capable of starting from a math-like description of attention and automatically discovering FlashAttention-3-like…
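To make the idea of reduction fusion for attention concrete, here is a minimal sketch in plain Python. It is not Nautilus output and does not use any Nautilus API. It only illustrates the well-known online-softmax rewrite that FlashAttention-style kernels rely on. The function names `attention_reference` and `attention_fused`, the `tile` parameter, and the single-query setup are all assumptions chosen for illustration. The first function follows the math-like definition with three separate reductions (max, sum, weighted sum). The second fuses them into one pass over key/value tiles by keeping a running max, a running sum, and a rescaled accumulator.

```python
import math

def attention_reference(q, keys, values):
    """Math-like form for one query row: three separate reductions."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)                                   # reduction 1: row max
    weights = [math.exp(s - m) for s in scores]
    denom = sum(weights)                              # reduction 2: softmax sum
    dim = len(values[0])
    # reduction 3: weighted sum of value rows
    return [sum(w * v[j] for w, v in zip(weights, values)) / denom
            for j in range(dim)]

def attention_fused(q, keys, values, tile=2):
    """Fused form: a single pass over key/value tiles (assumes non-empty keys)."""
    scale = 1.0 / math.sqrt(len(q))
    dim = len(values[0])
    running_max = -math.inf
    running_sum = 0.0
    acc = [0.0] * dim
    for start in range(0, len(keys), tile):
        k_tile = keys[start:start + tile]
        v_tile = values[start:start + tile]
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in k_tile]
        new_max = max(running_max, max(scores))
        # Rescale earlier partial results so they are relative to the new max.
        correction = math.exp(running_max - new_max)
        running_sum *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, v_tile):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * vj for a, vj in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]
```

Both functions return the same attention output up to floating-point rounding, regardless of the tile size. The fused version never materializes the full score vector, which is what makes the tiled form attractive on GPUs, where each tile can stay in fast on-chip memory.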
