ThunderKittens: Simple, Fast, and Adorable AI Kernels

Benjamin F. Spector; Simran Arora; Aaryan Singhal; Daniel Y. Fu; Christopher Ré

arXiv:2410.20399 · cs.LG · October 29, 2024

TL;DR

ThunderKittens is a framework built on a small set of key abstractions for writing high-performance AI GPU kernels. Its kernels match or outperform existing solutions across a range of AI operations.

Contribution

The paper presents ThunderKittens, a framework that uses a small set of abstractions to simplify and accelerate the development of performant AI GPU kernels.

Findings

01 Matches or outperforms prior kernels across a range of AI operations

02 Outperforms baselines by 10-40% on the attention backward pass

03 Outperforms baselines by up to 14x on linear attention

Abstract

The challenge of mapping AI architectures to GPU hardware is creating a critical bottleneck in AI progress. Despite substantial efforts, hand-written custom kernels fail to meet their theoretical performance thresholds, even on well-established operations like linear attention. The diverse hardware capabilities of GPUs might suggest that we need a wide variety of techniques to achieve high performance. However, our work explores whether a small number of key abstractions can drastically simplify the process. We present ThunderKittens (TK), a framework for writing performant AI kernels while remaining easy to use and maintain. Our abstractions map to the three levels of the GPU hierarchy: (1) at the warp-level, we provide 16x16 matrix tiles as basic data structures and PyTorch-like parallel compute operations over tiles, (2) at the thread-block level, we provide a template for…
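To make the warp-level idea in the abstract concrete, here is a minimal sketch of computing a matrix product one 16x16 tile at a time. It is plain Python on the CPU and is not ThunderKittens code: the names `split_into_tiles`, `tile_mma`, and `tiled_matmul` are hypothetical and chosen only for illustration, and the sketch assumes matrix dimensions that are multiples of 16.

```python
TILE = 16  # tile edge length, matching the 16x16 tiles named in the abstract


def split_into_tiles(matrix, tile=TILE):
    """Return a 2D grid of tile x tile blocks (hypothetical helper)."""
    rows, cols = len(matrix), len(matrix[0])
    if rows % tile or cols % tile:
        raise ValueError("matrix dimensions must be multiples of the tile size")
    return [
        [
            [row[c:c + tile] for row in matrix[r:r + tile]]
            for c in range(0, cols, tile)
        ]
        for r in range(0, rows, tile)
    ]


def tile_mma(acc, a, b):
    """Accumulate the product of two square tiles into acc (acc += a @ b)."""
    n = len(a)
    for i in range(n):
        for j in range(n):
            acc[i][j] += sum(a[i][k] * b[k][j] for k in range(n))
    return acc


def tiled_matmul(a, b, tile=TILE):
    """Multiply a (M x K) by b (K x N), treating tiles as the unit of work."""
    a_tiles = split_into_tiles(a, tile)
    b_tiles = split_into_tiles(b, tile)
    if len(a_tiles[0]) != len(b_tiles):
        raise ValueError("inner dimensions do not match")
    out = [[0.0] * len(b[0]) for _ in range(len(a))]
    for ti, a_row in enumerate(a_tiles):
        for tj in range(len(b_tiles[0])):
            # One accumulator tile per output tile.
            acc = [[0.0] * tile for _ in range(tile)]
            for tk, a_tile in enumerate(a_row):
                tile_mma(acc, a_tile, b_tiles[tk][tj])
            # Write the finished tile back into the output matrix.
            for i in range(tile):
                out[ti * tile + i][tj * tile:(tj + 1) * tile] = acc[i]
    return out
```

The point of the sketch is the shape of the loop: all arithmetic happens inside whole-tile operations, and the outer code only decides which tiles to combine. On a GPU, each tile operation would typically map to tensor-core instructions executed by a warp rather than to Python loops.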

Code & Models

Repositories

HazyResearch/ThunderKittens (official, PyTorch)

Taxonomy

Topics: Computational Physics and Python Applications

Methods: Softmax · Attention Is All You Need