FlashAttention on a Napkin: A Diagrammatic Approach to Deep Learning IO-Awareness

Vincent Abbott; Gioele Zardini

arXiv:2412.03317·cs.LG·January 22, 2025·2 cites

TL;DR

This paper introduces a diagrammatic, resource-aware approach to optimizing deep learning algorithms on GPUs. It enables the systematic derivation of high-level optimization strategies and clarifies how existing techniques such as FlashAttention work.

Contribution

The paper extends Neural Circuit Diagrams to include resource usage and the distribution of tasks across a GPU hierarchy, supporting hardware-aware optimization and analysis of deep learning algorithms.

Findings

01. Simple relabellings of diagrams can derive high-level streaming and tiling strategies.

02. The resulting high-level performance models account for the effects of quantization and multi-level GPU hierarchies.

03. The methodology clarifies existing techniques such as FlashAttention.
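To make "streaming" concrete, here is a minimal sketch of the general idea behind FlashAttention-style streaming: a softmax-weighted sum is accumulated one block of keys and values at a time, so the full score vector never has to be held at once. This is not the paper's diagrammatic derivation. It is a generic online-softmax illustration for a single query in pure Python, and the function names `attention_reference` and `attention_streamed` and the `block_size` parameter are hypothetical, chosen only for this example.

```python
import math


def attention_reference(q, keys, values):
    """Softmax-weighted sum of values for one query, using all scores at once."""
    scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(dim)]


def attention_streamed(q, keys, values, block_size=2):
    """Same result, but keys/values are consumed one block at a time.

    Only a running max, a running normalizer, and a running weighted sum
    are kept between blocks (an assumption-level sketch of streaming).
    """
    dim = len(values[0])
    running_max = -math.inf
    running_sum = 0.0
    acc = [0.0] * dim
    for start in range(0, len(keys), block_size):
        k_block = keys[start:start + block_size]
        v_block = values[start:start + block_size]
        scores = [sum(a * b for a, b in zip(q, k)) for k in k_block]
        new_max = max(running_max, max(scores))
        # Rescale earlier partial results to the new max (0.0 on the first block).
        scale = math.exp(running_max - new_max)
        running_sum *= scale
        acc = [a * scale for a in acc]
        for s, v in zip(scores, v_block):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * vj for a, vj in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]
```

Both functions should agree up to floating-point rounding for any block size. The streamed version trades a small amount of rescaling work for never materializing all scores together, which is the kind of data-transfer saving the summary above refers to.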

Abstract

Optimizing deep learning algorithms currently requires slow, manual derivation, potentially leaving much performance untapped. Methods like FlashAttention have achieved a 6x performance improvement over native PyTorch by avoiding unnecessary data transfers, but required three iterations over three years to be developed. Automated compiled methods have consistently lagged behind. This paper extends Neural Circuit Diagrams for deep learning models to consider resource usage and the distribution of tasks across a GPU hierarchy. We show how diagrams can use simple relabellings to derive high-level streaming and tiling optimization strategies along with performance models. We show how this high-level performance model allows the effects of quantization and multi-level GPU hierarchies to be readily considered. We develop a methodology for representing intermediate-level pseudocode with…

Taxonomy

Topics: Robotics and Automated Systems