FuseMax: Leveraging Extended Einsums to Optimize Attention Accelerator Design
Nandeeka Nayak, Xinrui Wu, Toluwanimi O. Odemuyiwa, Michael Pellauer, Joel S. Emer, Christopher W. Fletcher

TL;DR
FuseMax uses cascades of Einsums to formalize transformer attention algorithms and to design an attention accelerator that reaches nearly 100% compute utilization, has no off-chip memory traffic bottlenecks, and needs on-chip buffers whose size does not depend on sequence length.
Contribution
The paper formalizes attention algorithms with Einsum cascades and proposes FuseMax, a new architecture that improves speed and energy efficiency in transformer attention acceleration.
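To make "attention as a cascade of Einsums" concrete, the sketch below writes ordinary softmax attention as a sequence of small tensor steps in NumPy. It is an illustration only, not the paper's notation or its FuseMax mapping: the function name `attention_cascade`, the index letters (p, m, e, f), the intermediate names, and the 1/sqrt(E) scaling are assumptions made for this example.

```python
import numpy as np

def attention_cascade(Q, K, V):
    """Softmax attention as a chain of Einsum-style steps (illustrative).

    Q: (P, E) queries, K: (M, E) keys, V: (M, F) values. Returns (P, F).
    """
    E = Q.shape[1]
    # Step 1: scores, a contraction over the embedding index e.
    QK = np.einsum("pe,me->pm", Q, K) / np.sqrt(E)
    # Step 2: row-wise maximum, a reduction over m with max instead of sum.
    GM = QK.max(axis=1)
    # Step 3: softmax numerator, an elementwise map with the max subtracted.
    SN = np.exp(QK - GM[:, None])
    # Step 4: softmax denominator, a sum-reduction over m.
    SD = np.einsum("pm->p", SN)
    # Step 5: normalized attention weights.
    A = SN / SD[:, None]
    # Step 6: weighted sum of values, a contraction over m.
    return np.einsum("pm,mf->pf", A, V)
```

Each step consumes tensors produced by earlier steps, which is what makes the dependencies between operators (and the number of passes over the m index) explicit.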
Findings
FuseMax achieves an average 6.7x speedup over FLAT, the prior state of the art, on attention.
FuseMax uses 79% of the energy that FLAT uses on attention.
End-to-end transformer inference is 5.3x faster with FuseMax than with FLAT.
Abstract
Attention for transformers is a critical workload that has recently received significant "attention" as a target for custom acceleration. Yet, while prior work succeeds in reducing attention's memory-bandwidth requirements, it creates load imbalance between operators that comprise the attention computation (resulting in severe compute under-utilization) and requires on-chip memory that scales with sequence length (which is expected to grow over time). This paper ameliorates these issues, enabling attention with nearly 100% compute utilization, no off-chip memory traffic bottlenecks, and on-chip buffer size requirements that are independent of sequence length. The main conceptual contribution is to use a recently proposed abstraction -- the cascade of Einsums -- to describe, formalize, and taxonomize the space of attention algorithms that appear in the literature. In particular, we…
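One way to see how on-chip storage can stay independent of sequence length is the online-softmax technique, which processes keys and values one tile at a time while carrying only a running maximum, denominator, and numerator per query. The sketch below is a generic NumPy illustration of that technique, not the FuseMax dataflow; the function name `attention_one_pass`, the `tile` parameter, and the 1/sqrt(E) scaling are assumptions for this example.

```python
import numpy as np

def attention_one_pass(Q, K, V, tile=4):
    """Softmax attention in a single pass over K/V tiles (illustrative).

    Running state is (P,), (P,), and (P, F): its size does not grow with
    the sequence length M = K.shape[0].
    """
    P, E = Q.shape
    F = V.shape[1]
    run_max = np.full(P, -np.inf)   # running row maximum
    run_den = np.zeros(P)           # running softmax denominator
    run_num = np.zeros((P, F))      # running (unnormalized) output

    for start in range(0, K.shape[0], tile):
        Kt = K[start:start + tile]
        Vt = V[start:start + tile]
        # Scores for this tile only: shape (P, tile).
        S = np.einsum("pe,me->pm", Q, Kt) / np.sqrt(E)
        new_max = np.maximum(run_max, S.max(axis=1))
        # Rescale earlier partial results to the new maximum.
        # On the first tile run_max is -inf, so the correction is 0.
        corr = np.exp(run_max - new_max)
        Pt = np.exp(S - new_max[:, None])
        run_den = run_den * corr + Pt.sum(axis=1)
        run_num = run_num * corr[:, None] + np.einsum("pm,mf->pf", Pt, Vt)
        run_max = new_max

    return run_num / run_den[:, None]
```

The result matches the full softmax computation for any tile size, while the intermediate score matrix never exceeds P by `tile` entries.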
Taxonomy
Topics: CCD and CMOS Imaging Sensors · Embedded Systems Design Techniques
