The Quarks of Attention
Pierre Baldi, Roman Vershynin

TL;DR
This paper classifies fundamental attention mechanisms in deep learning, analyzes their computational properties, and highlights the central role of additive activation attention in understanding the capacity and efficiency of attention-based neural architectures.
Contribution
It provides a comprehensive classification of attention building blocks and studies their functional properties, revealing the importance of gating mechanisms and additive attention in neural network capacity.
Findings
Gating mechanisms are integral to current attention architectures.
Additive activation attention plays a central role in the proofs of the capacity lower bounds.
Attention can reduce the depth of certain basic circuits and exploits quadratic activations without incurring their full cost.
Abstract
Attention plays a fundamental role in both natural and artificial intelligence systems. In deep learning, attention-based neural architectures, such as transformer architectures, are widely used to tackle problems in natural language processing and beyond. Here we investigate the fundamental building blocks of attention and their computational properties. Within the standard model of deep learning, we classify all possible fundamental building blocks of attention in terms of their source, target, and computational mechanism. We identify and study the three most important mechanisms: additive activation attention, multiplicative output attention (output gating), and multiplicative synaptic attention (synaptic gating). The gating mechanisms correspond to multiplicative extensions of the standard model and are used across all current attention-based deep learning architectures. We study their…
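The three mechanisms named in the abstract can be illustrated on a single target neuron. The following is a minimal sketch, not the authors' formulation: it assumes a target neuron that computes a weighted sum of its inputs followed by a sigmoid, and it treats the attending signal `attn` as a plain scalar. The function names, the choice of sigmoid, and the index `k` of the gated synapse are all hypothetical and introduced only for illustration.

```python
import math


def sigmoid(s):
    """Logistic activation; an assumed choice, not specified by the source."""
    return 1.0 / (1.0 + math.exp(-s))


def activation(weights, inputs):
    """Pre-activation of the target neuron: the weighted sum of its inputs."""
    return sum(w * x for w, x in zip(weights, inputs))


def additive_activation_attention(weights, inputs, attn):
    # The attending signal is added to the target's activation
    # before the nonlinearity is applied.
    return sigmoid(activation(weights, inputs) + attn)


def output_gating(weights, inputs, attn):
    # Multiplicative output attention: the attending signal multiplies
    # the target neuron's output.
    return attn * sigmoid(activation(weights, inputs))


def synaptic_gating(weights, inputs, attn, k=0):
    # Multiplicative synaptic attention: the attending signal multiplies
    # one synaptic weight (index k, a hypothetical parameter) of the target.
    gated = list(weights)
    gated[k] = attn * gated[k]
    return sigmoid(activation(gated, inputs))


if __name__ == "__main__":
    w, x = [0.5, -1.0], [1.0, 2.0]
    print(additive_activation_attention(w, x, attn=1.0))
    print(output_gating(w, x, attn=0.0))
    print(synaptic_gating(w, x, attn=0.0, k=1))
```

In this sketch, an attending signal of 0 leaves the additive variant unchanged, silences the neuron under output gating, and removes one input connection under synaptic gating. This is one way to see why the two gating mechanisms are multiplicative extensions of the standard model while the additive one stays within it.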
Taxonomy
Topics: Advanced Memory and Neural Computing · Ferroelectric and Negative Capacitance Devices · EEG and Brain-Computer Interfaces
