Exploring the Space of Key-Value-Query Models with Intention
Marta Garnelo, Wojciech Marian Czarnecki

TL;DR
This paper introduces a neural network module, based on the solution to the least squares problem, that generalizes Linear Attention. The module has computational complexity similar to that of Attention and improved modeling capabilities, validated across various tasks.
Contribution
The paper presents a novel module derived from least squares that generalizes Linear Attention and can replace it without additional computational cost.
Findings
The least squares solution module can replace Attention efficiently.
The new module generalizes Linear Attention with proven theoretical benefits.
Experimental results show improved performance on diverse tasks.
Abstract
Attention-based models have been a key element of many recent breakthroughs in deep learning. Two key components of Attention are the structure of its input (which consists of keys, values and queries) and the computations by which these three are combined. In this paper we explore the space of models that share said input structure but are not restricted to the computations of Attention. We refer to this space as Keys-Values-Queries (KVQ) Space. Our goal is to determine whether there are any other stackable models in KVQ Space that Attention cannot efficiently approximate, which we can implement with our current deep learning toolbox and that solve problems that are interesting to the community. Maybe surprisingly, the solution to the standard least squares problem satisfies these properties. A neural network module that is able to compute this solution not only enriches the set of…
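To make the least squares idea concrete, here is a minimal NumPy sketch of a module that takes keys, values and queries, fits the least squares map from keys to values, and applies it to the queries. It is an illustration under stated assumptions, not the authors' implementation: the function names, the optional `ridge` term, and the unnormalised form of linear attention used for comparison are all hypothetical choices made for this example, and the paper's actual module may include components not shown here.

```python
import numpy as np


def linear_attention(keys, values, queries):
    """Unnormalised linear attention, Q (K^T V). Assumed form for comparison only."""
    return queries @ (keys.T @ values)


def least_squares_module(keys, values, queries, ridge=0.0):
    """Fit W = argmin ||K W - V||^2 + ridge * ||W||^2, then return Q W.

    `ridge` is a hypothetical stabiliser added for this sketch.
    """
    d = keys.shape[1]
    gram = keys.T @ keys + ridge * np.eye(d)          # (d, d)
    weights = np.linalg.solve(gram, keys.T @ values)  # (d, d_v)
    return queries @ weights                          # (m, d_v)


rng = np.random.default_rng(0)
n, d, d_v, m = 8, 4, 3, 5
keys = rng.normal(size=(n, d))
values = rng.normal(size=(n, d_v))
queries = rng.normal(size=(m, d))

out = least_squares_module(keys, values, queries)

# With orthonormal key columns, K^T K is the identity, so the least
# squares output reduces to the unnormalised linear attention output.
ortho_keys, _ = np.linalg.qr(keys)
matches_linear = np.allclose(
    least_squares_module(ortho_keys, values, queries),
    linear_attention(ortho_keys, values, queries),
)
```

In this sketch the cost is dominated by forming `keys.T @ keys` and `keys.T @ values`, which is linear in the number of keys. The orthonormal-keys check is one simple way to see how a least squares module can contain linear attention as a special case.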
Taxonomy
Topics: Advanced Graph Neural Networks · Bayesian Modeling and Causal Inference · Machine Learning and Algorithms
Methods: Test
