Abstractors and relational cross-attention: An inductive bias for explicit relational reasoning in Transformers
Awni Altabaa, Taylor Webb, Jonathan Cohen, John Lafferty

TL;DR
This paper introduces the Abstractor, a novel Transformer module with relational cross-attention that enhances explicit relational reasoning, improving generalization and sample efficiency across various relational tasks.
Contribution
The paper presents the Abstractor, a new Transformer extension with relational cross-attention, enabling explicit relational reasoning and better generalization from limited data.
Findings
Improved performance on simple discriminative relational tasks.
Dramatic sample-efficiency gains over standard Transformers on purely relational sequence-to-sequence tasks.
Consistent performance improvements on mathematical problem-solving tasks.
Abstract
An extension of Transformers is proposed that enables explicit relational reasoning through a novel module called the Abstractor. At the core of the Abstractor is a variant of attention called relational cross-attention. The approach is motivated by an architectural inductive bias for relational learning that disentangles relational information from object-level features. This enables explicit relational reasoning, supporting abstraction and generalization from limited data. The Abstractor is first evaluated on simple discriminative relational tasks and compared to existing relational architectures. Next, the Abstractor is evaluated on purely relational sequence-to-sequence tasks, where dramatic improvements are seen in sample efficiency compared to standard Transformers. Finally, Abstractors are evaluated on a collection of tasks based on mathematical problem solving, where consistent…
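The abstract names relational cross-attention but does not give its formula. The following is a minimal single-head sketch under one loudly stated assumption: queries and keys are computed from the input objects, while the values are separate, input-independent symbol vectors. The names `relational_cross_attention`, `symbols`, `w_q`, and `w_k` are hypothetical and chosen for illustration; the random matrices stand in for learned parameters. It is meant only to show how relational information (the attention weights) can be kept apart from object-level features (which never enter the values).

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def relational_cross_attention(objects, symbols, w_q, w_k):
    """Single-head sketch (assumed form, not taken from the abstract).

    objects: n x d input features; symbols: n x d_s input-independent vectors;
    w_q, w_k: d x d_k projection matrices.
    """
    queries = matmul(objects, w_q)
    keys = matmul(objects, w_k)
    scale = math.sqrt(len(w_q[0]))
    # Pairwise relation scores between objects, normalized per row.
    scores = [[v / scale for v in row] for row in matmul(queries, transpose(keys))]
    relations = softmax_rows(scores)
    # Assumption: values are the symbols, so object features reach the
    # output only through the relation weights.
    return matmul(relations, symbols), relations


def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
n, d, d_k, d_s = 3, 4, 2, 2
objects = rand_matrix(n, d)      # stand-in for object-level features
symbols = rand_matrix(n, d_s)    # stand-in for learned symbol vectors
w_q, w_k = rand_matrix(d, d_k), rand_matrix(d, d_k)

abstract_states, relations = relational_cross_attention(objects, symbols, w_q, w_k)
```

In this sketch each row of `abstract_states` is a convex combination of the symbol vectors, weighted by how strongly the corresponding object relates to every other object. Multiple heads, positional handling of symbols, and the surrounding Transformer layers are omitted.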
Taxonomy
Topics: Advanced Text Analysis Techniques · Cognitive Science and Mapping · Child and Animal Learning Development
Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Layer Normalization · Byte Pair Encoding · Dropout · Linear Layer · Label Smoothing · Adam · Residual Connection
