Disentangling and Integrating Relational and Sensory Information in Transformer Architectures
Awni Altabaa, John Lafferty

TL;DR
This paper introduces the Dual Attention Transformer (DAT), an extension of the Transformer architecture that explicitly separates and processes sensory and relational information, improving performance on relational reasoning tasks.
Contribution
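The paper proposes DAT, an architectural extension of the Transformer with separate attention mechanisms for sensory and relational information. It addresses a key limitation of standard Transformers: they lack an explicit computational mechanism for routing and processing relational information.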
Findings
DAT outperforms standard Transformers on relational benchmarks
Explicit relational attention improves data and parameter efficiency
The approach benefits language and visual processing tasks
Abstract
Relational reasoning is a central component of generally intelligent systems, enabling robust and data-efficient inductive generalization. Recent empirical evidence shows that many existing neural architectures, including Transformers, struggle with tasks requiring relational reasoning. In this work, we distinguish between two types of information: sensory information about the properties of individual objects, and relational information about the relationships between objects. While neural attention provides a powerful mechanism for controlling the flow of sensory information between objects, the Transformer lacks an explicit computational mechanism for routing and processing relational information. To address this limitation, we propose an architectural extension of the Transformer framework that we call the Dual Attention Transformer (DAT), featuring two distinct attention…
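The abstract is cut off before it describes the two attention mechanisms, so the sketch below illustrates the general idea rather than the paper's method. It is a minimal pure-Python sketch built on our own assumptions, not taken from this page. A sensory head is ordinary scaled dot-product self-attention, so each object receives a mix of other objects' features. A relational head uses the same query-key softmax, but each attended object j contributes two things instead of its features: a vector of learned inner-product relations r(x_i, x_j), and a "symbol" vector s_j that identifies the object. The dual-attention layer then concatenates the two head outputs for each object. Every function name, the random symbol vectors `S`, and all matrix shapes are hypothetical illustrations, not the authors' code.

```python
import math
import random

def matvec(W, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0, 1 / math.sqrt(cols)) for _ in range(cols)] for _ in range(rows)]

def sensory_attention(X, Wq, Wk, Wv):
    """Standard single-head self-attention: output i mixes the *features* (values) of all objects."""
    Q = [matvec(Wq, x) for x in X]
    K = [matvec(Wk, x) for x in X]
    V = [matvec(Wv, x) for x in X]
    d = len(Q[0])
    out = []
    for q in Q:
        alpha = softmax([dot(q, k) / math.sqrt(d) for k in K])
        out.append([sum(a * v[c] for a, v in zip(alpha, V)) for c in range(len(V[0]))])
    return out

def relational_attention(X, S, Wq, Wk, rel_q, rel_k, Wr, Ws):
    """Hypothetical relational head: output i mixes relation vectors r(x_i, x_j)
    plus a symbol s_j for each attended object, never x_j's features directly."""
    Q = [matvec(Wq, x) for x in X]
    K = [matvec(Wk, x) for x in X]
    d = len(Q[0])
    out = []
    for i, x_i in enumerate(X):
        alpha = softmax([dot(Q[i], k) / math.sqrt(d) for k in K])
        acc = [0.0] * len(Wr)
        for j, x_j in enumerate(X):
            # One inner-product relation per (Aq, Ak) pair of projections.
            r_ij = [dot(matvec(Aq, x_i), matvec(Ak, x_j)) for Aq, Ak in zip(rel_q, rel_k)]
            msg = [a + b for a, b in zip(matvec(Wr, r_ij), matvec(Ws, S[j]))]
            acc = [c + alpha[j] * m for c, m in zip(acc, msg)]
        out.append(acc)
    return out

def dual_attention(X, S, params):
    sens = sensory_attention(X, *params["sensory"])
    rel = relational_attention(X, S, *params["relational"])
    return [a + b for a, b in zip(sens, rel)]  # list concatenation per object

rng = random.Random(0)
n, d_model, d_head, n_rel, d_sym = 4, 8, 4, 3, 4
X = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(n)]  # object embeddings
S = [[rng.gauss(0, 1) for _ in range(d_sym)] for _ in range(n)]    # hypothetical symbols
params = {
    "sensory": tuple(rand_matrix(d_head, d_model, rng) for _ in range(3)),
    "relational": (
        rand_matrix(d_head, d_model, rng), rand_matrix(d_head, d_model, rng),
        [rand_matrix(d_head, d_model, rng) for _ in range(n_rel)],
        [rand_matrix(d_head, d_model, rng) for _ in range(n_rel)],
        rand_matrix(d_head, n_rel, rng), rand_matrix(d_head, d_sym, rng),
    ),
}
Y = dual_attention(X, S, params)
print(len(Y), len(Y[0]))  # prints "4 8"
```

Both head types use the same attention weights mechanism. The difference lies in what gets routed. Downstream layers can see "how object i relates to j" (through `r_ij`) and "which object j was" (through `s_j`), separately from j's own sensory features.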
Code & Models
- awni00/DAT-sa8-ra8-ns1024-sh8-nkvh4-343M (Hugging Face model, 95 downloads)
- awni00/DAT-sa8-ra8-nr64-ns1024-sh8-nkvh4-343M (Hugging Face model, 2 downloads)
- awni00/DAT-sa8-ra8-nr32-ns1024-sh8-nkvh4-343M (Hugging Face model, 1 download)
- awni00/DAT-sa16-ra16-nr64-ns2048-sh8-nkvh8-1.27B (Hugging Face model, 3 downloads)
- awni00/DAT-sa16-ra16-nr128-ns2048-sh16-nkvh8-1.27B (Hugging Face model, 5 downloads, 1 like)
Taxonomy
Topics: Neural Networks and Reservoir Computing
Methods: Attention Is All You Need · Sparse Evolutionary Training · Byte Pair Encoding · Label Smoothing · Adam · Residual Connection · Position-Wise Feed-Forward Layer · Dropout · Dense Connections · Absolute Position Encodings
