Disentangling and Integrating Relational and Sensory Information in Transformer Architectures

Awni Altabaa; John Lafferty

arXiv:2405.16727 · cs.LG · June 23, 2025

TL;DR

This paper introduces the Dual Attention Transformer (DAT), an extension of the Transformer architecture that explicitly separates and processes sensory and relational information, improving performance on relational reasoning tasks.

Contribution

The paper proposes DAT, an architectural extension of the Transformer with separate attention mechanisms for sensory and relational information. It addresses the standard Transformer's lack of an explicit mechanism for routing and processing relational information.

Findings

01. DAT outperforms standard Transformers on relational benchmarks

02. Explicit relational attention improves data and parameter efficiency

03. The approach benefits language and visual processing tasks

Abstract

Relational reasoning is a central component of generally intelligent systems, enabling robust and data-efficient inductive generalization. Recent empirical evidence shows that many existing neural architectures, including Transformers, struggle with tasks requiring relational reasoning. In this work, we distinguish between two types of information: sensory information about the properties of individual objects, and relational information about the relationships between objects. While neural attention provides a powerful mechanism for controlling the flow of sensory information between objects, the Transformer lacks an explicit computational mechanism for routing and processing relational information. To address this limitation, we propose an architectural extension of the Transformer framework that we call the Dual Attention Transformer (DAT), featuring two distinct attention…
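To make the sensory/relational distinction concrete, here is a minimal NumPy sketch of the two kinds of attention heads. It is one plausible reading of the idea, not the authors' formulation: the abstract as shown here does not specify the mechanism, so every function name, parameter name, and shape below (`sensory_attention`, `relational_attention`, `rel_q`, `rel_k`, `w_r`, the concatenation in `dual_attention`) is an assumption made for illustration. The sensory head retrieves features of the attended objects; the hypothetical relational head instead retrieves a vector of pairwise inner-product comparisons between the receiver and each attended object.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_weights(x, wq, wk):
    """Scaled dot-product attention weights, shape (n, n)."""
    scores = (x @ wq) @ (x @ wk).T / np.sqrt(wq.shape[1])
    return softmax(scores, axis=-1)


def sensory_attention(x, wq, wk, wv):
    """Standard attention head: routes features (values) of attended objects."""
    alpha = attention_weights(x, wq, wk)
    return alpha @ (x @ wv)  # (n, d_v)


def relational_attention(x, wq, wk, rel_q, rel_k, w_r):
    """Hypothetical relational head (an assumption, not the paper's exact form).

    For every pair (i, j) it builds a relation vector r[i, j] whose l-th entry
    is the inner product <x_i @ rel_q[l], x_j @ rel_k[l]>, then routes those
    relation vectors with ordinary attention weights.
    """
    alpha = attention_weights(x, wq, wk)        # (n, n)
    q = np.einsum("nd,ldk->nlk", x, rel_q)      # (n, L, k)
    k = np.einsum("nd,ldk->nlk", x, rel_k)      # (n, L, k)
    r = np.einsum("ilk,jlk->ijl", q, k)         # (n, n, L) pairwise relations
    routed = np.einsum("ij,ijl->il", alpha, r)  # (n, L)
    return routed @ w_r                         # (n, d_r)


def dual_attention(x, p):
    """Concatenate one sensory head and one relational head (assumed layout)."""
    s = sensory_attention(x, p["s_wq"], p["s_wk"], p["s_wv"])
    r = relational_attention(
        x, p["r_wq"], p["r_wk"], p["rel_q"], p["rel_k"], p["w_r"]
    )
    return np.concatenate([s, r], axis=-1)


def init_params(rng, d, k, n_rel, d_v, d_r):
    """Random illustrative parameters; all sizes are arbitrary choices."""
    return {
        "s_wq": rng.normal(size=(d, k)),
        "s_wk": rng.normal(size=(d, k)),
        "s_wv": rng.normal(size=(d, d_v)),
        "r_wq": rng.normal(size=(d, k)),
        "r_wk": rng.normal(size=(d, k)),
        "rel_q": rng.normal(size=(n_rel, d, k)),
        "rel_k": rng.normal(size=(n_rel, d, k)),
        "w_r": rng.normal(size=(n_rel, d_r)),
    }
```

Calling `dual_attention(x, init_params(rng, d, k, n_rel, d_v, d_r))` on an input `x` of shape `(n, d)` returns an array of shape `(n, d_v + d_r)`. The sketch omits multiple heads, masking, positional information, and any learned components the full architecture may use.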

Taxonomy

Topics: Neural Networks and Reservoir Computing

Methods: Attention Is All You Need · Sparse Evolutionary Training · Byte Pair Encoding · Label Smoothing · Adam · Residual Connection · Position-Wise Feed-Forward Layer · Dropout · Dense Connections · Absolute Position Encodings