Associative Transformer

Yuwei Sun; Hideya Ochiai; Zhirong Wu; Stephen Lin; Ryota Kanai

arXiv:2309.12862 · cs.LG · March 12, 2025

TL;DR

The Associative Transformer introduces a memory-augmented sparse attention mechanism that enhances relational reasoning and parameter efficiency in vision tasks, outperforming existing sparse Transformer models.

Contribution

The paper proposes an attention method built on associative memory with explicit learnable priors, improving parameter efficiency and performance over prior sparse Transformers.

Findings

01. AiT requires fewer parameters and layers than comparable models.

02. AiT outperforms state-of-the-art sparse Transformers on relational reasoning tasks.

03. AiT demonstrates superior performance in vision classification and reasoning benchmarks.

Abstract

Emerging from the pairwise attention in conventional Transformers, there is a growing interest in sparse attention mechanisms that align more closely with localized, contextual learning in the biological brain. Existing studies such as the Coordination method employ iterative cross-attention mechanisms with a bottleneck to enable the sparse association of inputs. However, these methods are parameter inefficient and fail in more complex relational reasoning tasks. To this end, we propose Associative Transformer (AiT) to enhance the association among sparsely attended input tokens, improving parameter efficiency and performance in various vision tasks such as classification and relational reasoning. AiT leverages a learnable explicit memory comprising specialized priors that guide bottleneck attentions to facilitate the extraction of diverse localized tokens. Moreover, AiT employs an…
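The abstract's idea of learnable priors guiding a bottleneck attention can be illustrated with a small sketch. This is not the authors' implementation: the function name `bottleneck_select`, the rule of ranking tokens by summed attention mass, and the toy vectors are all assumptions made for illustration, and the priors are fixed here rather than learned. It uses only the Python standard library so that it runs without dependencies.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def bottleneck_select(priors, tokens, k):
    """Keep the k tokens that receive the most attention from the priors.

    Hypothetical helper: each prior acts as a query, each token as a key.
    Returns (sorted indices of the selected tokens, attention matrix).
    """
    scale = 1.0 / math.sqrt(len(tokens[0]))
    # attention[i][j]: how strongly prior i attends to token j
    attention = [
        softmax([scale * sum(p * t for p, t in zip(prior, token))
                 for token in tokens])
        for prior in priors
    ]
    # Assumed selection rule: total attention mass per token, summed over priors
    mass = [sum(row[j] for row in attention) for j in range(len(tokens))]
    ranked = sorted(range(len(tokens)), key=lambda j: mass[j], reverse=True)
    return sorted(ranked[:k]), attention


# Toy data: two priors, four 2-d tokens; only two tokens pass the bottleneck.
priors = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[4.0, 0.0], [0.0, 4.0], [0.1, 0.1], [-1.0, -1.0]]
selected, attention = bottleneck_select(priors, tokens, k=2)
print(selected)  # [0, 1]
```

In this toy case each prior is aligned with a different token, so the two selected tokens are the ones the priors specialize on, while the weakly matching tokens are dropped. How AiT actually learns the priors and forms the bottleneck is described in the paper and the official repository, not here.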

Code & Models

Repositories

yuweisunn/associative-transformer (PyTorch, official)

Taxonomy

Topics: Brain Tumor Detection and Classification · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Adam · Residual Connection · Layer Normalization · Label Smoothing · Set Transformer · Byte Pair Encoding · Dropout