Gated Slot Attention for Efficient Linear-Time Sequence Modeling

Yu Zhang; Songlin Yang; Ruijie Zhu; Yue Zhang; Leyang Cui; Yiqiao Wang; Bolun Wang; Freda Shi; Bailin Wang; Wei Bi; Peng Zhou; Guohong Fu

arXiv:2409.07146 · cs.CL · November 1, 2024

TL;DR

This paper introduces Gated Slot Attention (GSA), which adds a gating mechanism inspired by Gated Linear Attention (GLA) to Attention with Bounded-memory-Control (ABC). The result improves memory capacity and recall while keeping the recurrent state compact and training efficient.

Contribution

GSA extends ABC with a gating mechanism, giving linear-time sequence modeling with a compact recurrent state, and improves recall when finetuning pretrained Transformers to RNNs.

Findings

01. GSA outperforms existing models in recall-intensive tasks.

02. GSA reduces training and inference resource requirements.

03. GSA is effective in finetuning pretrained Transformers to RNNs.

Abstract

Linear attention Transformers and their gated variants, celebrated for enabling parallel training and efficient recurrent inference, still fall short in recall-intensive tasks compared to traditional Transformers and demand significant resources for training from scratch. This paper introduces Gated Slot Attention (GSA), which enhances Attention with Bounded-memory-Control (ABC) by incorporating a gating mechanism inspired by Gated Linear Attention (GLA). Essentially, GSA comprises a two-layer GLA linked via $\mathrm{softmax}$, utilizing context-aware memory reading and adaptive forgetting to improve memory capacity while maintaining compact recurrent state size. This design greatly enhances both training and inference efficiency through GLA's hardware-efficient training algorithm and reduced state size. Additionally, retaining the $\mathrm{softmax}$ operation is…
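
The abstract's description of a bounded memory with adaptive forgetting and a softmax-based read can be illustrated with a small recurrent sketch. This is one plausible reading of that description, not the paper's exact formulation: the function name `gsa_recurrent`, the per-slot gate layout, and the `(1 - gate)` write weight are all assumptions made for illustration, and the sketch ignores multi-head structure and the hardware-efficient parallel training algorithm.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def gsa_recurrent(queries, keys, values, gates, num_slots):
    """Hypothetical gated, bounded-memory recurrence over one sequence.

    queries, keys, values: lists of T vectors, each a list of d floats.
    gates: list of T vectors of num_slots floats in [0, 1]; gates[t][i] is
           the fraction of slot i retained at step t (assumed form).
    Returns a list of T output vectors of d floats.
    """
    d = len(keys[0])
    # Two slot memories whose size does not grow with sequence length.
    k_mem = [[0.0] * d for _ in range(num_slots)]
    v_mem = [[0.0] * d for _ in range(num_slots)]
    outputs = []
    for q, k, v, a in zip(queries, keys, values, gates):
        for i in range(num_slots):
            # Adaptive forgetting: decay slot i by a[i], write the new
            # token with weight (1 - a[i]).
            k_mem[i] = [a[i] * old + (1.0 - a[i]) * new
                        for old, new in zip(k_mem[i], k)]
            v_mem[i] = [a[i] * old + (1.0 - a[i]) * new
                        for old, new in zip(v_mem[i], v)]
        # Context-aware read: score every key slot against the query,
        # normalize with softmax, then mix the value slots.
        scores = [sum(s * x for s, x in zip(k_mem[i], q))
                  for i in range(num_slots)]
        weights = softmax(scores)
        out = [sum(weights[i] * v_mem[i][j] for i in range(num_slots))
               for j in range(d)]
        outputs.append(out)
    return outputs
```

In this sketch the state is two `num_slots × d` memories regardless of how many tokens have been processed, which is the sense in which the recurrent state stays compact. The softmax sits between the key-slot read and the value-slot read, loosely mirroring the "two layers linked via softmax" wording in the abstract.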

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Embedded Systems Design Techniques

Methods: Attention Is All You Need · Softmax