ReGLA: Refining Gated Linear Attention

Peng Lu; Ivan Kobyzev; Mehdi Rezagholizadeh; Boxing Chen; Philippe Langlais

arXiv:2502.01578 · cs.CL · August 12, 2025

TL;DR

ReGLA introduces a refined gated linear attention mechanism with improved feature mapping, normalization, and gating, leading to superior performance in large language model tasks while reducing computational complexity.

Contribution

This work presents a comprehensive refinement of Gated Linear Attention by enhancing feature maps, normalization, and gating, resulting in better performance and training stability.

Findings

01. Outperforms previous Gated Linear Attention methods

02. Effective both when training from scratch and in continual pre-training

03. Reduces the computational complexity of attention relative to quadratic softmax attention

Abstract

Recent Large Language Models (LLMs) have set themselves apart with their exceptional performance in complex language modelling tasks. However, these models are also known for their significant computational and storage requirements, primarily due to the quadratic computational complexity of softmax attention. To mitigate this issue, linear attention has been designed to reduce the quadratic space-time complexity that is inherent in standard transformers. In this work, we comprehensively explore three key components that substantially impact the performance of the Gated Linear Attention module: feature maps, normalization, and the gating mechanism. We develop a feature mapping function to address some crucial issues that previous proposals overlooked. We then offer further rationale for the integration of normalization layers to stabilize the…
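To make the complexity argument concrete, here is a minimal NumPy sketch of generic causal linear attention with an optional decay gate. It is not the ReGLA formulation, whose details are not given in this abstract. The feature map (elu(x) + 1), the per-channel gate, the choice to decay the normalizer along with the state, and all function names are assumptions chosen for illustration. The recurrent form keeps a fixed-size state and costs time linear in the sequence length, while the masked matrix form builds a T×T score matrix.

```python
import numpy as np

def feature_map(x):
    # Assumed feature map for illustration: elu(x) + 1, which is always positive.
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def linear_attention_recurrent(q, k, v, gate=None, eps=1e-6):
    """Causal linear attention as a recurrence over a fixed-size state.

    q, k: (T, d_k); v: (T, d_v); gate: (T, d_k) with values in (0, 1], or None.
    Time is O(T * d_k * d_v); the state size does not depend on T.
    """
    T, d_k = q.shape
    d_v = v.shape[1]
    phi_q, phi_k = feature_map(q), feature_map(k)
    S = np.zeros((d_k, d_v))   # running sum of phi(k_s) v_s^T
    z = np.zeros(d_k)          # running sum of phi(k_s), used as normalizer
    out = np.zeros((T, d_v))
    for t in range(T):
        if gate is not None:
            # Assumed gating: decay the past state (and normalizer) per channel.
            S = gate[t][:, None] * S
            z = gate[t] * z
        S = S + np.outer(phi_k[t], v[t])
        z = z + phi_k[t]
        out[t] = (phi_q[t] @ S) / (phi_q[t] @ z + eps)
    return out

def linear_attention_quadratic(q, k, v, eps=1e-6):
    # Same ungated computation via an explicit T x T causal score matrix.
    phi_q, phi_k = feature_map(q), feature_map(k)
    scores = np.tril(phi_q @ phi_k.T)
    return (scores @ v) / (scores.sum(axis=1, keepdims=True) + eps)

rng = np.random.default_rng(0)
q = rng.normal(size=(6, 4))
k = rng.normal(size=(6, 4))
v = rng.normal(size=(6, 3))
recurrent = linear_attention_recurrent(q, k, v)
quadratic = linear_attention_quadratic(q, k, v)
gated = linear_attention_recurrent(q, k, v, gate=np.full((6, 4), 0.5))
```

Without a gate, the two functions compute the same outputs, so the recurrence is only a reordering of the masked matrix product. A gate below 1 down-weights older tokens, which is the general idea behind gated linear attention variants.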

Taxonomy

Topics: Semantic Web and Ontologies · Natural Language Processing Techniques