Linear Attention for Efficient Bidirectional Sequence Modeling

Arshia Afzal; Elias Abad Rocamora; Leyla Naz Candogan; Pol Puigdemont; Francesco Tonin; Yongtao Wu; Mahsa Shoaran; Volkan Cevher

arXiv:2502.16249 · cs.LG · October 1, 2025

TL;DR

LION is a framework that extends Linear Transformers to bidirectional sequence modeling. It offers faster training and more efficient inference than existing State Space Models while matching or surpassing softmax Transformers on bidirectional tasks.

Contribution

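The paper introduces LION, the first systematic approach to adapting Linear Transformers to bidirectional tasks. It generalizes three theoretically equivalent representations (full Linear Attention, bidirectional RNN, and chunkwise parallel form) to the bidirectional setting and proves that a broad class of Linear Transformers can be extended this way.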

Findings

01. LION achieves comparable or better performance than softmax Transformers on bidirectional tasks.

02. LION offers significantly faster training and more efficient inference than existing State Space Models.

03. The framework generalizes core Linear Transformer representations to the bidirectional setting.

Abstract

Linear Transformers and State Space Models have emerged as efficient alternatives to softmax Transformers for causal sequence modeling, enabling parallel training via matrix multiplication and efficient RNN-style inference. However, despite their success in causal tasks, no unified framework exists for applying Linear Transformers to bidirectional sequence modeling. We introduce LION, the first framework to systematically extend Linear Transformers to the bidirectional setting. LION generalizes three core representations commonly used in the causal case - full Linear Attention, bidirectional RNN, and chunkwise parallel form - to the bidirectional setting. These forms are theoretically equivalent and enable models to exploit the strengths of each during training and inference. We prove that a broad class of Linear Transformers can be extended using LION and validate our framework via…
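As a rough illustration of how the three forms named in the abstract can produce the same output, the NumPy sketch below uses the simplest possible case: unnormalized linear attention with no causal mask. It is a simplification made for this example, not the paper's formulation or the code in lions-epfl/lion; it omits any normalization, scaling, or decay terms the paper may use, and the function names (`full_attention`, `bidirectional_rnn`, `chunkwise`) are hypothetical.

```python
import numpy as np


def full_attention(Q, K, V):
    """Full form: build the L x L score matrix (no mask, no softmax)."""
    return (Q @ K.T) @ V


def bidirectional_rnn(Q, K, V):
    """Two recurrent passes, each carrying a (d x dv) state."""
    L, d = Q.shape
    dv = V.shape[1]
    fwd = np.zeros((L, dv))
    bwd = np.zeros((L, dv))

    # Forward pass: state holds sum of k_i v_i^T for i <= t.
    S = np.zeros((d, dv))
    for t in range(L):
        S = S + np.outer(K[t], V[t])
        fwd[t] = Q[t] @ S

    # Backward pass: state holds sum of k_i v_i^T for i >= t.
    S = np.zeros((d, dv))
    for t in range(L - 1, -1, -1):
        S = S + np.outer(K[t], V[t])
        bwd[t] = Q[t] @ S

    # Token t appears in both passes, so subtract its contribution once.
    diag = np.sum(Q * K, axis=1, keepdims=True) * V
    return fwd + bwd - diag


def chunkwise(Q, K, V, chunk=4):
    """Block-by-block evaluation; only chunk x chunk scores exist at a time."""
    L = Q.shape[0]
    out = np.zeros((L, V.shape[1]))
    for i in range(0, L, chunk):
        for j in range(0, L, chunk):
            scores = Q[i:i + chunk] @ K[j:j + chunk].T
            out[i:i + chunk] += scores @ V[j:j + chunk]
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Q = rng.standard_normal((10, 3))
    K = rng.standard_normal((10, 3))
    V = rng.standard_normal((10, 5))
    ref = full_attention(Q, K, V)
    print(np.allclose(ref, bidirectional_rnn(Q, K, V)))
    print(np.allclose(ref, chunkwise(Q, K, V)))
```

In this toy setting the full form costs memory quadratic in sequence length, the recurrent form keeps only a fixed-size state per direction, and the chunkwise form sits between the two, which is the kind of trade-off the abstract refers to when it mentions exploiting the strengths of each form.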

Code & Models

Repositories

lions-epfl/lion (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Neural Networks and Applications · Algorithms and Data Compression

Methods: Softmax · Attention Is All You Need · Evolved Sign Momentum