Linear Attention for Efficient Bidirectional Sequence Modeling
Arshia Afzal, Elias Abad Rocamora, Leyla Naz Candogan, Pol Puigdemont, Francesco Tonin, Yongtao Wu, Mahsa Shoaran, Volkan Cevher

TL;DR
LION is a novel framework that extends Linear Transformers to bidirectional sequence modeling, enabling faster training and inference while matching or surpassing softmax Transformer performance.
Contribution
The paper introduces LION, the first systematic approach to adapt Linear Transformers for bidirectional tasks, unifying multiple representations and demonstrating broad applicability.
Findings
LION achieves comparable or better performance than softmax Transformers on bidirectional tasks.
LION offers significantly faster training and more efficient inference than existing State Space Models.
The framework generalizes core Linear Transformer representations to the bidirectional setting.
Abstract
Linear Transformers and State Space Models have emerged as efficient alternatives to softmax Transformers for causal sequence modeling, enabling parallel training via matrix multiplication and efficient RNN-style inference. However, despite their success in causal tasks, no unified framework exists for applying Linear Transformers to bidirectional sequence modeling. We introduce LION, the first framework to systematically extend Linear Transformers to the bidirectional setting. LION generalizes three core representations commonly used in the causal case (full Linear Attention, bidirectional RNN, and chunkwise parallel form) to the bidirectional setting. These forms are theoretically equivalent and enable models to exploit the strengths of each during training and inference. We prove that a broad class of Linear Transformers can be extended using LION and validate our framework via…
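The abstract states that the full Linear Attention form and the bidirectional RNN form are theoretically equivalent. The following is a minimal sketch that checks this numerically for one simple case, assuming a scalar decay `lam[i]` per position, no feature map, and no normalization. The mask construction and the helper names (`decay_mask`, `full_attention`, `bidirectional_rnn`) are illustrative assumptions, not the authors' implementation.

```python
import random
from typing import List

Vec = List[float]
Mat = List[List[float]]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def outer(a: Vec, b: Vec) -> Mat:
    return [[x * y for y in b] for x in a]


def decay_mask(lam: Vec) -> Mat:
    """Assumed mask: M[i][j] = product of lam[t] for t in (min(i, j), max(i, j)]; M[i][i] = 1."""
    n = len(lam)
    m = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for t in range(min(i, j) + 1, max(i, j) + 1):
                m[i][j] *= lam[t]
    return m


def full_attention(q: Mat, k: Mat, v: Mat, lam: Vec) -> Mat:
    """Parallel form: Y = (Q K^T * M) V, quadratic in sequence length."""
    m = decay_mask(lam)
    out = []
    for i in range(len(q)):
        y = [0.0] * len(v[0])
        for j in range(len(k)):
            w = m[i][j] * dot(q[i], k[j])
            y = [yc + w * vc for yc, vc in zip(y, v[j])]
        out.append(y)
    return out


def bidirectional_rnn(q: Mat, k: Mat, v: Mat, lam: Vec) -> Mat:
    """Recurrent form: one forward and one backward scan over d x dv states."""
    n, d, dv = len(q), len(q[0]), len(v[0])

    def scan(order, decay_of):
        states: List[Mat] = [[] for _ in range(n)]
        s = [[0.0] * dv for _ in range(d)]
        for i in order:
            a, kv = decay_of(i), outer(k[i], v[i])
            s = [[a * s[r][c] + kv[r][c] for c in range(dv)] for r in range(d)]
            states[i] = s
        return states

    # S_i^F = lam[i] * S_{i-1}^F + k_i v_i^T  (state starts at zero)
    fwd = scan(range(n), lambda i: lam[i])
    # S_i^B = lam[i+1] * S_{i+1}^B + k_i v_i^T  (state starts at zero)
    bwd = scan(reversed(range(n)), lambda i: lam[i + 1] if i + 1 < n else 0.0)

    out = []
    for i in range(n):
        qk = dot(q[i], k[i])
        # q_i^T (S_i^F + S_i^B) counts the j = i term twice, so subtract it once.
        y = [sum(q[i][r] * (fwd[i][r][c] + bwd[i][r][c]) for r in range(d)) - qk * v[i][c]
             for c in range(dv)]
        out.append(y)
    return out


if __name__ == "__main__":
    random.seed(0)
    n, d, dv = 6, 3, 2
    q = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    k = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    v = [[random.gauss(0, 1) for _ in range(dv)] for _ in range(n)]
    lam = [random.uniform(0.5, 1.0) for _ in range(n)]
    a = full_attention(q, k, v, lam)
    b = bidirectional_rnn(q, k, v, lam)
    print(max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)) < 1e-9)  # True
```

In this sketch the full form costs O(n^2) operations but reduces to dense matrix products, which is what makes parallel training fast; the recurrent form does work linear in n using two scans. The chunkwise parallel form named in the abstract, which blends the two, is not shown here.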
Taxonomy
Topics: Speech Recognition and Synthesis · Neural Networks and Applications · Algorithms and Data Compression
Methods: Softmax · Attention Is All You Need · Evolved Sign Momentum
