Dissecting Linear Recurrent Models: How Different Gating Strategies Drive Selectivity and Generalization
Younes Bouhadjar, Maxime Fabre, Felix Schmidt, Emre Neftci

TL;DR
This paper introduces SelectivBench, a lightweight synthetic benchmark for evaluating linear recurrent models' ability to focus on relevant inputs, revealing how gating and forgetting mechanisms influence selectivity and generalization.
Contribution
The paper proposes a refined taxonomy of linear recurrent models and introduces a new benchmark, SelectivBench, for systematically evaluating their selectivity and generalization.
Findings
Gating and rapid forgetting mechanisms facilitate recall.
In-state channel mixing is unnecessary for selectivity but critical for generalization.
Softmax attention remains dominant because its memory capacity scales with sequence length.
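To make the terms "gating" and "forgetting" concrete, here is a minimal sketch of a generic gated linear recurrence, h_t = f_t * h_{t-1} + i_t * x_t. It is not one of the models evaluated in the paper. The function name `gated_linear_recurrence` and the hand-picked gate values are assumptions for illustration only; in real models the gates are computed from the input.

```python
import numpy as np

def gated_linear_recurrence(x, forget, inp):
    """Run h_t = forget_t * h_{t-1} + inp_t * x_t over a sequence.

    x:      array of shape (T, D), one input vector per time step
    forget: array of shape (T,), forget gate per step (0 = erase state)
    inp:    array of shape (T,), input gate per step (0 = ignore input)
    Returns the state after every step, shape (T, D).
    """
    h = np.zeros(x.shape[1])
    states = []
    for x_t, f_t, i_t in zip(x, forget, inp):
        h = f_t * h + i_t * x_t  # state stays fixed-size regardless of T
        states.append(h.copy())
    return np.stack(states)

# Hypothetical toy sequence: 4 steps, 2 channels.
x = np.array([[1.0, 1.0], [5.0, 5.0], [2.0, 2.0], [9.0, 9.0]])

# Input gate closed after step 0: later tokens are ignored as distractors.
keep_first = gated_linear_recurrence(
    x, forget=np.ones(4), inp=np.array([1.0, 0.0, 0.0, 0.0]))

# Forget gate set to 0 at step 2: the old state is dropped and replaced.
overwrite = gated_linear_recurrence(
    x, forget=np.array([1.0, 1.0, 0.0, 1.0]), inp=np.array([1.0, 0.0, 1.0, 0.0]))
```

In the first call the final state still holds the first input. In the second, the forget gate erases it and the state holds the third input instead. The state has the same size at every step, which is the constant-memory property the abstract contrasts with softmax attention.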
Abstract
Linear recurrent neural networks have emerged as efficient alternatives to the original Transformer's softmax attention mechanism, thanks to their highly parallelizable training and constant memory and computation requirements at inference. Iterative refinements of these models have introduced an increasing number of architectural mechanisms, leading to increased complexity and computational costs. Nevertheless, systematic direct comparisons among these models remain limited. Existing benchmark tasks are either too simplistic to reveal substantial differences or excessively resource-intensive for experimentation. In this work, we propose a refined taxonomy of linear recurrent models and introduce SelectivBench, a set of lightweight and customizable synthetic benchmark tasks for systematically evaluating sequence models. SelectivBench specifically evaluates selectivity in sequence models…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
