DeltaProduct: Improving State-Tracking in Linear RNNs via Householder Products
Julien Siems, Timur Carstensen, Arber Zela, Frank Hutter, Massimiliano Pontil, Riccardo Grazzi

TL;DR
DeltaProduct enhances linear RNNs by using multiple Householder transformations per token, improving state-tracking and language modeling performance while balancing efficiency and expressivity.
Contribution
The paper introduces DeltaProduct, an architecture that applies multiple Householder transformations per token to improve state-tracking in linear RNNs.
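As a rough illustration of why several Householder transformations per token can help, the sketch below builds a transition matrix as a product of generalized Householder matrices of the form I - beta * k k^T. It is not the authors' implementation: the helper names `householder` and `transition`, the 2-D keys, and the choice beta = 2 are assumptions made for this example only.

```python
import numpy as np

def householder(k, beta):
    """Generalized Householder matrix I - beta * k k^T (k is normalized here)."""
    k = k / np.linalg.norm(k)
    return np.eye(k.shape[0]) - beta * np.outer(k, k)

def transition(keys, betas):
    """Multiply one generalized Householder matrix per (key, beta) pair."""
    A = np.eye(keys.shape[1])
    for k, beta in zip(keys, betas):
        A = householder(k, beta) @ A
    return A

# Hypothetical 2-D keys separated by an angle theta.
theta = np.pi / 6
keys = np.array([[1.0, 0.0],
                 [np.cos(theta), np.sin(theta)]])

one = transition(keys[:1], [2.0])   # a single reflection (determinant -1)
two = transition(keys, [2.0, 2.0])  # two reflections compose to a rotation

print(np.round(one, 3))
print(np.round(two, 3))
```

With beta = 2, a single factor is a reflection, while the product of two reflections is a rotation by 2 * theta. A single rank-1 update of the identity cannot represent that rotation, which gives some intuition for the added expressivity of taking products.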
Findings
Outperforms DeltaNet in state-tracking and language modeling.
Shows improved length extrapolation capabilities.
Provides theoretical analysis of state-tracking in finite precision.
Abstract
Linear Recurrent Neural Networks (linear RNNs) have emerged as competitive alternatives to Transformers for sequence modeling, offering efficient training and linear-time inference. However, existing architectures face a fundamental trade-off between expressivity and efficiency, dictated by the structure of their state-transition matrices. Diagonal matrices, used in models such as Mamba, GLA, or mLSTM, yield fast runtime but have limited expressivity. To address this, recent architectures such as DeltaNet and RWKV-7 adopted a diagonal plus rank-1 structure, which allows simultaneous token and channel mixing, improving associative recall and, as recently shown, state-tracking when allowing state-transition matrices to have negative eigenvalues. Building on the interpretation of DeltaNet's recurrence as performing one step of online gradient descent per token on an associative recall…
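The gradient-descent reading of DeltaNet's recurrence mentioned in the abstract can be checked numerically. The sketch below assumes a squared-error recall loss 0.5 * ||H^T k - v||^2, a state H of shape (d_k, d_v), a unit-norm key k, and a step size beta. These names, shapes, and the exact loss are assumptions for illustration, and the truncated part of the abstract is not reconstructed here.

```python
import numpy as np

rng = np.random.default_rng(0)
d_k, d_v = 4, 3

H = rng.normal(size=(d_k, d_v))   # state (associative memory)
k = rng.normal(size=d_k)
k /= np.linalg.norm(k)            # unit-norm key
v = rng.normal(size=d_v)          # value to associate with k
beta = 0.7                        # step size

def loss_grad(H, k, v):
    """Gradient of 0.5 * ||H^T k - v||^2 with respect to H."""
    return np.outer(k, H.T @ k - v)

# One step of gradient descent on the recall loss.
gd_step = H - beta * loss_grad(H, k, v)

# Delta-rule form: generalized Householder transition plus rank-1 write.
delta_rule = (np.eye(d_k) - beta * np.outer(k, k)) @ H + beta * np.outer(k, v)

print(np.allclose(gd_step, delta_rule))
```

Under these assumptions the two updates coincide, so the transition matrix I - beta * k k^T is exactly what one gradient step contributes. Taking several such steps per token, each with its own key, would multiply several of these matrices together.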
Taxonomy
Topics: Innovative Approaches in Technology and Social Development
Methods: Mamba: Linear-Time Sequence Modeling with Selective State Spaces · Multiplicative LSTM
