TL;DR
This paper introduces PoST, a spectral reparameterization framework that enhances linear recurrence models' long-range memory and performance, with theoretical guarantees and broad applicability.
Contribution
The paper proposes PoST, a novel spectral reparameterization method that optimally improves long-range dependency modeling in linear recurrent architectures.
Findings
PoST achieves minimax optimal decay rates.
Pre-training with PoST improves language modeling and long-context retrieval.
PoST integrates seamlessly into various linear recurrence models.
Abstract
Linear recurrent models offer linear-time sequence processing but often suffer from suboptimal long-range memory. We trace this to the decay spectrum: for channels, random initialization collapses the minimum spectral gap to , yielding sub-exponential error ; linear spacing avoids collapse but degrades to , practically algebraic over long contexts. We introduce Position-Adaptive Spectral Tapering (PoST), an architecture-agnostic framework combining two mechanisms: (1) Spectral Reparameterization, which structurally enforces geometrically spaced log-decay rates, proven minimax optimal at rate ; and (2) Position-Adaptive Scaling, the provably unique mechanism that eliminates the scale mismatch of static spectra (where only of channels are effective at position ) by stretching the…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
