Universality of Linear Recurrences Followed by Non-linear Projections: Finite-Width Guarantees and Benefits of Complex Eigenvalues
Antonio Orvieto, Soham De, Caglar Gulcehre, Razvan Pascanu, Samuel L. Smith

TL;DR
This paper demonstrates that linear RNNs combined with MLPs can approximate regular causal sequence-to-sequence maps to arbitrary precision, and highlights how complex eigenvalues help store information and mitigate vanishing gradients.
Contribution
It proves the universality of linear RNNs followed by non-linear projections and shows the advantages of complex eigenvalues in practical architectures such as S4.
Findings
Complex eigenvalues close to the unit circle (magnitude near 1) improve information retention.
Real diagonal recurrences are sufficient for universality.
Using complex eigenvalues mitigates vanishing gradient issues.
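The findings above can be made concrete with a small illustration of our own. It is not the paper's proof. Take a diagonal recurrence h_k = λ ⊙ h_{k-1} + x_k, with the input map fixed to all ones for simplicity (an assumption). Its final state is h_L = V x_rev, where V is the Vandermonde matrix V[n, i] = λ_n^i and x_rev is the input sequence reversed. When the state size N equals the sequence length L and the eigenvalues are distinct, V is invertible, so the state retains the whole input for real and complex eigenvalues alike. How reliably that information can be read back in floating point depends on the conditioning of V. Eigenvalues spread on a circle of radius 0.99 give a nearly unitary V, while distinct real eigenvalues in (0, 1) give a badly conditioned one. The helper names, the radius 0.99, and the real eigenvalue grid are hypothetical choices.

```python
import numpy as np


def final_state(lam, x):
    """Run the diagonal linear recurrence h_k = lam * h_{k-1} + x_k from h_0 = 0.

    lam: (N,) eigenvalues of the diagonal recurrence (real or complex).
    x:   (L,) scalar input sequence; the input map is fixed to all ones (assumption).
    Returns the final state h_L, shape (N,).
    """
    h = np.zeros(lam.shape, dtype=complex)
    for x_k in x:
        h = lam * h + x_k
    return h


def vandermonde(lam, length):
    # Unrolling the recurrence gives h_L[n] = sum_i lam[n]**i * x[L-1-i],
    # so h_L = V @ x[::-1] with V[n, i] = lam[n]**i.
    return np.vander(lam, N=length, increasing=True)


def recover_inputs(lam, h_final, length):
    """Solve V @ x_rev = h_L; valid when N == length and the eigenvalues are distinct."""
    x_rev = np.linalg.solve(vandermonde(lam, length), h_final)
    return x_rev[::-1].real


L = N = 16
rng = np.random.default_rng(0)
x = rng.standard_normal(L)

# Hypothetical complex eigenvalues: evenly spread on a circle of radius 0.99.
lam_complex = 0.99 * np.exp(2j * np.pi * np.arange(N) / N)
# Hypothetical distinct real eigenvalues in (0, 1).
lam_real = np.linspace(0.1, 0.9, N)

x_hat = recover_inputs(lam_complex, final_state(lam_complex, x), L)
cond_complex = np.linalg.cond(vandermonde(lam_complex, L))
cond_real = np.linalg.cond(vandermonde(lam_real, L))
print(np.max(np.abs(x_hat - x)), cond_complex, cond_real)
```

Running it should give a reconstruction error near machine precision for the complex eigenvalues. The condition number for the real grid should be many orders of magnitude larger than for the complex one. Treat this as intuition for why complex eigenvalues ease information storage, not as the paper's argument.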
Abstract
Deep neural networks based on linear RNNs interleaved with position-wise MLPs are gaining traction as competitive approaches for sequence modeling. Examples of such architectures include state-space models (SSMs) such as S4, LRU, and Mamba: recently proposed models that achieve promising performance on text, genetics, and other data that require long-range reasoning. Despite experimental evidence highlighting these architectures' effectiveness and computational efficiency, their expressive power remains relatively unexplored, especially in connection with specific choices crucial in practice (e.g., carefully designed initialization distributions and the potential use of complex numbers). In this paper, we show that combining MLPs with either real or complex linear diagonal recurrences leads to arbitrarily precise approximation of regular causal sequence-to-sequence maps. At the heart of our proof,…
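The architecture described in the abstract, a linear diagonal recurrence followed by a position-wise MLP, can be sketched as one NumPy layer. This is a minimal sketch under stated assumptions, not the parameterization of S4, LRU, or Mamba. The function names, shapes, the ring of eigenvalue radii in [0.9, 0.999), the input scaling, and the choice to feed the real and imaginary parts of the state to a two-layer ReLU MLP are all hypothetical.

```python
import numpy as np


def diagonal_linear_rnn(x, lam, B):
    """Causal diagonal linear recurrence h_k = lam * h_{k-1} + B @ x_k, with h_0 = 0.

    x:   (L, d_in) real input sequence.
    lam: (N,) complex diagonal of the recurrent matrix; |lam| < 1 keeps it stable.
    B:   (N, d_in) complex input matrix.
    Returns every hidden state, shape (L, N).
    """
    L, N = x.shape[0], lam.shape[0]
    h = np.zeros(N, dtype=complex)
    states = np.empty((L, N), dtype=complex)
    for k in range(L):
        h = lam * h + B @ x[k]
        states[k] = h
    return states


def position_wise_mlp(z, W1, b1, W2, b2):
    # The same two-layer ReLU MLP is applied independently at every time step.
    return np.maximum(z @ W1 + b1, 0.0) @ W2 + b2


def block(x, params):
    states = diagonal_linear_rnn(x, params["lam"], params["B"])
    # Feed the real and imaginary parts of the complex state to a real-valued MLP.
    z = np.concatenate([states.real, states.imag], axis=-1)
    return position_wise_mlp(z, params["W1"], params["b1"], params["W2"], params["b2"])


rng = np.random.default_rng(0)
L, d_in, N, hidden, d_out = 20, 3, 8, 32, 2
params = {
    # Hypothetical init: magnitudes close to 1, uniformly random phases.
    "lam": rng.uniform(0.9, 0.999, N) * np.exp(1j * rng.uniform(0, 2 * np.pi, N)),
    "B": (rng.standard_normal((N, d_in)) + 1j * rng.standard_normal((N, d_in))) / np.sqrt(2 * d_in),
    "W1": rng.standard_normal((2 * N, hidden)) / np.sqrt(2 * N),
    "b1": np.zeros(hidden),
    "W2": rng.standard_normal((hidden, d_out)) / np.sqrt(hidden),
    "b2": np.zeros(d_out),
}
x = rng.standard_normal((L, d_in))
y = block(x, params)

# Causality check: perturbing the input at step 10 leaves outputs 0..9 unchanged.
x_perturbed = x.copy()
x_perturbed[10] += 1.0
y_perturbed = block(x_perturbed, params)
```

The layer is causal by construction: output k depends only on inputs up to step k. The explicit loop is written for clarity. Because the recurrence is linear, it can also be evaluated with a parallel scan, which is part of why these models are computationally efficient.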
Taxonomy
Topics: Neural Networks and Applications · Algorithms and Data Compression · Error Correcting Code Techniques
