Universality of Linear Recurrences Followed by Non-linear Projections: Finite-Width Guarantees and Benefits of Complex Eigenvalues

Antonio Orvieto; Soham De; Caglar Gulcehre; Razvan Pascanu; Samuel L. Smith

arXiv:2307.11888·cs.LG·June 6, 2024·1 cites

Open Access

TL;DR

This paper shows that linear RNNs combined with MLPs can approximate regular causal sequence-to-sequence maps to arbitrary precision. It also highlights how complex eigenvalues help the recurrence store information and how this relates to vanishing gradients.

Contribution

The paper proves that linear RNNs followed by non-linear projections are universal approximators, and shows the advantages of complex eigenvalues in practical architectures such as S4.

Findings

01. Complex eigenvalues close to the unit circle improve information retention.

02. Real diagonal recurrences are sufficient for universality.

03. Using complex eigenvalues mitigates vanishing gradient issues.
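
The architecture these findings refer to can be illustrated with a minimal sketch: a diagonal linear recurrence with complex eigenvalues, followed by a small MLP that reads the hidden state. This is not the authors' implementation. The function names, the state size `N`, the modulus `r`, the evenly spaced phases, and the random MLP weights are all assumptions chosen for illustration.

```python
import cmath
import math
import random


def linear_diagonal_rnn(inputs, eigenvalues, b):
    """Run x_k = lambda * x_{k-1} + b * u_k elementwise and return the final state."""
    state = [0j] * len(eigenvalues)
    for u in inputs:
        state = [lam * x + bi * u for lam, x, bi in zip(eigenvalues, state, b)]
    return state


def mlp_readout(state, w1, w2):
    """One-hidden-layer ReLU MLP applied to the real and imaginary parts of the state."""
    features = [x.real for x in state] + [x.imag for x in state]
    hidden = [max(0.0, sum(w * f for w, f in zip(row, features))) for row in w1]
    return sum(w * h for w, h in zip(w2, hidden))


# Hypothetical setup: N complex eigenvalues with modulus r just below 1
# and evenly spaced phases (an illustrative choice, not the paper's initialization).
rng = random.Random(0)
N, r = 8, 0.99
eigenvalues = [cmath.rect(r, 2 * math.pi * j / N) for j in range(N)]
b = [1.0 + 0j] * N

# Untrained random MLP weights, used only to show the data flow.
w1 = [[rng.gauss(0, 1) for _ in range(2 * N)] for _ in range(16)]
w2 = [rng.gauss(0, 1) for _ in range(16)]

inputs = [rng.uniform(-1, 1) for _ in range(8)]
state = linear_diagonal_rnn(inputs, eigenvalues, b)
output = mlp_readout(state, w1, w2)

# An impulse at the first step decays like r**(T-1) in every state coordinate,
# so a modulus close to 1 keeps the input visible in the state for longer.
impulse = [1.0] + [0.0] * 9
impulse_state = linear_diagonal_rnn(impulse, eigenvalues, b)
```

In this sketch the recurrence is purely linear, so all non-linearity sits in the readout. Setting `r` to a smaller value such as 0.5 makes the impulse response shrink much faster, which is one way to see why eigenvalues near the unit circle matter for retaining information.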

Abstract

Deep neural networks based on linear RNNs interleaved with position-wise MLPs are gaining traction as competitive approaches for sequence modeling. Examples of such architectures include state-space models (SSMs) like S4, LRU, and Mamba: recently proposed models that achieve promising performance on text, genetics, and other data that require long-range reasoning. Despite experimental evidence highlighting these architectures' effectiveness and computational efficiency, their expressive power remains relatively unexplored, especially in connection to specific choices crucial in practice - e.g., carefully designed initialization distribution and potential use of complex numbers. In this paper, we show that combining MLPs with either real or complex linear diagonal recurrences leads to arbitrarily precise approximation of regular causal sequence-to-sequence maps. At the heart of our proof,…

Taxonomy

Topics: Neural Networks and Applications · Algorithms and Data Compression · Error Correcting Code Techniques