On the Curse of Memory in Recurrent Neural Networks: Approximation and Optimization Analysis
Zhong Li, Jiequn Han, Weinan E, Qianxiao Li

TL;DR
This paper analyzes how memory affects the approximation and training of linear RNNs. It shows that targets with longer memory need more neurons to approximate and are slower to learn, a phenomenon termed the 'curse of memory.'
Contribution
It provides a universal approximation theorem for linear functionals and characterizes how memory length impacts approximation rate and optimization dynamics in linear RNNs.
Findings
Long-term memory increases the number of neurons needed for approximation.
Training linear RNNs slows down as the target's memory lengthens, with the slowdown growing exponentially in the memory length.
Memory, a notion made precise in the paper's framework, therefore affects both approximation quality and learning speed.
Abstract
We study the approximation properties and optimization dynamics of recurrent neural networks (RNNs) when applied to learn input-output relationships in temporal data. We consider the simple but representative setting of using continuous-time linear RNNs to learn from data generated by linear relationships. Mathematically, the latter can be understood as a sequence of linear functionals. We prove a universal approximation theorem for such linear functionals, and characterize the approximation rate and its relation with memory. Moreover, we perform a fine-grained dynamical analysis of training linear RNNs, which further reveals the intricate interactions between memory and learning. A unifying theme uncovered is the non-trivial effect of memory, a notion that can be made precise in our framework, on approximation and optimization: when there is long-term memory in the target, it takes a…
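The setting described in the abstract can be illustrated with a small numerical sketch. The abstract does not spell out the model equations, so the form used here is an assumption: a scalar-input linear RNN dh/dt = W h + U x(t) with readout y(t) = c · h(t). Under that assumption the output is a linear functional of the input history, a convolution of the input with the kernel rho(s) = c · exp(W s) U, and how slowly rho decays is one way to read "memory". All names (simulate_linear_rnn, discrete_kernel, convolve_with_kernel) and parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def simulate_linear_rnn(W, U, c, x, dt):
    """Forward-Euler integration of dh/dt = W h + U x(t), y = c . h, with h(0) = 0."""
    h = np.zeros(W.shape[0])
    y = np.empty(len(x))
    for k in range(len(x)):
        y[k] = c @ h                      # readout before the state update
        h = h + dt * (W @ h + U * x[k])   # one Euler step
    return y

def discrete_kernel(W, U, c, n_steps, dt):
    """Kernel of the Euler scheme: rho_j = c . (I + dt W)^j U."""
    A = np.eye(W.shape[0]) + dt * W
    rho = np.empty(n_steps)
    v = U.copy()
    for j in range(n_steps):
        rho[j] = c @ v
        v = A @ v
    return rho

def convolve_with_kernel(rho, x, dt):
    """y_k = dt * sum_{j=0}^{k-1} rho_j x_{k-1-j}, with y_0 = 0."""
    full = np.convolve(rho, x)[: len(x)] * dt
    return np.concatenate(([0.0], full[:-1]))

# Assumed example: diagonal W with decay rates lam (illustrative values only).
rng = np.random.default_rng(0)
dt, n_steps = 0.01, 500
lam = np.array([0.5, 2.0, 5.0])
W = -np.diag(lam)
U = np.ones(3)
c = np.array([1.0, -0.5, 0.25])
x = rng.standard_normal(n_steps)

y_rnn = simulate_linear_rnn(W, U, c, x, dt)
rho = discrete_kernel(W, U, c, n_steps, dt)
y_conv = convolve_with_kernel(rho, x, dt)

# The recurrent form and the convolution form give the same output.
max_gap = float(np.max(np.abs(y_rnn - y_conv)))

# For diagonal W the continuous kernel is a sum of decaying exponentials.
s = dt * np.arange(n_steps)
rho_exact = (c * U) @ np.exp(-np.outer(lam, s))
kernel_gap = float(np.max(np.abs(rho - rho_exact)))

print(max_gap, kernel_gap)
```

In this sketch each hidden unit contributes one exponential to the kernel. A target kernel that decays slowly is harder to match with a few such exponentials, which is consistent with the summary's claim that long memory requires more neurons, though the code itself does not demonstrate that claim.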
Taxonomy
Topics: Neural Networks and Applications · Model Reduction and Neural Networks · Neural dynamics and brain function
