On the Implicit Bias of Gradient Descent for Temporal Extrapolation
Edo Cohen-Karlik, Avichai Ben David, Nadav Cohen, Amir Globerson

TL;DR
This paper investigates how gradient descent influences the ability of RNNs to extrapolate to longer sequences, revealing conditions under which they can or cannot generalize beyond training data.
Contribution
It shows that, for a memoryless data-generating distribution and under certain assumptions on initialization, training an RNN with gradient descent converges to a solution that extrapolates perfectly to sequences longer than those seen in training.
Findings
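Even with infinite training data, there exist RNNs that fit the training data perfectly yet extrapolate poorly to longer sequences.
When training uses gradient descent, learning converges to perfect extrapolation under certain assumptions on initialization.
The implicit bias of gradient descent is therefore crucial for temporal extrapolation: fitting the training data alone does not guarantee it.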
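The first finding can be made concrete with a toy sketch. It is an illustration only, not the construction used in the paper. It assumes a linear RNN of the form h_t = A h_{t-1} + B x_t with output y = C h_T, and it assumes that "memoryless" means the target equals the most recent input. The names `linear_rnn_output`, `build_bad_extrapolator`, `train_len`, and `leak` are hypothetical. The network below matches the memoryless target exactly on every sequence of length at most `train_len`, so no amount of training data of that length can rule it out, yet it leaks an old input into the output on longer sequences.

```python
import numpy as np


def linear_rnn_output(A, B, C, x):
    """Final output of the linear RNN h_t = A h_{t-1} + B x_t, y = C h_T, h_0 = 0."""
    h = np.zeros(A.shape[0])
    for x_t in x:
        h = A @ h + B * x_t
    return float(C @ h)


def build_bad_extrapolator(train_len, leak=1.0):
    """Hypothetical weights that fit a memoryless target only up to train_len.

    A is a shift matrix, so an input written to coordinate 0 moves one
    coordinate per step and falls off the end after train_len + 1 steps.
    C reads coordinate 0 (the current input) and, scaled by `leak`, the last
    coordinate, which is only reached by an input that is train_len steps old.
    """
    d = train_len + 1
    A = np.eye(d, k=-1)          # A @ e_i = e_{i+1}; the last basis vector maps to 0
    B = np.zeros(d)
    B[0] = 1.0
    C = np.zeros(d)
    C[0] = 1.0
    C[-1] = leak
    return A, B, C


A, B, C = build_bad_extrapolator(train_len=3, leak=1.0)

# Length 3 (within the training length): output equals the last input.
print(linear_rnn_output(A, B, C, [1.0, 2.0, 3.0]))       # 3.0

# Length 4 (one step longer): the first input leaks into the output.
print(linear_rnn_output(A, B, C, [1.0, 2.0, 3.0, 4.0]))  # 5.0, not 4.0
```

Because the impulse response C A^k B is zero for every lag 1 ≤ k < `train_len`, the fit on training-length sequences is exact for any input, not just for a finite sample. Which of the many interpolating solutions training actually reaches is what the gradient descent analysis addresses.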
Abstract
When using recurrent neural networks (RNNs) it is common practice to apply trained models to sequences longer than those seen in training. This "extrapolating" usage deviates from the traditional statistical learning setup where guarantees are provided under the assumption that train and test distributions are identical. Here we set out to understand when RNNs can extrapolate, focusing on a simple case where the data generating distribution is memoryless. We first show that even with infinite training data, there exist RNN models that interpolate perfectly (i.e., they fit the training data) yet extrapolate poorly to longer sequences. We then show that if gradient descent is used for training, learning will converge to perfect extrapolation under certain assumptions on initialization. Our results complement recent studies on the implicit bias of gradient descent, showing that it plays a…
Taxonomy
Topics: Model Reduction and Neural Networks · Neural Networks and Applications · Gaussian Processes and Bayesian Inference
