On the Implicit Bias of Gradient Descent for Temporal Extrapolation

Edo Cohen-Karlik; Avichai Ben David; Nadav Cohen; Amir Globerson

arXiv:2202.04302·cs.LG·March 25, 2022



TL;DR

This paper studies when RNNs trained with gradient descent can extrapolate to sequences longer than those seen in training. It focuses on a memoryless data-generating distribution and identifies cases where extrapolation fails and cases where it succeeds.

Contribution

It shows that when gradient descent is used for training, learning converges to perfect extrapolation under certain assumptions on initialization, complementing recent studies on the implicit bias of gradient descent.

Findings

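01 Even with infinite training data, some RNNs fit the training data perfectly yet extrapolate poorly to longer sequences.

02 When trained with gradient descent, learning converges to perfect extrapolation under certain assumptions on initialization.

03 The implicit bias of gradient descent is crucial for temporal extrapolation.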

Abstract

When using recurrent neural networks (RNNs) it is common practice to apply trained models to sequences longer than those seen in training. This "extrapolating" usage deviates from the traditional statistical learning setup where guarantees are provided under the assumption that train and test distributions are identical. Here we set out to understand when RNNs can extrapolate, focusing on a simple case where the data generating distribution is memoryless. We first show that even with infinite training data, there exist RNN models that interpolate perfectly (i.e., they fit the training data) yet extrapolate poorly to longer sequences. We then show that if gradient descent is used for training, learning will converge to perfect extrapolation under certain assumptions on initialization. Our results complement recent studies on the implicit bias of gradient descent, showing that it plays a…
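The first claim in the abstract (perfect fit on training lengths, poor extrapolation beyond them) can be illustrated with a small hand-built example. This is a minimal sketch and not the paper's construction: it assumes a linear RNN of the form h_t = A h_{t-1} + B x_t with output C h_T, and it models a memoryless target as "the output equals the last input". The names build_student, run_linear_rnn, teacher, and max_error are hypothetical and chosen only for illustration.

```python
import numpy as np

def run_linear_rnn(A, B, C, xs):
    """Run h_t = A h_{t-1} + B x_t over xs and return the final output C h_T."""
    h = np.zeros(A.shape[0])
    for x in xs:
        h = A @ h + B * x
    return float(C @ h)

def build_student(k, c=1.0):
    """Hand-built student (assumption of this sketch): A is a shift matrix,
    so an input only reaches the last hidden unit after k further steps.
    The readout C looks at the first and last units, which gives an impulse
    response of 1 at lag 0, c at lag k, and 0 at every other lag."""
    d = k + 1
    A = np.zeros((d, d))
    for i in range(d - 1):
        A[i + 1, i] = 1.0  # moves the content of unit i into unit i + 1
    B = np.zeros(d)
    B[0] = 1.0
    C = np.zeros(d)
    C[0] = 1.0
    C[d - 1] = c
    return A, B, C

def teacher(xs):
    """Memoryless target (assumption of this sketch): output is the last input."""
    return float(xs[-1])

def max_error(k_train, length, trials=200, seed=0):
    """Largest |student - teacher| over random Gaussian sequences of a given length."""
    rng = np.random.default_rng(seed)
    A, B, C = build_student(k_train)
    worst = 0.0
    for _ in range(trials):
        xs = rng.standard_normal(length)
        worst = max(worst, abs(run_linear_rnn(A, B, C, xs) - teacher(xs)))
    return worst

if __name__ == "__main__":
    k = 5
    print("error on length k    :", max_error(k, k))
    print("error on length k + 1:", max_error(k, k + 1))
```

For sequences of length at most k the student agrees with the target on every input, so no amount of training data at those lengths can tell the two apart. At length k + 1 the extra lag-k term adds c times the first input to the output, and the error becomes nonzero. Whether gradient descent avoids such solutions is the subject of the paper's second result and is not demonstrated by this sketch.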


Taxonomy

Topics: Model Reduction and Neural Networks · Neural Networks and Applications · Gaussian Processes and Bayesian Inference