Stable Recurrent Models

John Miller; Moritz Hardt

arXiv:1805.10369 · cs.LG · March 5, 2019 · 73 cites

Open Access

TL;DR

This paper studies stable recurrent neural networks. It proves that they are well approximated by feed-forward models and shows that they often perform as well as unstable models on benchmark sequence tasks, suggesting that much of sequence learning happens, or can be made to happen, in the stable regime.

Contribution

The paper proves that stable RNNs are well approximated by feed-forward networks, for both inference and training by gradient descent, and shows empirically that stable models often perform as well as their unstable counterparts on benchmark sequence tasks.

Findings

01  Stable RNNs are well approximated by feed-forward networks, for both inference and training by gradient descent.

02  Stable RNNs often perform as well as unstable ones on benchmark sequence tasks.

03  Stability helps explain why practitioners can, in many cases, replace RNNs with feed-forward models.

Abstract

Stability is a fundamental property of dynamical systems, yet to this date it has had little bearing on the practice of recurrent neural networks. In this work, we conduct a thorough investigation of stable recurrent models. Theoretically, we prove stable recurrent neural networks are well approximated by feed-forward networks for the purpose of both inference and training by gradient descent. Empirically, we demonstrate stable recurrent models often perform as well as their unstable counterparts on benchmark sequence tasks. Taken together, these findings shed light on the effective power of recurrent networks and suggest much of sequence learning happens, or can be made to happen, in the stable regime. Moreover, our results help to explain why in many cases practitioners succeed in replacing recurrent models by feed-forward models.
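The approximation claim in the abstract can be illustrated with a small numerical sketch. It is not the paper's construction: it assumes a tanh RNN whose recurrent matrix has maximum absolute row sum `lam` < 1, which is one simple sufficient condition for the state update to be a contraction, and it compares the full hidden state with a truncated state computed from only the last `k` inputs, which is a fixed-depth, feed-forward-style computation. All names here (`make_stable_weights`, `truncation_error`, `lam`) are hypothetical.

```python
import math
import random


def make_stable_weights(n, lam, rng):
    """Random n x n matrix rescaled so each row's absolute sum equals lam.

    Assumption for this sketch: with lam < 1 and a 1-Lipschitz activation
    such as tanh, the state update is a contraction in the max norm.
    """
    rows = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    return [[lam * v / sum(abs(u) for u in row) for v in row] for row in rows]


def step(W, U, h, x):
    """One RNN update: h' = tanh(W h + U x), with a scalar input x."""
    n = len(h)
    return [
        math.tanh(sum(W[i][j] * h[j] for j in range(n)) + U[i] * x)
        for i in range(n)
    ]


def run(W, U, xs):
    """Run the RNN from a zero initial state over the inputs xs."""
    h = [0.0] * len(W)
    for x in xs:
        h = step(W, U, h, x)
    return h


def truncation_error(W, U, xs, k):
    """Max-norm gap between the full state and the state that only
    sees the last k inputs (k must be at least 1)."""
    full = run(W, U, xs)
    truncated = run(W, U, xs[-k:])
    return max(abs(a - b) for a, b in zip(full, truncated))


rng = random.Random(0)
n, lam = 4, 0.5
W = make_stable_weights(n, lam, rng)
U = [rng.uniform(-1.0, 1.0) for _ in range(n)]
xs = [rng.uniform(-1.0, 1.0) for _ in range(200)]

# Compare the measured gap with the contraction bound lam ** k.
errors = {k: truncation_error(W, U, xs, k) for k in (1, 2, 4, 8, 16)}
for k, err in errors.items():
    print(f"k={k:2d}  error={err:.3e}  bound={lam ** k:.3e}")
```

Under these assumptions the gap is at most `lam ** k`: both runs apply the same last `k` updates, each update shrinks the max-norm distance between the two states by a factor of at most `lam`, and the starting states differ by at most 1 because tanh outputs lie in (-1, 1). With `lam` at or above 1 this argument no longer applies, and the influence of old inputs need not decay.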

Taxonomy

Topics: Model Reduction and Neural Networks · Neural Networks and Applications · Domain Adaptation and Few-Shot Learning