Learning and Generalization in RNNs

Abhishek Panigrahi; Navin Goyal

arXiv:2106.00047 · cs.LG · June 2, 2021 · 1 citation

TL;DR

This paper advances the theoretical understanding of RNNs by proving that they can learn general functions of sequences, introducing new methods for analyzing hidden states, and illustrating the results on regular language recognition problems.

Contribution

It proves that RNNs can learn general functions of sequences, lifting the restriction in previous work to functions that are sums of functions of individual tokens.

Findings

01. RNNs can learn general functions of sequences.

02. New techniques for analyzing hidden states are introduced.

03. Results are demonstrated on regular language recognition problems.

Abstract

Simple recurrent neural networks (RNNs) and their more advanced cousins, such as LSTMs, have been very successful in sequence modeling. Their theoretical understanding, however, is lacking and has not kept pace with the progress for feedforward networks, where a reasonably complete understanding has emerged in the special case of highly overparametrized one-hidden-layer networks. In this paper, we make progress towards remedying this situation by proving that RNNs can learn functions of sequences. In contrast to previous work, which could only deal with functions of sequences that are sums of functions of individual tokens in the sequence, we allow general functions. Conceptually and technically, we introduce new ideas that enable us to extract information from the hidden state of the RNN in our proofs, addressing a crucial weakness in previous work. We illustrate our results on some…
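To make the contrast between the two function classes concrete, here is a minimal Python sketch. It is not the paper's construction. The names (`rnn_final_state`, `additive_target`, `parity_target`, `passes_additivity_check`), the ReLU activation, the Gaussian initialization, and the choice of parity as the example regular language are all assumptions made for illustration. The check relies on a simple fact: any function of the form f(a, b) = g1(a) + g2(b) satisfies f(0,0) + f(1,1) = f(0,1) + f(1,0), so a target that violates this identity cannot be a sum of per-token functions.

```python
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * a for m, a in zip(row, v)) for row in M]


def rnn_final_state(tokens, W, A, h0):
    """Elman-style recurrence h_t = relu(W h_{t-1} + A x_t); returns the last h_t.

    The ReLU activation is an assumption of this sketch, not taken from the page.
    """
    h = list(h0)
    for x in tokens:
        pre = [p + q for p, q in zip(matvec(W, h), matvec(A, x))]
        h = [max(0.0, z) for z in pre]
    return h


def additive_target(bits):
    """A sum of per-token functions; each term may depend on the position."""
    return sum((i + 1) * b for i, b in enumerate(bits))


def parity_target(bits):
    """Membership in the regular language 'odd number of 1s' (hypothetical example)."""
    return sum(bits) % 2


def passes_additivity_check(f):
    """Necessary condition for f(a, b) = g1(a) + g2(b) on length-2 bit strings."""
    return f((0, 0)) + f((1, 1)) == f((0, 1)) + f((1, 0))


# Randomly initialized recurrence with hidden width m (illustrative sizes only).
rng = random.Random(0)
m = 8
scale = (2.0 / m) ** 0.5
W = [[rng.gauss(0.0, scale) for _ in range(m)] for _ in range(m)]
A = [[rng.gauss(0.0, scale) for _ in range(2)] for _ in range(m)]
one_hot = {0: [1.0, 0.0], 1: [0.0, 1.0]}

# The final hidden state is the only summary of the sequence a readout can use.
h = rnn_final_state([one_hot[b] for b in (1, 0, 1)], W, A, [0.0] * m)

print(passes_additivity_check(additive_target))  # True
print(passes_additivity_check(parity_target))    # False
```

Parity fails the check because 0 + 0 differs from 1 + 1, so recognizing it requires information about the whole sequence rather than a token-by-token sum.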

Videos

Learning and Generalization in RNNs · slideslive

Taxonomy

Topics: Neural Networks and Applications · Topic Modeling · Machine Learning and Algorithms