On the Provable Generalization of Recurrent Neural Networks
Lifu Wang, Bo Shen, Bo Hu, Xing Cao

TL;DR
This paper provides theoretical guarantees for the generalization of over-parameterized RNNs, demonstrating learnability of complex functions without normalization constraints and analyzing the impact of input sequence structure.
Contribution
It introduces new generalization bounds for RNNs trained with random initialization, extending learnability results to non-additive and more complex function classes.
Findings
Learnability of certain functions without normalized input conditions
Almost-polynomial scaling of iterations and samples with input length
Extension to non-additive functions of input sequences
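To make the distinction between additive and non-additive targets concrete, here is a minimal NumPy sketch. It is an illustration only, not the paper's construction: the function names, the choice of `tanh` as the default nonlinearity, and the weight vectors `betas` and `beta` are all assumptions introduced for this example.

```python
import numpy as np

def additive_target(X, betas, f=np.tanh):
    """Additive target: a sum over time steps of f(<beta_l, X_l>).

    X     : array of shape (L, d), one input vector per time step
    betas : array of shape (L, d), one (hypothetical) weight vector per step
    """
    per_step = np.einsum("ld,ld->l", betas, X)  # inner product at each step
    return float(np.sum(f(per_step)))

def non_additive_target(X, beta, steps, f=np.tanh):
    """Non-additive target: f applied to a joint projection of several steps.

    steps : indices of the time steps that interact
    beta  : array of shape (len(steps) * d,), a (hypothetical) joint weight vector
    """
    joint = X[list(steps)].reshape(-1)  # concatenate the selected inputs
    return float(f(beta @ joint))

# Tiny example with L = 3 steps and d = 2 features.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
square = lambda z: z ** 2
additive_value = additive_target(X, X, f=square)                     # 1 + 1 + 4
joint_value = non_additive_target(X, np.ones(4), (0, 2), f=square)   # (1 + 0 + 1 + 1) ** 2
```

With a nonlinear `f`, the second target couples inputs from different time steps, so it cannot in general be rewritten as a sum of functions of single time steps.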
Abstract
The Recurrent Neural Network (RNN) is a fundamental structure in deep learning. Recently, several works have studied the training process of over-parameterized neural networks and shown that such networks can learn functions in some notable concept classes with a provable generalization error bound. In this paper, we analyze the training and generalization of RNNs with random initialization, and provide the following improvements over recent works: 1) For an RNN with a given input sequence, previous works study the learning of functions that are a summation of per-step functions of the inputs, and require normalized conditions that bound the inputs by some very small constant depending on the complexity of those functions. In this paper, using a detailed analysis of the neural tangent kernel matrix, we prove a generalization error bound for learning such functions without normalized conditions and…
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Machine Learning and Algorithms
