Gradual Learning of Recurrent Neural Networks
Ziv Aharoni, Gal Rattner, Haim Permuter

TL;DR
This paper introduces a gradual training method for RNNs based on the Data Processing Inequality, improving training stability and performance in language modeling tasks.
Contribution
It proposes a gradual, layer-wise training approach and a layer-wise gradient clipping technique, both motivated by the Data Processing Inequality (DPI), to make RNNs easier to train and improve their performance.
Findings
Improved results for state-of-the-art architectures on language modeling tasks.
Enhanced training stability and reduced overfitting.
Complementary to existing regularization and optimization techniques.
Abstract
Recurrent Neural Networks (RNNs) achieve state-of-the-art results in many sequence-to-sequence modeling tasks. However, RNNs are difficult to train and tend to suffer from overfitting. Motivated by the Data Processing Inequality (DPI), we formulate the multi-layered network as a Markov chain, introducing a training method that comprises training the network gradually and using layer-wise gradient clipping. We found that applying our methods, combined with previously introduced regularization and optimization methods, resulted in improvements in state-of-the-art architectures operating in language modeling tasks.
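The abstract mentions layer-wise gradient clipping without giving details. As a rough illustration of the general idea only, the sketch below clips each layer's gradient against its own L2-norm threshold instead of clipping one global norm. The function name `clip_layerwise`, the dictionary layout, and the threshold values are all hypothetical choices made for this example; the paper's actual procedure may differ.

```python
import math


def clip_layerwise(layer_grads, max_norms):
    """Clip each layer's gradient to its own maximum L2 norm.

    layer_grads: dict mapping a layer name to a flat list of gradient values.
    max_norms:   dict mapping the same layer names to a clipping threshold.
    The names and the flat-list layout are illustrative assumptions,
    not taken from the paper.
    """
    clipped = {}
    for name, grad in layer_grads.items():
        norm = math.sqrt(sum(g * g for g in grad))
        limit = max_norms[name]
        # Rescale only when this layer's norm exceeds its own threshold.
        scale = limit / norm if norm > limit else 1.0
        clipped[name] = [g * scale for g in grad]
    return clipped


# Toy gradients: layer1 has norm 5.0, layer2 has norm 0.5.
grads = {"layer1": [3.0, 4.0], "layer2": [0.3, 0.4]}
limits = {"layer1": 1.0, "layer2": 1.0}
clipped = clip_layerwise(grads, limits)
```

In this toy case only `layer1` is rescaled, while `layer2` passes through unchanged. A single global clip at the same threshold would instead shrink both layers by the same factor, since the combined norm is dominated by `layer1`.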
Taxonomy
Topics: Neural Networks and Applications
