Gradual Learning of Recurrent Neural Networks

Ziv Aharoni; Gal Rattner; Haim Permuter

arXiv:1708.08863 · stat.ML · May 24, 2018 · 1 citation

TL;DR

This paper introduces a gradual training method for RNNs based on the Data Processing Inequality, improving training stability and performance in language modeling tasks.

Contribution

The paper proposes a layer-wise (gradual) training approach and a layer-wise gradient clipping technique, both motivated by the Data Processing Inequality (DPI), to improve RNN training and performance.

Findings

01. Improved language modeling results with the proposed method.

02. Enhanced training stability and reduced overfitting.

03. Complementary to existing regularization and optimization techniques.

Abstract

Recurrent Neural Networks (RNNs) achieve state-of-the-art results in many sequence-to-sequence modeling tasks. However, RNNs are difficult to train and tend to suffer from overfitting. Motivated by the Data Processing Inequality (DPI), we formulate the multi-layered network as a Markov chain, introducing a training method that comprises training the network gradually and using layer-wise gradient clipping. We found that applying our methods, combined with previously introduced regularization and optimization methods, resulted in improvements in state-of-the-art architectures operating in language modeling tasks.
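
As an illustration of the layer-wise gradient clipping mentioned in the abstract, here is a minimal sketch in plain Python. It assumes each layer's gradient is clipped to its own maximum L2 norm, rather than clipping one global norm over all parameters. The abstract does not give the exact rule or thresholds, so the function names (`clip_layerwise`, `l2_norm`) and the values in `max_norms` are hypothetical and should not be read as the authors' implementation.

```python
import math


def l2_norm(vec):
    """Return the Euclidean (L2) norm of a flat list of floats."""
    return math.sqrt(sum(v * v for v in vec))


def clip_layerwise(grads_by_layer, max_norms):
    """Clip each layer's gradient to its own maximum L2 norm.

    grads_by_layer: one flat gradient vector (list of floats) per layer.
    max_norms: one clipping threshold per layer (assumed values, not
               taken from the paper).
    """
    clipped = []
    for grad, max_norm in zip(grads_by_layer, max_norms):
        norm = l2_norm(grad)
        # Rescale only when the layer's norm exceeds its own threshold.
        scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
        clipped.append([g * scale for g in grad])
    return clipped


# Toy gradients for a three-layer network (illustrative numbers only).
grads = [[3.0, 4.0], [0.3, 0.4], [0.0, 0.0]]
max_norms = [1.0, 1.0, 1.0]
clipped = clip_layerwise(grads, max_norms)

for i, layer_grad in enumerate(clipped):
    print(f"layer {i}: norm after clipping = {l2_norm(layer_grad):.3f}")
```

In this sketch a large gradient in one layer is rescaled without shrinking the gradients of the other layers, which is the main difference from clipping a single global norm.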

Code & Models

Repositories

zivaharoni/gradual-learning-rnn (PyTorch, official)

Taxonomy

Topics: Neural Networks and Applications