Training Recurrent Neural Networks by Diffusion
Hossein Mobahi

TL;DR
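This paper introduces a diffusion-based algorithm for training recurrent neural networks. Common deep learning techniques such as smart initialization, annealed learning rates, layerwise pretraining, and noise injection arise naturally from the derivation, and in preliminary experiments the method reaches generalization accuracy comparable to SGD in fewer epochs.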
Contribution
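The work derives a new training algorithm from a theory in nonconvex optimization related to the diffusion equation. Mechanisms such as smart initialization, annealed learning rates, layerwise pretraining, and noise injection emerge from the derivation automatically rather than being crafted into the algorithm by hand.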
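For orientation, the usual link between the diffusion (heat) equation and objective smoothing can be written as follows. This is a standard identity stated under assumed notation, not a formula copied from the paper: f is the original training objective, g is its diffused version, t ≥ 0 is the diffusion time, and the factor 1/2 is a normalization chosen so that t equals the variance of the Gaussian.

```latex
% Heat equation with the original objective f as initial condition
\frac{\partial g}{\partial t}(x, t) = \tfrac{1}{2}\,\Delta_x\, g(x, t),
\qquad g(x, 0) = f(x).

% Its solution is f convolved with a Gaussian of variance t
g(x, t) = \mathbb{E}_{z \sim \mathcal{N}(0, I)}\!\left[ f\!\left(x + \sqrt{t}\, z\right) \right].
```

Under this reading, a large t gives a heavily smoothed objective and t → 0 recovers f, which is one way to see why a shrinking smoothing scale can resemble an annealing schedule.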
Findings
Achieves generalization accuracy similar to SGD in preliminary experiments
Requires far fewer training epochs than SGD to reach that accuracy
Provides a theoretical framework linking seemingly disconnected deep learning practices
Abstract
This work presents a new algorithm for training recurrent neural networks (although the ideas are applicable to feedforward networks as well). The algorithm is derived from a theory in nonconvex optimization related to the diffusion equation. The contributions made in this work are twofold. First, we show how some seemingly disconnected mechanisms used in deep learning, such as smart initialization, annealed learning rate, layerwise pretraining, and noise injection (as done in dropout and SGD), arise naturally and automatically from this framework, without manually crafting them into the algorithms. Second, we present some preliminary results comparing the proposed method against SGD. It turns out that the new algorithm can achieve a level of generalization accuracy similar to that of SGD in far fewer epochs.
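The general idea of optimizing through diffusion can be illustrated on a one-dimensional toy problem. The sketch below is not the paper's RNN algorithm. It assumes a hypothetical objective f(x) = 0.1x² − cos(3x), whose Gaussian-smoothed version has the closed form g(x, σ) = 0.1(x² + σ²) − exp(−4.5σ²)cos(3x). It then runs gradient descent while σ shrinks to zero. The function names, step size, and σ schedule are all illustrative choices.

```python
import math


def f(x):
    """Toy nonconvex objective (hypothetical, not from the paper)."""
    return 0.1 * x**2 - math.cos(3.0 * x)


def smoothed_grad(x, sigma):
    """d/dx of g(x, sigma) = E_z[f(x + sigma*z)], z ~ N(0, 1), in closed form.

    Smoothing damps the cosine term by exp(-4.5 * sigma^2); sigma = 0
    recovers the gradient of the original objective f.
    """
    return 0.2 * x + 3.0 * math.exp(-4.5 * sigma**2) * math.sin(3.0 * x)


def descend(x, sigma, steps, lr=0.05):
    """Plain gradient descent on the objective smoothed at scale sigma."""
    for _ in range(steps):
        x -= lr * smoothed_grad(x, sigma)
    return x


def plain_gd(x0, steps=4200):
    """Baseline: gradient descent on the unsmoothed objective."""
    return descend(x0, 0.0, steps)


def diffusion_continuation(x0, sigma_max=2.0, levels=21, steps_per_level=200):
    """Follow the minimizer while sigma is annealed linearly from sigma_max to 0."""
    x = x0
    for i in range(levels):
        sigma = sigma_max * (1.0 - i / (levels - 1))
        x = descend(x, sigma, steps_per_level)
    return x


x_plain = plain_gd(4.0)
x_diffused = diffusion_continuation(4.0)
print(f"plain GD:     x = {x_plain:.4f}, f(x) = {f(x_plain):.4f}")
print(f"continuation: x = {x_diffused:.4f}, f(x) = {f(x_diffused):.4f}")
```

With the same starting point and the same total number of gradient steps, the baseline settles in a nearby local minimum, while the continuation run first sees an almost convex objective and ends near the global minimum at x = 0. For a real network the smoothed objective generally has no such simple closed form, so this toy case only conveys the intuition behind the annealing and noise-like behavior described in the abstract.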
Taxonomy
Topics: Model Reduction and Neural Networks · Neural Networks and Applications · Stochastic Gradient Optimization Techniques
Methods: Dropout · Stochastic Gradient Descent
