Training Recurrent Neural Networks by Diffusion

Hossein Mobahi

arXiv:1601.04114·cs.LG·February 8, 2016·26 cites

Open Access

TL;DR

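This paper introduces a diffusion-based algorithm for training recurrent neural networks. Techniques such as smart initialization, annealed learning rates, layerwise pretraining, and noise injection arise naturally from its framework, and in preliminary experiments it reaches generalization accuracy similar to SGD in fewer epochs.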

Contribution

The work derives a new training algorithm from nonconvex optimization theory that automatically integrates mechanisms like initialization, learning rate schedules, and noise injection.

Findings

01. Achieves similar generalization accuracy to SGD
02. Requires fewer training epochs
03. Provides a theoretical framework linking deep learning practices

Abstract

This work presents a new algorithm for training recurrent neural networks (although the ideas are applicable to feedforward networks as well). The algorithm is derived from a theory in nonconvex optimization related to the diffusion equation. The contributions of this work are twofold. First, we show how some seemingly disconnected mechanisms used in deep learning, such as smart initialization, annealed learning rate, layerwise pretraining, and noise injection (as done in dropout and SGD), arise naturally and automatically from this framework, without being manually crafted into the algorithms. Second, we present some preliminary results comparing the proposed method against SGD. It turns out that the new algorithm can achieve a level of generalization accuracy similar to that of SGD in far fewer epochs.
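To make the link between the diffusion equation and nonconvex optimization more concrete, here is a minimal toy sketch. It is not the paper's algorithm. It assumes that "diffusing" an objective means convolving it with a Gaussian of standard deviation sigma (Gaussian convolution solves the heat equation), and that training follows the minimizer while sigma is annealed to zero. The objective `f`, the constants `A` and `B`, the step size, and the linear sigma schedule are all hypothetical choices made for illustration. For this particular `f` the smoothed gradient has a closed form, so no sampling is needed.

```python
import math

# Hypothetical toy constants, chosen only for illustration.
A, B = 2.0, 5.0


def f(x):
    """Nonconvex toy objective; its global minimum is at x = 0."""
    return x * x - A * math.cos(B * x)


def smoothed_grad(x, sigma):
    """Gradient of f convolved with a Gaussian of std sigma.

    Uses E[cos(B(x + sigma*z))] = cos(Bx) * exp(-(B*sigma)^2 / 2) for
    z ~ N(0, 1), so smoothing only damps the oscillating term.
    With sigma = 0 this is the gradient of f itself.
    """
    damp = math.exp(-0.5 * (B * sigma) ** 2)
    return 2.0 * x + A * B * math.sin(B * x) * damp


def descend(x, sigma, steps=200, lr=0.01):
    """Plain gradient descent on the objective smoothed at a fixed sigma."""
    for _ in range(steps):
        x -= lr * smoothed_grad(x, sigma)
    return x


def plain_gd(x0, stages=21):
    """Baseline: same total number of steps, no smoothing."""
    x = x0
    for _ in range(stages):
        x = descend(x, sigma=0.0)
    return x


def continuation(x0, sigma_max=2.0, stages=21):
    """Start heavily smoothed, then anneal sigma linearly down to 0."""
    x = x0
    for k in range(stages):
        sigma = sigma_max * (1.0 - k / (stages - 1))
        x = descend(x, sigma)
    return x


x_plain = plain_gd(3.0)
x_cont = continuation(3.0)
print(f"plain GD:     x = {x_plain:.4f}, f(x) = {f(x_plain):.4f}")
print(f"continuation: x = {x_cont:.4f}, f(x) = {f(x_cont):.4f}")
```

With these constants, plain gradient descent started at x = 3 should settle in a nearby local minimum, while the annealed run should end close to the global minimum at x = 0. The heavily smoothed objective is nearly convex, which is one way to read the abstract's claim that a good initialization falls out of the framework instead of being designed by hand.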

Taxonomy

Topics: Model Reduction and Neural Networks · Neural Networks and Applications · Stochastic Gradient Optimization Techniques

Methods: Dropout · Stochastic Gradient Descent