Training Very Deep Networks

Rupesh Kumar Srivastava; Klaus Greff; J\"urgen Schmidhuber

arXiv:1507.06228·cs.LG·November 24, 2015·546 cites

Training Very Deep Networks

Rupesh Kumar Srivastava, Klaus Greff, J\"urgen Schmidhuber

PDF

Open Access 3 Repos

TL;DR

This paper introduces highway networks, a new deep learning architecture inspired by LSTM, that enables training of very deep networks with hundreds of layers using simple gradient descent, overcoming previous training difficulties.

Contribution

The paper proposes highway networks with adaptive gating units, allowing effective training of extremely deep neural networks directly with gradient descent.

Findings

01

Able to train networks with hundreds of layers

02

Achieves effective information flow across many layers

03

Facilitates study of very deep architectures

Abstract

Theoretical and empirical evidence indicates that the depth of neural networks is crucial for their success. However, training becomes more difficult as depth increases, and training of very deep networks remains an open problem. Here we introduce a new architecture designed to overcome this. Our so-called highway networks allow unimpeded information flow across many layers on information highways. They are inspired by Long Short-Term Memory recurrent networks and use adaptive gating units to regulate the information flow. Even with hundreds of layers, highway networks can be trained directly through simple gradient descent. This enables the study of extremely deep and efficient architectures.

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Repositories

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsAdvanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning

MethodsHighway networks · Branch attention