Training Very Deep Networks
Rupesh Kumar Srivastava, Klaus Greff, Jürgen Schmidhuber

TL;DR
This paper introduces highway networks, an architecture inspired by LSTM that makes networks with hundreds of layers trainable with simple gradient descent, addressing the difficulty of training very deep networks.
Contribution
The paper proposes highway networks with adaptive gating units, allowing effective training of extremely deep neural networks directly with gradient descent.
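To make the gating idea concrete, here is a minimal NumPy sketch of a single highway layer, using the commonly cited formulation y = H(x) * T(x) + x * (1 - T(x)), where T is a sigmoid "transform gate". The function names, the tanh nonlinearity for H, the layer sizes, and the weight scales are illustrative assumptions, not details taken from this page. The gate bias of -20 is deliberately exaggerated so that the carry behaviour is easy to see; it is not a recommended setting.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, used here for the transform gate."""
    return 1.0 / (1.0 + np.exp(-z))


def highway_layer(x, W_h, b_h, W_t, b_t):
    """One highway layer: y = H(x) * T(x) + x * (1 - T(x)).

    x has shape (batch, dim). All weight matrices are (dim, dim) so that
    the input and output widths match, which the carry path requires.
    """
    h = np.tanh(x @ W_h + b_h)    # candidate transform H(x); tanh is an assumption
    t = sigmoid(x @ W_t + b_t)    # transform gate T(x), each entry in (0, 1)
    return t * h + (1.0 - t) * x  # gate mixes transformed and carried input


# Hypothetical demo: stack 100 layers whose gates are biased toward "carry".
rng = np.random.default_rng(0)
dim, depth = 8, 100
x = rng.standard_normal((4, dim))

y = x
for _ in range(depth):
    W_h = rng.standard_normal((dim, dim)) * 0.1
    W_t = rng.standard_normal((dim, dim)) * 0.1
    # A strongly negative gate bias keeps T(x) near 0, so x passes through.
    y = highway_layer(y, W_h, np.zeros(dim), W_t, np.full(dim, -20.0))

# Largest deviation from the original input after 100 layers.
print(np.max(np.abs(y - x)))
```

With the gates nearly closed, the stack behaves almost like the identity, which is the sense in which information can cross many layers unimpeded. In training, the gate parameters are learned, so each layer decides per unit how much to transform and how much to carry.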
Findings
Highway networks with hundreds of layers can be trained directly with gradient descent
Adaptive gating lets information flow effectively across many layers
The architecture makes it practical to study very deep networks
Abstract
Theoretical and empirical evidence indicates that the depth of neural networks is crucial for their success. However, training becomes more difficult as depth increases, and training of very deep networks remains an open problem. Here we introduce a new architecture designed to overcome this. Our so-called highway networks allow unimpeded information flow across many layers on information highways. They are inspired by Long Short-Term Memory recurrent networks and use adaptive gating units to regulate the information flow. Even with hundreds of layers, highway networks can be trained directly through simple gradient descent. This enables the study of extremely deep and efficient architectures.
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning
Methods: Highway networks · Branch attention
