Highway Networks
Rupesh Kumar Srivastava, Klaus Greff, Jürgen Schmidhuber

TL;DR
Highway networks use gating units that let information flow unimpeded across many layers, which makes it possible to train networks with hundreds of layers directly with stochastic gradient descent.
Contribution
The paper introduces the highway network architecture, whose gating units learn to regulate the flow of information through the network and ease gradient-based training of very deep networks.
Findings
Highway networks with hundreds of layers can be trained directly with stochastic gradient descent
Training works with a variety of activation functions
Gating units allow unimpeded information flow across several layers
Abstract
There is plenty of theoretical and empirical evidence that depth of neural networks is a crucial ingredient for their success. However, network training becomes more difficult with increasing depth and training of very deep networks remains an open problem. In this extended abstract, we introduce a new architecture designed to ease gradient-based training of very deep networks. We refer to networks with this architecture as highway networks, since they allow unimpeded information flow across several layers on "information highways". The architecture is characterized by the use of gating units which learn to regulate the flow of information through a network. Highway networks with hundreds of layers can be trained directly using stochastic gradient descent and with a variety of activation functions, opening up the possibility of studying extremely deep and efficient architectures.
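The abstract does not spell out the gating formula. As a rough illustration only, the sketch below assumes the formulation commonly associated with highway layers, y = T(x) * H(x) + (1 - T(x)) * x, where H is a nonlinear transform and T is a sigmoid "transform gate". The function names, the tanh nonlinearity, the layer width, the weight scale, and the gate bias of -2.0 are all hypothetical choices made for this example, not values taken from this page.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def affine(W, b, x):
    """Return W @ x + b for a list-of-rows matrix W and list vectors b, x."""
    return [sum(w * v for w, v in zip(row, x)) + b_i for row, b_i in zip(W, b)]


def highway_layer(x, W_h, b_h, W_t, b_t):
    """One highway layer (assumed form): y = T(x) * H(x) + (1 - T(x)) * x."""
    h = [math.tanh(v) for v in affine(W_h, b_h, x)]  # candidate transform H(x)
    t = [sigmoid(v) for v in affine(W_t, b_t, x)]    # transform gate T(x) in (0, 1)
    # Each output unit mixes the transformed value with the untouched input.
    return [t_i * h_i + (1.0 - t_i) * x_i for h_i, t_i, x_i in zip(h, t, x)]


def random_params(dim, gate_bias, rng):
    """Hypothetical initializer: small Gaussian weights, constant gate bias."""
    def mat():
        return [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(dim)]
    return mat(), [0.0] * dim, mat(), [gate_bias] * dim


def highway_stack(x, layers):
    """Apply a sequence of highway layers; input and output widths match."""
    for params in layers:
        x = highway_layer(x, *params)
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    dim, depth = 4, 100  # illustrative sizes only
    layers = [random_params(dim, gate_bias=-2.0, rng=rng) for _ in range(depth)]
    print(highway_stack([0.5, -0.3, 0.8, 0.1], layers))
```

When the gate bias is strongly negative, T(x) is close to 0 and each layer approximately copies its input forward, which is one way to read the "information highways" described in the abstract. When T(x) is close to 1, the layer behaves like an ordinary nonlinear layer.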
Taxonomy
Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Adversarial Robustness in Machine Learning
Methods: Highway networks · Sigmoid Activation · Convolution · SGD with Momentum · Highway Layer
