Weighted Residuals for Very Deep Networks
Falong Shen, Gang Zeng

TL;DR
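This paper introduces weighted residual networks, which improve convergence and accuracy in very deep networks. They address two defects of the original residual structure, the incompatibility between ReLU and element-wise addition and the deep network initialization problem, while adding little computation or GPU memory cost.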
Contribution
The paper proposes a weighted residual network architecture that learns to combine residuals from different layers and makes networks of more than 1000 layers easier to train.
Findings
Faster convergence on CIFAR-10.
95.3% accuracy with a 1192-layer model.
Consistent gains in accuracy as depth grows from 100+ to 1000+ layers, with little extra computation or GPU memory.
Abstract
Deep residual networks have recently shown appealing performance on many challenging computer vision tasks. However, the original residual structure still has some defects that make it difficult to converge on very deep networks. In this paper, we introduce a weighted residual network to address the incompatibility between ReLU and element-wise addition and the deep network initialization problem. The weighted residual network is able to learn to combine residuals from different layers effectively and efficiently. The proposed models enjoy a consistent improvement in accuracy and convergence with increasing depths from 100+ layers to 1000+ layers. Besides, the weighted residual networks add little computation and GPU memory burden compared with the original residual networks. The networks are optimized by projected stochastic gradient descent. Experiments on CIFAR-10 have shown…
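The idea of a weighted residual trained by projected stochastic gradient descent can be illustrated with a minimal NumPy sketch. The abstract does not spell out the exact form, so everything here is an assumption made for illustration: a single scalar weight `lam` per block, a toy `residual_branch` (one linear map followed by ReLU), and a projection that clips `lam` to the interval [-1, 1]. None of these names or choices should be read as the authors' implementation.

```python
import numpy as np

def residual_branch(x, w):
    # Hypothetical residual function: one linear map followed by ReLU.
    return np.maximum(x @ w, 0.0)

def weighted_residual_block(x, w, lam):
    # Identity shortcut plus the residual scaled by a learnable scalar weight.
    return x + lam * residual_branch(x, w)

def projected_sgd_step(lam, grad, lr=0.1, bound=1.0):
    # Ordinary SGD update, then projection onto [-bound, bound].
    # The interval is an assumption for this sketch.
    return float(np.clip(lam - lr * grad, -bound, bound))

x = np.array([[1.0, -2.0]])
w = np.eye(2)

out_zero = weighted_residual_block(x, w, lam=0.0)  # block reduces to identity
out_half = weighted_residual_block(x, w, lam=0.5)  # half of the residual added
lam_new = projected_sgd_step(lam=0.95, grad=-1.0)  # update overshoots, gets clipped
```

With `lam` set to zero the block passes its input through unchanged, which hints at why a learned weight could ease the initialization of very deep stacks. The projection step keeps the weight inside a bounded set after each update.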
