Weighted Residuals for Very Deep Networks

Falong Shen; Gang Zeng

arXiv:1605.08831·cs.CV·May 31, 2016


TL;DR

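This paper introduces weighted residual networks, which improve convergence and accuracy in very deep networks by addressing two defects of the original residual structure (the incompatibility between ReLU and element-wise addition, and the difficulty of initializing very deep networks) at little additional computational cost.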

Contribution

The paper proposes a weighted residual architecture that learns to combine residuals from different layers and improves the training of networks with more than 1000 layers.
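A minimal sketch of the weighting idea, assuming the update has the form x_next = x + lam * F(x), where F is a residual branch whose last operation is ReLU and lam is a learned scalar per block. The branch `toy_branch`, the helper names, and the numeric values are hypothetical illustrations, not the authors' code. The point it shows: a ReLU output is never negative, so plain element-wise addition can only keep or raise each feature, while a signed scalar weight lets a block lower it.

```python
# Hypothetical sketch, not the authors' implementation.
# Assumption: the residual branch ends in ReLU, so F(x) >= 0 element-wise.

def relu(values):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, v) for v in values]

def toy_branch(x):
    """Hypothetical residual branch F: a fixed affine map followed by ReLU."""
    return relu([2.0 * v - 1.0 for v in x])

def residual_step(x, branch):
    """Unweighted update x + F(x): with a ReLU-terminated branch, features never decrease."""
    return [a + b for a, b in zip(x, branch(x))]

def weighted_residual_step(x, branch, lam):
    """Weighted update x + lam * F(x): a negative lam lets the block reduce features."""
    return [a + lam * b for a, b in zip(x, branch(x))]

x = [0.25, 1.0, 3.0]
print(residual_step(x, toy_branch))                     # every entry >= the input entry
print(weighted_residual_step(x, toy_branch, lam=-0.5))  # some entries shrink
print(weighted_residual_step(x, toy_branch, lam=0.0))   # identity mapping
```

Setting lam = 0 turns the block into an identity mapping, so a scalar of this kind also gives direct control over how much each block changes its input.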

Findings

01. Faster convergence on CIFAR-10.

02. 95.3% accuracy with a 1192-layer model.

03. Improved performance with minimal extra computation.

Abstract

Deep residual networks have recently shown appealing performance on many challenging computer vision tasks. However, the original residual structure still has some defects that make very deep networks difficult to converge. In this paper, we introduce a weighted residual network to address the incompatibility between ReLU and element-wise addition and the deep network initialization problem. The weighted residual network is able to learn to combine residuals from different layers effectively and efficiently. The proposed models show consistent improvements in accuracy and convergence as depth increases from 100+ layers to 1000+ layers. Besides, the weighted residual networks add little computation and GPU memory overhead compared with the original residual networks. The networks are optimized by projected stochastic gradient descent. Experiments on CIFAR-10 have shown…
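Projected stochastic gradient descent takes an ordinary gradient step and then maps each constrained parameter back onto its feasible set. The sketch below applies it to per-block residual weights under a loudly stated assumption: the feasible set is taken to be the interval [-1, 1], which this summary does not specify. For an interval, the Euclidean projection is simply a clamp. The function names and the sample gradients are hypothetical.

```python
# Hypothetical sketch of projected SGD on scalar residual weights.
# Assumption: each weight is constrained to [low, high] = [-1, 1].

def project_to_interval(value, low=-1.0, high=1.0):
    """Euclidean projection of a scalar onto [low, high], i.e. a clamp."""
    return min(high, max(low, value))

def projected_sgd_step(weights, grads, lr, low=-1.0, high=1.0):
    """One projected SGD step: gradient step on each weight, then project it back."""
    return [project_to_interval(w - lr * g, low, high) for w, g in zip(weights, grads)]

lams = [0.0, 0.9, -0.95]   # hypothetical current residual weights
grads = [0.5, -2.0, 1.0]   # hypothetical gradients from a mini-batch
print(projected_sgd_step(lams, grads, lr=0.1))
```

Only the first weight stays inside the interval after the step; the other two overshoot and are clamped to the boundary. In a full network the same clamp would run after each optimizer update, leaving the convolution weights unconstrained.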
