Learning Deep ResNet Blocks Sequentially using Boosting Theory

Furong Huang; Jordan Ash; John Langford; Robert Schapire

arXiv:1706.04964 · cs.LG · June 15, 2018 · 31 cites

TL;DR

This paper introduces BoostResNet, a boosting-based training method for deep ResNets that constructs the network from sequentially trained shallow modules, with theoretical guarantees on error decay and generalization.

Contribution

It develops a boosting theory for ResNet architectures, enabling a new training algorithm suitable for non-differentiable models and providing theoretical error and generalization bounds.

Findings

01. Training error decays exponentially with the depth $T$.

02. The guarantee holds under a weak learning condition: each weak module classifier only needs to perform slightly better than some weak baseline.

03. With bounded weights, ResNet generalization is resistant to overfitting.
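
For intuition about the first finding, the classical AdaBoost analysis gives a bound of the same flavor. This is the standard AdaBoost result, not the paper's theorem, whose constants and weak learning condition may differ. The symbols are assumptions introduced here for illustration: $H_T$ is the combined classifier after $T$ rounds, and $\gamma_t$ is the edge of the $t$-th weak learner over random guessing.

$$
\widehat{\mathrm{err}}(H_T) \;\le\; \prod_{t=1}^{T} \sqrt{1 - 4\gamma_t^2} \;\le\; \exp\Big(-2 \sum_{t=1}^{T} \gamma_t^2\Big)
$$

If every edge satisfies $\gamma_t \ge \gamma > 0$, the right-hand side is at most $e^{-2\gamma^2 T}$, so the bound shrinks geometrically as rounds (here, depth) are added.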

Abstract

Deep neural networks are known to be difficult to train due to the instability of back-propagation. A deep residual network (ResNet) with identity loops remedies this by stabilizing gradient computations. We prove a boosting theory for the ResNet architecture. We construct $T$ weak module classifiers, each containing two of the $T$ layers, such that the combined strong learner is a ResNet. Therefore, we introduce an alternative Deep ResNet training algorithm, BoostResNet, which is particularly suitable for non-differentiable architectures. Our proposed algorithm requires only sequential training of $T$ inexpensive "shallow ResNets". We prove that the training error decays exponentially with the depth $T$ if the weak module classifiers that we train perform slightly better than some weak baseline. In other words, we propose a weak learning condition and…
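
The idea of building a deep residual model one block at a time can be illustrated with a toy sketch. This is not the paper's BoostResNet procedure. It is a minimal stand-in under loud assumptions: a scalar representation, a non-differentiable threshold unit as the residual function, a small grid search in place of a weak learner, and the exponential loss borrowed from AdaBoost-style analyses. All names (`apply_block`, `fit_block`, `train_sequentially`) and the data are hypothetical.

```python
import math

def apply_block(block, g):
    """Residual update g + f(g), where f is a non-differentiable threshold unit."""
    thr, sign, alpha = block
    return g + alpha * sign * (1.0 if g > thr else -1.0)

def forward(blocks, x):
    """Pass x through the stacked blocks; each block keeps an identity skip path."""
    g = x
    for block in blocks:
        g = apply_block(block, g)
    return g

def exp_loss(reps, ys):
    """Average exponential loss of the scalar representations against +/-1 labels."""
    return sum(math.exp(-y * g) for g, y in zip(reps, ys)) / len(ys)

def fit_block(reps, ys, alphas=(0.0, 0.25, 0.5, 1.0)):
    """Grid-search one block on frozen inputs (assumed stand-in for a weak learner).

    The starting candidate has alpha = 0, i.e. the identity map, so the
    returned loss can never exceed the loss before the block was added.
    """
    best, best_loss = (0.0, 1, 0.0), exp_loss(reps, ys)
    for thr in sorted(set(reps)):
        for sign in (1, -1):
            for alpha in alphas:
                block = (thr, sign, alpha)
                loss = exp_loss([apply_block(block, g) for g in reps], ys)
                if loss < best_loss:
                    best, best_loss = block, loss
    return best, best_loss

def train_sequentially(xs, ys, T):
    """Add T blocks one at a time; earlier blocks are frozen and never revisited."""
    blocks, losses = [], [exp_loss(xs, ys)]
    reps = list(xs)
    for _ in range(T):
        block, loss = fit_block(reps, ys)
        blocks.append(block)
        reps = [apply_block(block, g) for g in reps]  # output of the frozen stack
        losses.append(loss)
    return blocks, losses

# Hypothetical 1-D data: points far from zero are labeled +1, the rest -1.
xs = [-2.0, -1.5, -0.5, 0.3, 0.8, 1.7, 2.2]
ys = [1, 1, -1, -1, -1, 1, 1]

blocks, losses = train_sequentially(xs, ys, T=6)
train_error = sum((forward(blocks, x) > 0) != (y > 0) for x, y in zip(xs, ys)) / len(xs)
print("surrogate loss per depth:", [round(l, 3) for l in losses])
print("training error at depth 6:", round(train_error, 3))
```

Because no gradient flows through the threshold unit, each block is fitted by search rather than back-propagation, which is the kind of setting the abstract describes as non-differentiable. The surrogate loss is non-increasing in depth here only because the identity block is always a candidate; the sketch makes no claim about the rate of decay.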

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning · Stochastic Gradient Optimization Techniques

Methods: Average Pooling · 1x1 Convolution · Batch Normalization · Bottleneck Residual Block · Global Average Pooling · Residual Block · Kaiming Initialization · Max Pooling · Residual Connection