Learning Deep ResNet Blocks Sequentially using Boosting Theory
Furong Huang, Jordan Ash, John Langford, Robert Schapire

TL;DR
This paper introduces BoostResNet, a boosting-based training method for deep ResNets that constructs the network from sequentially trained shallow modules, with theoretical guarantees on error decay and generalization.
Contribution
It develops a boosting theory for ResNet architectures, enabling a new training algorithm suitable for non-differentiable models and providing theoretical error and generalization bounds.
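To make the idea of building a deep network from sequentially trained shallow modules concrete, here is a minimal toy sketch. It is not the paper's BoostResNet procedure: the tanh residual block, the random-search candidate selection, the least-squares auxiliary classifier, and every function name below are assumptions chosen for illustration. It only shows the general pattern of adding one residual block at a time, on top of frozen earlier blocks, without back-propagating through the whole network.

```python
import numpy as np

def fit_linear_head(feats, y):
    """Least-squares linear classifier on frozen features (labels in {-1, +1})."""
    A = np.hstack([feats, np.ones((feats.shape[0], 1))])  # append a bias column
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

def head_error(feats, y, w):
    """Fraction of training points misclassified by the linear head."""
    A = np.hstack([feats, np.ones((feats.shape[0], 1))])
    return float(np.mean(np.sign(A @ w) != y))

def residual_block(feats, W):
    """Toy residual block: identity path plus a tanh branch (an assumption)."""
    return feats + np.tanh(feats @ W)

def train_sequentially(X, y, num_blocks=5, num_candidates=30, seed=0):
    """Greedily add residual blocks one at a time; earlier blocks stay frozen.

    Each block is chosen by gradient-free random search over candidate
    weight matrices, scored by the training error of a freshly fitted head.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    feats = X
    w = fit_linear_head(feats, y)
    errors = [head_error(feats, y, w)]
    blocks = []
    for _ in range(num_blocks):
        # The all-zero candidate makes the block an identity map, so the
        # selected block can never increase the training error.
        candidates = [np.zeros((d, d))]
        candidates += [rng.normal(size=(d, d)) for _ in range(num_candidates)]
        best = None
        for W in candidates:
            new_feats = residual_block(feats, W)
            w_c = fit_linear_head(new_feats, y)
            err = head_error(new_feats, y, w_c)
            if best is None or err < best[0]:
                best = (err, W, w_c, new_feats)
        err, W, w, feats = best
        blocks.append(W)
        errors.append(err)
    return blocks, w, errors
```

Calling train_sequentially on a small dataset returns the chosen block weights, the final linear head, and the training error after each added block. Because each step only fits one shallow module, the cost per step does not grow with the depth already built.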
Findings
Training error decays exponentially with depth T.
The guarantee holds under a weak learning condition: each weak module classifier only needs to perform slightly better than a weak baseline.
A ResNet's generalization is resistant to overfitting provided its weights are bounded.
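To give a feel for what exponential decay under a weak learning condition means, the sketch below evaluates the classical AdaBoost training-error bound, prod_t sqrt(1 - 4 * gamma_t^2) <= exp(-2 * sum_t gamma_t^2), where gamma_t is the edge of the t-th weak learner over random guessing. This is the standard AdaBoost result, used here only as an analogy; the paper's own bound for BoostResNet may use different constants and a different definition of the edge. The function names are hypothetical.

```python
import math

def adaboost_training_error_bound(edges):
    """Classical AdaBoost bound: product over t of sqrt(1 - 4 * gamma_t**2).

    Each edge gamma_t must satisfy |gamma_t| <= 0.5.
    """
    return math.prod(math.sqrt(1.0 - 4.0 * g * g) for g in edges)

def exponential_envelope(edges):
    """Looser closed form exp(-2 * sum of gamma_t**2) that upper-bounds the product."""
    return math.exp(-2.0 * sum(g * g for g in edges))

# Even a small constant edge drives the bound toward zero as the number of
# rounds (here standing in for depth T) grows.
for T in (10, 50, 100):
    edges = [0.1] * T
    print(T, adaboost_training_error_bound(edges), exponential_envelope(edges))
```

The point of the comparison is that the bound shrinks geometrically in the number of rounds as long as every weak learner keeps some fixed edge, which is the same qualitative behavior the finding above describes for depth T.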
Abstract
Deep neural networks are known to be difficult to train due to the instability of back-propagation. A deep residual network (ResNet) with identity loops remedies this by stabilizing gradient computations. We prove a boosting theory for the ResNet architecture. We construct T weak module classifiers, each containing two of the T layers, such that the combined strong learner is a ResNet. Therefore, we introduce an alternative Deep ResNet training algorithm, BoostResNet, which is particularly suitable in non-differentiable architectures. Our proposed algorithm merely requires a sequential training of T "shallow ResNets", which are inexpensive. We prove that the training error decays exponentially with the depth T if the weak module classifiers that we train perform slightly better than some weak baseline. In other words, we propose a weak learning condition and…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning · Stochastic Gradient Optimization Techniques
Methods: Average Pooling · 1x1 Convolution · Batch Normalization · Bottleneck Residual Block · Global Average Pooling · Residual Block · Kaiming Initialization · Max Pooling · Residual Connection
