Residual Networks Behave Like Boosting Algorithms
Chapman Siu

TL;DR
This paper reveals that Residual Networks function similarly to boosting algorithms, providing theoretical bounds and proposing novel boosting-inspired neural network and decision tree methods with competitive performance.
Contribution
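It establishes a theoretical equivalence between ResNet and boosting of feature representations, proposes architectural changes (a shrinkage parameter in the identity skip-connections and residual modules with max-norm bounds) under which ResNet could achieve Online Gradient Boosting regret bounds, and develops new boosting-inspired algorithms with generalization error bounds.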
Findings
ResNet is equivalent to boosting feature representation.
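Adding a shrinkage parameter to the identity skip-connections and using residual modules with max-norm bounds could let ResNet achieve Online Gradient Boosting regret bounds.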
Online boosted decision trees perform comparably to state-of-the-art offline methods.
Abstract
We show that Residual Networks (ResNet) is equivalent to boosting feature representation, without any modification to the underlying ResNet training algorithm. A regret bound based on Online Gradient Boosting theory is proved and suggests that ResNet could achieve Online Gradient Boosting regret bounds through neural network architectural changes with the addition of a shrinkage parameter in the identity skip-connections and using residual modules with max-norm bounds. Through this relation between ResNet and Online Boosting, novel feature representation boosting algorithms can be constructed based on altering residual modules. We demonstrate this through proposing decision tree residual modules to construct a new boosted decision tree algorithm and demonstrating generalization error bounds for both approaches; relaxing constraints within BoostResNet algorithm to allow it to be trained…
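The two architectural changes named in the abstract, a shrinkage parameter on the identity skip-connection and a max-norm bound on the residual module, can be illustrated with a small pure-Python sketch. It is not the paper's formulation: the update form `(1 - shrinkage) * x + f(x)`, the `tanh` residual module, and the names `max_norm_clip`, `residual_module`, and `shrunk_residual_block` are all assumptions made here for illustration.

```python
import math


def max_norm_clip(weights, max_norm):
    """Rescale each row of `weights` so its L2 norm is at most `max_norm`."""
    clipped = []
    for row in weights:
        norm = math.sqrt(sum(w * w for w in row))
        # Rows already inside the norm ball are left unchanged.
        scale = 1.0 if norm <= max_norm else max_norm / norm
        clipped.append([w * scale for w in row])
    return clipped


def residual_module(x, weights):
    """Hypothetical residual module f(x) = tanh(W x); not taken from the paper."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]


def shrunk_residual_block(x, weights, shrinkage, max_norm):
    """Assumed block form: (1 - shrinkage) * x + f(x), with W max-norm clipped."""
    bounded = max_norm_clip(weights, max_norm)
    fx = residual_module(x, bounded)
    # The skip-connection is scaled down instead of being a pure identity.
    return [(1.0 - shrinkage) * xi + fi for xi, fi in zip(x, fx)]


x = [1.0, -2.0]
weights = [[3.0, 4.0], [0.1, 0.0]]
out = shrunk_residual_block(x, weights, shrinkage=0.1, max_norm=1.0)
print(out)
```

With `shrinkage=0.0` the block reduces to the usual residual form `x + f(x)`, so under this assumed parameterization the shrinkage term is the only change to the skip-connection.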
Taxonomy
Methods: Average Pooling · 1x1 Convolution · Batch Normalization · Bottleneck Residual Block · Global Average Pooling · Residual Block · Kaiming Initialization · Max Pooling · Residual Connection
