Swapout: Learning an ensemble of deep architectures
Saurabh Singh, Derek Hoiem, David Forsyth

TL;DR
Swapout is a stochastic training method that acts as both a regularizer and an ensemble training method. It samples from a broad set of architectures, including residual, dropout, and stochastic depth variants, and improves accuracy on CIFAR-10 and CIFAR-100 over ResNets of identical structure.
Contribution
The paper introduces Swapout, a stochastic training technique that contains dropout, stochastic depth, and residual architectures as special cases, and reports strong results on CIFAR-10 and CIFAR-100.
Findings
Outperforms ResNets of identical network structure on CIFAR-10 and CIFAR-100.
Achieves accuracy comparable to much deeper ResNet models.
Provides a new parameterization linking to existing architectures.
Abstract
We describe Swapout, a new stochastic training method that outperforms ResNets of identical network structure, yielding impressive results on CIFAR-10 and CIFAR-100. Swapout samples from a rich set of architectures including dropout, stochastic depth and residual architectures as special cases. When viewed as a regularization method, swapout not only inhibits co-adaptation of units in a layer, similar to dropout, but also across network layers. We conjecture that swapout achieves strong regularization by implicitly tying the parameters across layers. When viewed as an ensemble training method, it samples a much richer set of architectures than existing methods such as dropout or stochastic depth. We propose a parameterization that reveals connections to existing architectures and suggests a much richer set of architectures to be explored. We show that our formulation suggests an efficient…
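The sampling described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes the per-unit formulation from the Swapout paper, Y = Θ1 ⊙ X + Θ2 ⊙ F(X), where Θ1 and Θ2 are independent Bernoulli masks. The names swapout_unit, p1, p2, and the toy block f are hypothetical and chosen only for this example.

```python
import random


def swapout_unit(x, f, p1=0.5, p2=0.5, rng=random):
    """Sketch of a swapout unit: y_i = theta1_i * x_i + theta2_i * F(x)_i.

    theta1_i ~ Bernoulli(p1) and theta2_i ~ Bernoulli(p2) are drawn
    independently for every unit i, so each output unit is one of
    0, x_i, F(x)_i, or x_i + F(x)_i.
    (p1, p2 and the function name are illustrative assumptions.)
    """
    fx = f(x)
    out = []
    for xi, fi in zip(x, fx):
        theta1 = 1.0 if rng.random() < p1 else 0.0  # keep the input unit?
        theta2 = 1.0 if rng.random() < p2 else 0.0  # keep the block output unit?
        out.append(theta1 * xi + theta2 * fi)
    return out


if __name__ == "__main__":
    # Toy stand-in for a learned block F; a real network would use conv layers.
    f = lambda v: [2.0 * a for a in v]
    x = [1.0, -2.0, 3.0]

    print("residual    (p1=1, p2=1):", swapout_unit(x, f, 1.0, 1.0))
    print("feedforward (p1=0, p2=1):", swapout_unit(x, f, 0.0, 1.0))
    print("random swapout sample   :", swapout_unit(x, f, 0.5, 0.5))
```

Setting both probabilities to 1 recovers a plain residual unit, and setting p1 to 0 with p2 below 1 gives dropout applied to F(X) (without rescaling). Stochastic depth would instead share a single Θ2 draw across the whole layer with Θ1 fixed to 1, which this per-unit sketch does not implement.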
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning
Methods: Average Pooling · 1x1 Convolution · Batch Normalization · Bottleneck Residual Block · Global Average Pooling · Residual Block · Kaiming Initialization · Max Pooling · Stochastic Depth
