Swapout: Learning an ensemble of deep architectures

Saurabh Singh; Derek Hoiem; David Forsyth

arXiv:1605.06465·cs.CV·May 23, 2016·105 cites

Open Access

TL;DR

Swapout is a stochastic training method that enhances neural network regularization and ensemble diversity, leading to improved accuracy on CIFAR datasets by sampling a broad set of architectures including residual, dropout, and stochastic depth variants.

Contribution

The paper introduces Swapout, a stochastic training technique that includes dropout, stochastic depth, and residual architectures as special cases and outperforms ResNets of identical network structure on CIFAR-10 and CIFAR-100.

Findings

01

Outperforms ResNets of identical network structure on CIFAR-10 and CIFAR-100.

02

Achieves accuracy comparable to much deeper ResNet models.

03

Proposes a parameterization that reveals connections to existing architectures.

Abstract

We describe Swapout, a new stochastic training method that outperforms ResNets of identical network structure, yielding impressive results on CIFAR-10 and CIFAR-100. Swapout samples from a rich set of architectures including dropout, stochastic depth and residual architectures as special cases. When viewed as a regularization method, swapout not only inhibits co-adaptation of units in a layer, similar to dropout, but also across network layers. We conjecture that swapout achieves strong regularization by implicitly tying the parameters across layers. When viewed as an ensemble training method, it samples a much richer set of architectures than existing methods such as dropout or stochastic depth. We propose a parameterization that reveals connections to existing architectures and suggests a much richer set of architectures to be explored. We show that our formulation suggests an efficient…
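
The abstract's claim that swapout contains residual and plain feed-forward architectures as special cases can be illustrated with the per-unit rule given in the full paper, Y = Θ1 ⊙ X + Θ2 ⊙ F(X), where Θ1 and Θ2 are tensors of independent Bernoulli variables. The NumPy sketch below is a minimal illustration under that assumption, not the authors' implementation: the names `swapout_unit`, `p1`, and `p2` are hypothetical, and `np.tanh` merely stands in for a learned block F.

```python
import numpy as np

def swapout_unit(x, f_x, p1, p2, rng):
    """Sample one stochastic forward pass: y = theta1 * x + theta2 * f_x.

    theta1 and theta2 are drawn independently per unit from
    Bernoulli(p1) and Bernoulli(p2). All names here are illustrative.
    """
    theta1 = rng.binomial(1, p1, size=x.shape)
    theta2 = rng.binomial(1, p2, size=x.shape)
    return theta1 * x + theta2 * f_x

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
f_x = np.tanh(x)  # placeholder for the block's learned function F(x)

# Fixing the keep probabilities at 0 or 1 recovers deterministic architectures.
residual = swapout_unit(x, f_x, 1.0, 1.0, rng)     # x + F(x): residual block
feedforward = swapout_unit(x, f_x, 0.0, 1.0, rng)  # F(x): plain feed-forward layer
skipped = swapout_unit(x, f_x, 1.0, 0.0, rng)      # x: the block is skipped

# With intermediate probabilities, each unit independently takes one of
# four values: 0, x, F(x), or x + F(x).
mixed = swapout_unit(x, f_x, 0.5, 0.5, rng)
```

Setting `p1 = 0` with `0 < p2 < 1` gives dropout applied to F(x). Stochastic depth corresponds to keeping `theta1` at 1 and drawing a single Bernoulli value per layer for `theta2`, which this per-unit sketch does not implement.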

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning

Methods: Average Pooling · 1x1 Convolution · Batch Normalization · Bottleneck Residual Block · Global Average Pooling · Residual Block · Kaiming Initialization · Max Pooling · Stochastic Depth