Shake-Shake regularization

Xavier Gastaldi

arXiv:1705.07485 · cs.LG · May 24, 2017 · 313 citations

TL;DR

Shake-Shake regularization combats overfitting in multi-branch networks by replacing the summation of parallel branches with a stochastic affine combination. It lowers test error on CIFAR-10 and CIFAR-100 and shows encouraging results on architectures without skip connections or Batch Normalization.

Contribution

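This paper proposes shake-shake regularization, a stochastic method that replaces the standard summation of parallel branches in multi-branch networks. Applied to 3-branch residual networks, it improves on the best published single-shot results on CIFAR-10 and CIFAR-100, and early experiments suggest it applies to a wider range of architectures.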

Findings

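1. Achieved 2.86% test error on CIFAR-10 with a 3-branch residual network
2. Achieved 15.85% test error on CIFAR-100 with a 3-branch residual network
3. Showed encouraging results on architectures without skip connections or Batch Normalization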

Abstract

The method introduced in this paper aims at helping deep learning practitioners faced with an overfitting problem. The idea is to replace, in a multi-branch network, the standard summation of parallel branches with a stochastic affine combination. Applied to 3-branch residual networks, shake-shake regularization improves on the best published single-shot results on CIFAR-10 and CIFAR-100 by reaching test errors of 2.86% and 15.85%. Experiments on architectures without skip connections or Batch Normalization show encouraging results and open the door to a large set of applications. Code is available at https://github.com/xgastaldi/shake-shake.
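The core idea of the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code from the linked repository: the function names are hypothetical, activations are modeled as lists of floats, and one coefficient is drawn per call rather than per image. In the paper's shake-shake setting, the forward pass uses alpha drawn from U(0, 1) during training and the expected value 0.5 at test time, while the backward pass splits the gradient with an independently drawn coefficient beta.

```python
import random
from typing import List, Sequence, Tuple


def shake_shake_forward(
    x: Sequence[float],
    branch1: Sequence[float],
    branch2: Sequence[float],
    training: bool,
    rng: random.Random,
) -> Tuple[List[float], float]:
    """Return x + alpha * branch1 + (1 - alpha) * branch2 and the alpha used.

    Training: alpha is drawn from U(0, 1) once per call (a simplification;
    per-image coefficients would draw one alpha per example).
    Inference: alpha is fixed to its expected value, 0.5.
    """
    alpha = rng.random() if training else 0.5
    out = [xi + alpha * b1 + (1.0 - alpha) * b2
           for xi, b1, b2 in zip(x, branch1, branch2)]
    return out, alpha


def shake_shake_backward(
    grad_out: Sequence[float],
    rng: random.Random,
) -> Tuple[List[float], List[float], List[float], float]:
    """Split the upstream gradient with a fresh beta drawn from U(0, 1).

    Returns gradients for (x, branch1, branch2) and the beta used.
    The identity (skip) path receives the gradient unchanged; beta is
    drawn independently of the forward alpha.
    """
    beta = rng.random()
    grad_x = list(grad_out)
    grad_b1 = [beta * g for g in grad_out]
    grad_b2 = [(1.0 - beta) * g for g in grad_out]
    return grad_x, grad_b1, grad_b2, beta


if __name__ == "__main__":
    rng = random.Random(0)
    x, b1, b2 = [1.0, 2.0], [0.5, -1.0], [1.5, 3.0]
    # Inference: the two branches are weighted equally.
    print(shake_shake_forward(x, b1, b2, training=False, rng=rng))  # -> ([2.0, 3.0], 0.5)
    # Training: a random convex combination, different on each call.
    print(shake_shake_forward(x, b1, b2, training=True, rng=rng))
```

In a real framework the backward rule would be attached through the framework's custom-gradient mechanism; it is written out explicitly here so that the forward coefficient alpha and the backward coefficient beta are visibly independent.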

Taxonomy

Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Integrated Circuits and Semiconductor Failure Analysis

Methods: Shake-Shake Regularization