Shake-Shake regularization
Xavier Gastaldi

TL;DR
Shake-Shake regularization combats overfitting by replacing the summation of parallel branches in multi-branch networks with a stochastic affine combination. On 3-branch residual networks it reaches test errors of 2.86% on CIFAR-10 and 15.85% on CIFAR-100.
Contribution
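This paper proposes shake-shake regularization, a stochastic method that replaces the standard summation of parallel branches in multi-branch networks. Applied to 3-branch residual networks, it improved on the best single-shot published results on CIFAR-10 and CIFAR-100 at the time, and initial experiments suggest it may also apply to architectures without skip connections or Batch Normalization.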
Findings
Achieved 2.86% error on CIFAR-10
Reached 15.85% error on CIFAR-100
Encouraging results on architectures without skip connections or Batch Normalization
Abstract
The method introduced in this paper aims at helping deep learning practitioners faced with an overfit problem. The idea is to replace, in a multi-branch network, the standard summation of parallel branches with a stochastic affine combination. Applied to 3-branch residual networks, shake-shake regularization improves on the best single shot published results on CIFAR-10 and CIFAR-100 by reaching test errors of 2.86% and 15.85%. Experiments on architectures without skip connections or Batch Normalization show encouraging results and open the door to a large set of applications. Code is available at https://github.com/xgastaldi/shake-shake
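The core idea in the abstract, replacing the sum of two parallel branches with a stochastic affine combination, can be illustrated with a minimal forward-pass sketch. This is not the author's implementation (see the repository linked above for that). The function names `shake_combine` and `residual_block` are hypothetical, the branches are plain Python callables on lists of floats, and two details are assumptions not stated in the abstract: the coefficient is drawn uniformly from [0, 1) during training, and it is fixed to its expected value of 0.5 at test time. The sketch covers the forward pass only.

```python
import random


def shake_combine(branch_a, branch_b, training, rng=random):
    """Mix two branch outputs as alpha * a + (1 - alpha) * b.

    Assumption: alpha ~ U[0, 1) while training, and alpha = 0.5
    (the expected value) at test time.
    """
    if len(branch_a) != len(branch_b):
        raise ValueError("branches must have the same length")
    alpha = rng.random() if training else 0.5
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(branch_a, branch_b)]


def residual_block(x, branch_a_fn, branch_b_fn, training, rng=random):
    """3-branch block: skip connection plus two stochastically mixed branches."""
    mixed = shake_combine(branch_a_fn(x), branch_b_fn(x), training, rng)
    return [xi + mi for xi, mi in zip(x, mixed)]


if __name__ == "__main__":
    # Hypothetical toy branches standing in for convolutional sub-networks.
    double = lambda v: [2.0 * t for t in v]
    zero = lambda v: [0.0 for _ in v]
    x = [1.0, 2.0]
    print(residual_block(x, double, zero, training=False))
    print(residual_block(x, double, zero, training=True, rng=random.Random(0)))
```

Because the two coefficients always sum to 1, the combined output stays on the same scale as either branch alone; only the relative weighting of the branches changes from one training step to the next.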
Taxonomy
Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Integrated Circuits and Semiconductor Failure Analysis
Methods: Shake-Shake Regularization
