ShakeDrop Regularization for Deep Residual Learning

Yoshihiro Yamada; Masakazu Iwamura; Takuya Akiba; Koichi Kise

arXiv:1802.02375 · cs.CV · April 1, 2020


TL;DR

This paper introduces ShakeDrop, a new regularization technique for deep residual networks that reduces overfitting across architectures such as ResNet, Wide ResNet, PyramidNet, and ResNeXt, and adds a training stabilizer to keep training stable under this strong regularization.

Contribution

ShakeDrop extends Shake-Shake regularization to a broader range of residual networks and includes a training stabilizer to ensure effective and stable regularization.
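As we understand the paper, the training stabilizer reuses the gate of RandomDrop (stochastic depth): residual unit l of L gets a gate b_l ~ Bernoulli(p_l), where p_l = 1 - (l / L)(1 - p_L) decreases linearly with depth. The Python sketch below computes that schedule and samples one gate per unit. The helper names are hypothetical, and the default p_L = 0.5 is the value we recall the paper using, so treat it as an assumption to check against the paper.

```python
import random


def linear_decay_probs(num_units: int, p_last: float = 0.5) -> list[float]:
    """Return p_l = 1 - (l / L) * (1 - p_L) for units l = 1..L.

    Hypothetical helper assuming the RandomDrop linear decay rule.
    """
    if num_units < 1:
        raise ValueError("num_units must be at least 1")
    return [1.0 - (l / num_units) * (1.0 - p_last) for l in range(1, num_units + 1)]


def sample_gates(probs: list[float], rng: random.Random) -> list[int]:
    """Draw one gate b_l ~ Bernoulli(p_l) per residual unit.

    1 = the unit behaves normally; 0 = the unit is perturbed (ShakeDrop)
    or dropped (RandomDrop).
    """
    return [1 if rng.random() < p else 0 for p in probs]


probs = linear_decay_probs(4)
print(probs)  # [0.875, 0.75, 0.625, 0.5]
print(sample_gates(probs, random.Random(0)))
```

Because p_l shrinks with depth, deeper units are perturbed more often, while the first unit's gate is almost always 1.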

Findings

1. ShakeDrop outperforms Shake-Shake in regularization effectiveness.
2. ShakeDrop is applicable to multiple residual network architectures (ResNet, Wide ResNet, PyramidNet, and ResNeXt).
3. A training stabilizer ensures stable and effective regularization.

Abstract

Overfitting is a crucial problem in deep neural networks, even in the latest network architectures. In this paper, to alleviate overfitting in ResNet and its improvements (i.e., Wide ResNet, PyramidNet, and ResNeXt), we propose a new regularization method called ShakeDrop regularization. ShakeDrop is inspired by Shake-Shake, an effective regularization method that can be applied only to ResNeXt. ShakeDrop is more effective than Shake-Shake and can be applied not only to ResNeXt but also to ResNet, Wide ResNet, and PyramidNet. A key requirement is stable training: because effective regularization often causes unstable training, we introduce a training stabilizer, which is an unusual use of an existing regularizer. Through experiments under various conditions, we demonstrate the conditions under which ShakeDrop works well.
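The abstract does not spell out the mechanism. As we recall from the paper's method section, each residual unit computes x + (b_l + alpha - b_l * alpha) F(x) in the forward pass, with b_l ~ Bernoulli(p_l) (the RandomDrop gate reused as the stabilizer) and alpha drawn uniformly from [-1, 1]; in the backward pass the gradient into F is scaled by (b_l + beta - b_l * beta) with beta drawn uniformly from [0, 1]; at test time the multiplier is replaced by its expectation p_l. The scalar Python sketch below illustrates those multipliers under these assumptions. The function names are hypothetical, and the ranges and formulas should be checked against the paper before relying on them.

```python
import random


def sample_shakedrop_params(p: float, rng: random.Random) -> tuple[int, float, float]:
    """Sample (b, alpha, beta) for one residual unit during training.

    Assumed distributions: b ~ Bernoulli(p), alpha ~ U[-1, 1], beta ~ U[0, 1].
    """
    b = 1 if rng.random() < p else 0
    alpha = rng.uniform(-1.0, 1.0)
    beta = rng.uniform(0.0, 1.0)
    return b, alpha, beta


def shakedrop_forward(x: float, fx: float, b: int, alpha: float) -> float:
    """Training-time output x + (b + alpha - b * alpha) * F(x).

    With b = 1 the multiplier is 1 (plain residual unit); with b = 0 it is alpha.
    """
    return x + (b + alpha - b * alpha) * fx


def shakedrop_grad_scale(b: int, beta: float) -> float:
    """Multiplier applied to the gradient flowing into F(x) during backprop.

    With b = 1 it is 1; with b = 0 it is beta, sampled independently of alpha.
    """
    return b + beta - b * beta


def shakedrop_eval(x: float, fx: float, p: float) -> float:
    """Test-time output: E[b + alpha - b * alpha] = p, since E[alpha] = 0
    and alpha is independent of b."""
    return x + p * fx


b, alpha, beta = sample_shakedrop_params(0.5, random.Random(0))
print(shakedrop_forward(2.0, 3.0, b, alpha), shakedrop_grad_scale(b, beta))
print(shakedrop_eval(2.0, 3.0, 0.5))  # 3.5
```

In an actual network, x and F(x) are tensors and the forward and backward multipliers differ, so the unit needs a custom gradient rule (for example a custom autograd function); this scalar version only shows the arithmetic.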

Taxonomy

Methods: Dropout · Wide Residual Block · WideResNet · Average Pooling · ResNeXt Block · Zero-padded Shortcut Connection · Pyramidal Residual Unit · Pyramidal Bottleneck Residual Unit · PyramidNet · Grouped Convolution