TL;DR
This paper introduces ShakeDrop, a regularization technique for deep residual networks that reduces overfitting while keeping training stable across architectures such as ResNet, Wide ResNet, PyramidNet, and ResNeXt.
Contribution
ShakeDrop extends Shake-Shake regularization to a broader range of residual networks and includes a training stabilizer to ensure effective and stable regularization.
Findings
ShakeDrop outperforms Shake-Shake in regularization effectiveness.
ShakeDrop is applicable to multiple residual network architectures.
A training stabilizer keeps training stable while the regularization remains effective.
Abstract
Overfitting is a crucial problem in deep neural networks, even in the latest network architectures. In this paper, to relieve the overfitting effect of ResNet and its improvements (i.e., Wide ResNet, PyramidNet, and ResNeXt), we propose a new regularization method called ShakeDrop regularization. ShakeDrop is inspired by Shake-Shake, which is an effective regularization method, but can be applied to ResNeXt only. ShakeDrop is more effective than Shake-Shake and can be applied not only to ResNeXt but also ResNet, Wide ResNet, and PyramidNet. An important key is to achieve stability of training. Because effective regularization often causes unstable training, we introduce a training stabilizer, which is an unusual use of an existing regularizer. Through experiments under various conditions, we demonstrate the conditions under which ShakeDrop works well.
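The abstract does not spell out the update rule, so the following is only a minimal sketch under stated assumptions. It assumes the commonly cited ShakeDrop forward rule, in which the residual branch output F(x) is scaled by b + alpha - b*alpha, with b drawn from a Bernoulli distribution with keep probability p and alpha drawn uniformly from [-1, 1]. At test time the scale is replaced by its expectation, which is p. The function names, the list-based inputs, and the default p_keep=0.5 are illustrative choices, not taken from this page. The backward pass, which in the assumed formulation uses a separate random coefficient, is omitted.

```python
import random


def shakedrop_scale(p_keep, training, rng=random):
    """Return the multiplier applied to the residual branch output F(x).

    Assumed formulation: scale = b + alpha - b * alpha, where
    b ~ Bernoulli(p_keep) and alpha ~ Uniform(-1, 1).
    """
    if not training:
        # E[b + alpha - b * alpha] = p_keep, because E[alpha] = 0.
        return p_keep
    b = 1 if rng.random() < p_keep else 0
    alpha = rng.uniform(-1.0, 1.0)
    # b == 1 gives scale 1 (plain residual block); b == 0 gives scale alpha.
    return b + alpha - b * alpha


def shakedrop_forward(x, fx, p_keep=0.5, training=True, rng=random):
    """Combine the shortcut x with the perturbed residual branch output fx.

    x and fx are equal-length lists of floats standing in for tensors.
    """
    scale = shakedrop_scale(p_keep, training, rng)
    return [xi + scale * fi for xi, fi in zip(x, fx)]
```

With b = 1 the block behaves like an ordinary residual block, and with b = 0 the branch output is multiplied by a random value that may be negative, which is the source of the strong perturbation in this sketch.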
Taxonomy
Methods: Dropout · Wide Residual Block · WideResNet · Average Pooling · ResNeXt Block · Zero-padded Shortcut Connection · Pyramidal Residual Unit · Pyramidal Bottleneck Residual Unit · PyramidNet · Grouped Convolution
