Parallel Dither and Dropout for Regularising Deep Neural Networks
Andrew J.R. Simpson

TL;DR
This paper introduces a new parallel regularisation method for deep neural networks that works without batch averaging, outperforming traditional batch-averaged methods and showing that dither and dropout can be combined effectively.
Contribution
A novel parallel regularisation technique for non-batch SGD that enhances deep neural network training and demonstrates the complementary nature of dither and dropout.
Findings
Parallel regularisation outperforms batch-averaged methods
Dither and dropout are complementary techniques
Non-batch SGD with parallel regularisation yields better results
Abstract
Effective regularisation during training can mean the difference between success and failure for deep neural networks. Recently, dither has been suggested as an alternative to dropout for regularisation during batch-averaged stochastic gradient descent (SGD). In this article, we show that these methods fail without batch averaging and we introduce a new, parallel regularisation method that may be used without batch averaging. Our results for parallel-regularised non-batch-SGD are substantially better than what is possible with batch-SGD. Furthermore, our results demonstrate that dither and dropout are complementary.
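The abstract does not spell out how the parallel method works, so the following is only a rough sketch of one plausible reading: a single training example is replicated several times, each copy receives its own dither noise and dropout mask, and the gradient is averaged across the copies instead of across different examples. The function name `parallel_regularised_grad`, the linear model with squared-error loss, the Gaussian dither, and all parameter values are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def parallel_regularised_grad(w, x, y, n_copies=8, dither_std=0.1,
                              keep_prob=0.5, rng=None):
    """Gradient for ONE example, averaged over independently regularised copies.

    Hypothetical illustration: linear model, loss 0.5 * (x @ w - y) ** 2.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Replicate the single example n_copies times (shape: n_copies x n_features).
    xs = np.tile(x, (n_copies, 1))
    # Dither: independent zero-mean noise per copy (Gaussian is an assumption).
    xs = xs + rng.normal(0.0, dither_std, size=xs.shape)
    # Dropout: independent binary mask per copy, rescaled by keep_prob.
    mask = rng.random(xs.shape) < keep_prob
    xs = xs * mask / keep_prob
    # Per-copy error and gradient of the squared-error loss with respect to w.
    err = xs @ w - y
    grads = err[:, None] * xs
    # Average across the copies of this one example, not across examples.
    return grads.mean(axis=0)
```

Under these assumptions, a non-batch SGD step would update the weights once per training example, for instance `w -= lr * parallel_regularised_grad(w, x, y)`, where `lr` is a learning rate chosen by the user.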
Taxonomy
Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning
Methods: Dropout
