Parallel Dither and Dropout for Regularising Deep Neural Networks

Andrew J.R. Simpson

arXiv:1508.07130 · cs.LG · August 31, 2015 · 1 citation

TL;DR

This paper introduces a new parallel regularisation method for deep neural networks that works without batch averaging, outperforming traditional batch-averaged methods and showing that dither and dropout can be combined effectively.

Contribution

A novel parallel regularisation technique for non-batch SGD that enhances deep neural network training and demonstrates the complementary nature of dither and dropout.

Findings

1. Parallel regularisation outperforms batch-averaged methods
2. Dither and dropout are complementary techniques
3. Non-batch SGD with parallel regularisation yields substantially better results than batch-SGD

Abstract

Effective regularisation during training can mean the difference between success and failure for deep neural networks. Recently, dither has been suggested as an alternative to dropout for regularisation during batch-averaged stochastic gradient descent (SGD). In this article, we show that these methods fail without batch averaging and we introduce a new, parallel regularisation method that may be used without batch averaging. Our results for parallel-regularised non-batch-SGD are substantially better than what is possible with batch-SGD. Furthermore, our results demonstrate that dither and dropout are complementary.
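The abstract does not spell out the mechanics of parallel regularisation. One plausible reading, assumed here rather than taken from the paper's text, is that each single training example is replicated into several copies, each copy receives its own independent dither and dropout noise, and the copies' gradients are averaged before one SGD update. The noise is then averaged "in parallel" over copies of one example instead of across a batch of different examples. The plain-Python sketch below illustrates that reading on a toy linear model with squared error. The function names, the uniform dither distribution, inverted dropout scaling, and all hyperparameters are illustrative assumptions. For brevity both noise sources act on the inputs of a single linear layer; in a deep network, dropout would normally act on hidden units.

```python
import random

def dither(x, scale, rng):
    """Add zero-mean uniform noise in [-scale, scale] to each element (assumed noise model)."""
    return [xi + rng.uniform(-scale, scale) for xi in x]

def dropout(x, p, rng):
    """Inverted dropout: zero each element with probability p, rescale survivors by 1/(1-p)."""
    assert 0.0 <= p < 1.0
    keep = 1.0 - p
    return [xi / keep if rng.random() < keep else 0.0 for xi in x]

def parallel_regularised_step(w, b, x, y, lr, n_copies, dither_scale, drop_p, rng):
    """One non-batch SGD step (batch size 1) for a linear model with loss 0.5*(pred - y)**2.

    ASSUMPTION (hypothetical reading of "parallel regularisation"): the single example
    (x, y) is replicated n_copies times, each copy gets independent dither followed by
    dropout, and the per-copy gradients are averaged before the update.
    """
    grad_w = [0.0] * len(w)
    grad_b = 0.0
    for _ in range(n_copies):
        xc = dropout(dither(x, dither_scale, rng), drop_p, rng)  # independent noisy copy of x
        pred = sum(wi * xi for wi, xi in zip(w, xc)) + b
        err = pred - y  # derivative of the loss with respect to pred
        for i, xi in enumerate(xc):
            grad_w[i] += err * xi
        grad_b += err
    # Average over the parallel copies, then take a single update step.
    w = [wi - lr * g / n_copies for wi, g in zip(w, grad_w)]
    b = b - lr * grad_b / n_copies
    return w, b

# Toy usage: learn y = 2*x0 - x1 + 0.5 using single-example updates only.
rng = random.Random(0)
data = []
for _ in range(200):
    x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
    data.append((x, 2.0 * x[0] - x[1] + 0.5))

w, b = [0.0, 0.0], 0.0
for _ in range(20):  # epochs
    for x, y in data:
        w, b = parallel_regularised_step(
            w, b, x, y, lr=0.05, n_copies=8, dither_scale=0.1, drop_p=0.2, rng=rng
        )

print("weights:", [round(v, 3) for v in w], "bias:", round(b, 3))
```

Because dropout (and, more weakly, dither) on the inputs of a linear model acts like a weight penalty, the learned weights should come out somewhat smaller in magnitude than the noise-free targets 2 and -1. Setting `drop_p=0.0` and `dither_scale=0.0` reduces the step to plain single-example SGD, and `n_copies` controls how much of the injected noise is averaged away within each update.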

Taxonomy

Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning

Methods: Dropout