AutoDropout: Learning Dropout Patterns to Regularize Deep Networks

Hieu Pham; Quoc V. Le

arXiv:2101.01761·cs.LG·January 7, 2021

TL;DR

This paper introduces AutoDropout, a method in which a controller learns dropout patterns for a target neural network. The learned patterns exploit the structure of the network's inputs and hidden states to improve regularization, and the method is shown to be effective on both vision and language tasks.

Contribution

It proposes a novel learnable dropout-pattern mechanism that adapts to each network layer and task, outperforming methods with fixed, hand-designed patterns such as SpatialDropout and DropBlock.
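The learnable-pattern idea can be illustrated with a toy search loop. The sketch below is a minimal pure-Python illustration, not the paper's controller. It assumes a REINFORCE-style policy-gradient update (a common choice for controllers in neural architecture search; the excerpt on this page does not state the update rule). It shrinks the search space to a single hypothetical choice, the block size of a DropBlock-style pattern, and it replaces "train the target network with the pattern and measure validation performance" with a made-up reward function. All names here (`CANDIDATE_BLOCK_SIZES`, `toy_validation_reward`, `train_controller`) are hypothetical.

```python
import math
import random

# Hypothetical, heavily simplified search space: the controller only chooses
# the block size of a DropBlock-style pattern.
CANDIDATE_BLOCK_SIZES = [1, 3, 5, 7]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_validation_reward(block_size):
    """Hypothetical stand-in for 'train the target network with this dropout
    pattern and return its validation accuracy'. It peaks at block_size == 5
    purely for illustration; it is not data from the paper."""
    return 1.0 - 0.2 * abs(block_size - 5)


def train_controller(steps=300, batch_size=16, lr=1.0, seed=0):
    """REINFORCE over a softmax policy, using the batch-mean reward as baseline."""
    rng = random.Random(seed)
    n = len(CANDIDATE_BLOCK_SIZES)
    logits = [0.0] * n
    for _ in range(steps):
        probs = softmax(logits)
        # Sample a batch of patterns from the current policy.
        picks = rng.choices(range(n), weights=probs, k=batch_size)
        rewards = [toy_validation_reward(CANDIDATE_BLOCK_SIZES[i]) for i in picks]
        baseline = sum(rewards) / batch_size
        grads = [0.0] * n
        for i, r in zip(picks, rewards):
            advantage = r - baseline
            # For a softmax policy, d log p_i / d logit_k = 1[k == i] - p_k.
            for k in range(n):
                grads[k] += advantage * ((1.0 if k == i else 0.0) - probs[k])
        # Gradient ascent on expected reward.
        logits = [l + lr * g / batch_size for l, g in zip(logits, grads)]
    return softmax(logits)


if __name__ == "__main__":
    final_probs = train_controller()
    for size, p in zip(CANDIDATE_BLOCK_SIZES, final_probs):
        print(f"block_size={size}: p={p:.3f}")
```

Using the batch-mean reward as the baseline is a simple variance-reduction choice. In a real setup, each reward evaluation would involve training a target network with the sampled pattern, so it is far more costly than this toy function.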

Findings

01. Effective on CIFAR-10 and ImageNet image recognition tasks.
02. Improves language modeling on Penn Treebank and WikiText-2.
03. Learned dropout patterns transfer across different tasks and datasets.

Abstract

Neural networks are often over-parameterized and hence benefit from aggressive regularization. Conventional regularization methods, such as Dropout or weight decay, do not leverage the structures of the network's inputs and hidden states. As a result, these conventional methods are less effective than methods that leverage the structures, such as SpatialDropout and DropBlock, which randomly drop the values at certain contiguous areas in the hidden states and set them to zero. Although the locations of the dropout areas are random, the patterns of SpatialDropout and DropBlock are manually designed and fixed. Here we propose to learn the dropout patterns. In our method, a controller learns to generate a dropout pattern at every channel and layer of a target network, such as a ConvNet or a Transformer. The target network is then trained with the dropout pattern, and its resulting validation…
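To make the contrast between unstructured and structured dropout in the abstract concrete, here is a minimal pure-Python sketch of an element-wise Dropout mask versus a DropBlock-style mask that zeroes contiguous square regions of a single feature map. The function names and the `block_size` / `num_blocks` parameters are hypothetical simplifications: the actual DropBlock samples block centers with a per-position probability derived from a keep rate rather than placing a fixed number of blocks.

```python
import random


def elementwise_mask(h, w, drop_rate, rng):
    """Standard Dropout: each unit is zeroed independently with probability drop_rate."""
    return [[0 if rng.random() < drop_rate else 1 for _ in range(w)] for _ in range(h)]


def block_mask(h, w, block_size, num_blocks, rng):
    """DropBlock-style mask: zero num_blocks square regions of side block_size,
    each placed at a uniformly random position that fits inside the h x w map."""
    mask = [[1] * w for _ in range(h)]
    for _ in range(num_blocks):
        top = rng.randrange(h - block_size + 1)
        left = rng.randrange(w - block_size + 1)
        for i in range(top, top + block_size):
            for j in range(left, left + block_size):
                mask[i][j] = 0
    return mask


def apply_mask(feature_map, mask):
    """Zero the dropped units and rescale the kept ones by total / kept,
    normalizing by the fraction of kept units as DropBlock does."""
    total = len(mask) * len(mask[0])
    kept = sum(map(sum, mask))
    scale = total / kept if kept else 0.0
    return [[x * m * scale for x, m in zip(row, mrow)]
            for row, mrow in zip(feature_map, mask)]


if __name__ == "__main__":
    rng = random.Random(0)
    for name, m in [("dropout", elementwise_mask(8, 8, 0.25, rng)),
                    ("block", block_mask(8, 8, block_size=3, num_blocks=2, rng=rng))]:
        print(name)
        for row in m:
            # '.' marks a dropped unit, '#' a kept one.
            print("".join("." if v == 0 else "#" for v in row))
```

Standard Dropout scatters zeros independently, while the block mask removes whole neighborhoods, so nearby correlated activations cannot simply stand in for a dropped unit. In both cases the pattern's shape is fixed by hand; AutoDropout's premise is that this shape can be learned instead.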

Code & Models

Repositories

google-research/google-research
tf · Official

Taxonomy

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · AutoDropout · Softmax · Byte Pair Encoding · Dense Connections · Label Smoothing · Attention Is All You Need · Multi-Head Attention