TL;DR
This paper provides a theoretical analysis of structured dropout methods like DropBlock and DropConnect, revealing their regularization effects and connections to norm-based regularizers, and extends some results to deep nonlinear networks.
Contribution
It characterizes the regularization properties of DropBlock and DropConnect in single hidden-layer linear networks. It shows that DropBlock induces spectral k-support norm regularization and admits a closed-form global minimizer, and that DropConnect is equivalent to Dropout.
Findings
DropBlock induces spectral k-support norm regularization.
DropConnect is equivalent to Dropout in single hidden-layer linear networks.
Theoretical results are validated with experiments on common architectures.
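The DropConnect finding can be checked numerically in one simplified setting. This is a minimal sketch under several assumptions that are not taken from the paper. The network is a matrix factorization Y ≈ U V^T, meaning a single hidden-layer linear network with identity input. DropConnect masks the entries of the output-layer weights U only, with i.i.d. Bernoulli(θ) keeps and inverted scaling 1/θ. Dropout masks whole hidden units, meaning whole columns of U. The function names are hypothetical. Both expected losses are computed exactly by enumerating every mask, so the sketch only works for tiny sizes. It shows that the two objectives coincide in expectation here. The paper's precise statement and conditions may differ.

```python
import itertools

import numpy as np


def dropconnect_expected_loss(Y, U, V, theta):
    """Exact E||Y - (1/theta) (M * U) V^T||_F^2, with M an i.i.d. Bernoulli(theta) mask on U's entries."""
    m, d = U.shape
    total = 0.0
    for bits in itertools.product([0, 1], repeat=m * d):
        M = np.array(bits, dtype=float).reshape(m, d)  # 1 = keep weight, 0 = drop it
        n_kept = int(M.sum())
        prob = theta ** n_kept * (1 - theta) ** (m * d - n_kept)
        pred = (M * U) @ V.T / theta  # inverted scaling keeps E[pred] = U V^T
        total += prob * np.sum((Y - pred) ** 2)
    return total


def dropout_expected_loss(Y, U, V, theta):
    """Exact E||Y - (1/theta) U diag(b) V^T||_F^2, with b an i.i.d. Bernoulli(theta) mask on hidden units."""
    d = U.shape[1]
    total = 0.0
    for bits in itertools.product([0, 1], repeat=d):
        b = np.array(bits, dtype=float)  # one keep/drop decision per hidden unit
        n_kept = int(b.sum())
        prob = theta ** n_kept * (1 - theta) ** (d - n_kept)
        pred = (U * b) @ V.T / theta  # U * b broadcasts to U diag(b)
        total += prob * np.sum((Y - pred) ** 2)
    return total


# Small demo: 2 x 3 output-layer weights give 2^6 = 64 DropConnect masks.
rng = np.random.default_rng(0)
Y = rng.standard_normal((2, 3))
U = rng.standard_normal((2, 3))
V = rng.standard_normal((3, 3))
print(dropconnect_expected_loss(Y, U, V, 0.7), dropout_expected_loss(Y, U, V, 0.7))
```

Both quantities reduce to ||Y − UV^T||_F² + ((1−θ)/θ) Σ_i ||u_i||² ||v_i||², with u_i and v_i the i-th columns of U and V. Within each entry of the prediction, the masks on different hidden units are independent, so the cross terms vanish. Masking the input-layer weights instead would generally behave differently once the input is not the identity.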
Abstract
Dropout and its extensions (e.g., DropBlock and DropConnect) are popular heuristics for training neural networks, which have been shown to improve generalization performance in practice. However, a theoretical understanding of their optimization and regularization properties remains elusive. Recent work shows that in the case of single hidden-layer linear networks, Dropout is a stochastic gradient descent method for minimizing a regularized loss, and that the regularizer induces solutions that are low-rank and balanced. In this work we show that for single hidden-layer linear networks, DropBlock induces spectral k-support norm regularization, and promotes solutions that are low-rank and have factors with equal norm. We also show that the global minimizer for DropBlock can be computed in closed form, and that DropConnect is equivalent to Dropout. We then show that some of these results can…
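The abstract's "regularized loss" view can be made concrete with a short sketch. The setting is assumed: a matrix factorization Y ≈ U V^T, meaning a single hidden-layer linear network with identity input. Hidden units are split into contiguous blocks of equal size. Each block is kept with probability θ, with one Bernoulli draw shared by the whole block, and the output is rescaled by 1/θ. The blocks are independent and each scaled mask b_g/θ has mean 1 and variance (1−θ)/θ, so the cross terms vanish. This gives E||Y − (1/θ) Σ_g b_g U_g V_g^T||_F² = ||Y − UV^T||_F² + ((1−θ)/θ) Σ_g ||U_g V_g^T||_F². With block size 1 this reduces to the known Dropout regularizer Σ_i ||u_i||² ||v_i||². The function names and the contiguous-block layout are hypothetical choices for illustration, not the paper's code. The sketch computes the expectation exactly by enumerating all block masks and compares it with the formula.

```python
import itertools

import numpy as np


def dropblock_expected_loss(Y, U, V, block_size, theta):
    """Exact E||Y - (1/theta) U diag(b) V^T||_F^2 under block-wise Bernoulli(theta) masks."""
    d = U.shape[1]
    if d % block_size != 0:
        raise ValueError("number of hidden units must be divisible by block_size")
    n_blocks = d // block_size
    total = 0.0
    for keep in itertools.product([0, 1], repeat=n_blocks):
        # Expand each block's single keep/drop decision to all units in that block.
        b = np.repeat(np.array(keep, dtype=float), block_size)
        prob = np.prod([theta if k else 1 - theta for k in keep])
        pred = (U * b) @ V.T / theta  # U * b broadcasts to U diag(b)
        total += prob * np.sum((Y - pred) ** 2)
    return total


def dropblock_closed_form(Y, U, V, block_size, theta):
    """Plain loss plus the induced regularizer (1-theta)/theta * sum_g ||U_g V_g^T||_F^2."""
    d = U.shape[1]
    reg = 0.0
    for start in range(0, d, block_size):
        g = slice(start, start + block_size)
        reg += np.sum((U[:, g] @ V[:, g].T) ** 2)  # squared Frobenius norm of block g's product
    return np.sum((Y - U @ V.T) ** 2) + (1 - theta) / theta * reg


# Small demo: block_size = 1 is ordinary Dropout on hidden units.
rng = np.random.default_rng(0)
Y = rng.standard_normal((4, 5))
U = rng.standard_normal((4, 6))
V = rng.standard_normal((5, 6))
for r in (1, 2, 3):
    print(r, dropblock_expected_loss(Y, U, V, r, 0.7), dropblock_closed_form(Y, U, V, r, 0.7))
```

The regularizer penalizes each block's product U_g V_g^T as a whole rather than each unit separately. This grouping is what links DropBlock to a block-structured norm, and the paper analyzes it as spectral k-support norm regularization. Exhaustive enumeration scales as 2^(number of blocks), so it is only meant as a check on tiny sizes.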
Videos
On the Regularization Properties of Structured Dropout (YouTube)
Taxonomy
Methods: DropBlock · DropConnect · Dropout
