The Flip Side of the Reweighted Coin: Duality of Adaptive Dropout and Regularization

Daniel LeJeune; Hamid Javadi; Richard G. Baraniuk

arXiv:2106.07769 · cs.LG · January 4, 2022 · 1 citation

TL;DR

This paper reveals a duality between adaptive dropout methods and regularization in neural networks, showing that adaptive masking strategies correspond to specific regularization penalties that promote sparsity, supported by theoretical analysis and empirical validation.

Contribution

It introduces a duality framework linking adaptive dropout to regularization penalties, providing a theoretical basis for understanding sparsification in deep networks.

Findings

01

Adaptive dropout strategies correspond to subquadratic regularization penalties.

02

Effective penalties for popular sparsification methods resemble classical sparse optimization penalties.

03

Empirical results show similar behavior between adaptive dropout and classical regularization methods.

Abstract

Among the most successful methods for sparsifying deep (neural) networks are those that adaptively mask the network weights throughout training. By examining this masking, or dropout, in the linear case, we uncover a duality between such adaptive methods and regularization through the so-called "$\eta$-trick" that casts both as iteratively reweighted optimizations. We show that any dropout strategy that adapts to the weights in a monotonic way corresponds to an effective subquadratic regularization penalty, and therefore leads to sparse solutions. We obtain the effective penalties for several popular sparsification strategies, which are remarkably similar to classical penalties commonly used in sparse optimization. Considering variational dropout as a case study, we demonstrate similar empirical behavior between the adaptive dropout method and classical methods on the task of deep…
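To make the "iteratively reweighted" idea concrete, the following is a minimal sketch of the classical $\eta$-trick for the $\ell_1$ penalty, not the paper's derivation for dropout. It relies on the standard identity $|w| = \min_{\eta > 0} \frac{1}{2}(w^2/\eta + \eta)$, whose minimizer is $\eta = |w|$, and applies it to the scalar problem $\frac{1}{2}(y - w)^2 + \lambda |w|$. The function names `soft_threshold` and `eta_trick_l1`, and the parameters `n_iter` and `eps`, are hypothetical choices made for this illustration.

```python
def soft_threshold(y, lam):
    """Closed-form minimizer of 0.5*(y - w)**2 + lam*abs(w)."""
    if y > lam:
        return y - lam
    if y < -lam:
        return y + lam
    return 0.0


def eta_trick_l1(y, lam, n_iter=500, eps=1e-12):
    """Alternating minimization of 0.5*(y - w)**2 + 0.5*lam*(w**2/eta + eta).

    Hypothetical helper for illustration only. Each pass solves a
    reweighted ridge problem in w, then updates the weight eta.
    """
    w = y  # start from the unregularized solution
    for _ in range(n_iter):
        # eta-step: the minimizer over eta is |w|; eps (an assumption of
        # this sketch) keeps the division below well defined at w = 0.
        eta = abs(w) + eps
        # w-step: ridge problem 0.5*(y - w)**2 + 0.5*lam*w**2/eta,
        # whose closed-form solution is y / (1 + lam/eta).
        w = y / (1.0 + lam / eta)
    return w


if __name__ == "__main__":
    for y in (3.0, 0.5, -3.0):
        print(y, eta_trick_l1(y, lam=1.0), soft_threshold(y, lam=1.0))
```

In this scalar setting the reweighted iteration approaches the soft-thresholding solution: inputs with $|y| \le \lambda$ are driven to (numerically) zero, while larger inputs are shrunk by $\lambda$. This is the same mechanism by which a subquadratic penalty yields sparse solutions.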

Code & Models

Repositories

dlej/adaptive-dropout
PyTorch · Official

Videos

The Flip Side of the Reweighted Coin: Duality of Adaptive Dropout and Regularization · SlidesLive

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning

Methods: Adaptive Dropout · Variational Dropout · Dropout