Asymptotic convergence rate of Dropout on shallow linear neural networks

Albert Senen-Cerda; Jaron Sanders

arXiv:2012.01978·cs.LG·December 4, 2020

Open Access

TL;DR

This paper provides a theoretical analysis of the convergence rate of gradient flows in shallow linear neural networks trained with Dropout, revealing how factors like data, dropout probability, and network width influence convergence.

Contribution

It offers the first local convergence proof and explicit rate bounds for Dropout in shallow linear neural networks, connecting theory with numerical experiments.

Findings

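01. The bound on the convergence rate depends on the data, the dropout probability, and the network width.

02. The theoretical bound agrees qualitatively with numerical simulations when the gradient flow is initialized near a minimizer.

03. The analysis leverages a recent result on nonconvex optimization together with properties of the set of minimizers and of the Hessian of the loss function.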

Abstract

We analyze the convergence rate of gradient flows on objective functions induced by Dropout and Dropconnect, when applying them to shallow linear Neural Networks (NNs) - which can also be viewed as doing matrix factorization using a particular regularizer. Dropout algorithms such as these are thus regularization techniques that use {0,1}-valued random variables to filter weights during training in order to avoid coadaptation of features. By leveraging a recent result on nonconvex optimization and conducting a careful analysis of the set of minimizers as well as the Hessian of the loss function, we are able to obtain (i) a local convergence proof of the gradient flow and (ii) a bound on the convergence rate that depends on the data, the dropout probability, and the width of the NN. Finally, we compare this theoretical bound to numerical simulations, which are in qualitative agreement with…
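The "matrix factorization using a particular regularizer" view in the abstract can be illustrated with a small numerical sketch. Everything below is an assumption made for illustration and is not taken from the paper: dropout is applied to the hidden nodes only, each node is kept with probability p and rescaled by 1/p, the loss is the squared Frobenius error, and the gradient flow is approximated by plain Euler steps. Under those assumptions the expected Dropout loss equals the squared error plus a weighted product-of-norms penalty, which the sketch checks by brute-force enumeration of all masks. The function names (`dropout_loss`, `dropout_loss_enumerated`, `dropout_gradient`, `euler_gradient_flow`) are hypothetical.

```python
import itertools
import numpy as np


def dropout_loss(W2, W1, X, Y, p):
    """Closed-form expected loss: squared error plus a dropout-induced penalty.

    Assumes node dropout with keep probability p and 1/p rescaling.
    """
    Z = W1 @ X                               # hidden activations, shape (h, n)
    residual = Y - W2 @ Z
    col_norms = np.sum(W2 ** 2, axis=0)      # squared norm of each column of W2
    row_norms = np.sum(Z ** 2, axis=1)       # squared norm of each row of W1 X
    return np.sum(residual ** 2) + (1 - p) / p * np.sum(col_norms * row_norms)


def dropout_loss_enumerated(W2, W1, X, Y, p):
    """Same expectation, computed by brute force over all 2^h dropout masks."""
    h = W1.shape[0]
    total = 0.0
    for mask in itertools.product([0.0, 1.0], repeat=h):
        f = np.array(mask)
        prob = np.prod(np.where(f == 1.0, p, 1.0 - p))   # probability of this mask
        residual = Y - W2 @ np.diag(f / p) @ W1 @ X
        total += prob * np.sum(residual ** 2)
    return total


def dropout_gradient(W2, W1, X, Y, p):
    """Gradient of the closed-form loss with respect to (W2, W1)."""
    lam = (1 - p) / p
    Z = W1 @ X
    R = W2 @ Z - Y
    col_norms = np.sum(W2 ** 2, axis=0)
    row_norms = np.sum(Z ** 2, axis=1)
    grad_W2 = 2 * R @ Z.T + 2 * lam * W2 * row_norms
    grad_W1 = 2 * W2.T @ R @ X.T + 2 * lam * (col_norms[:, None] * Z) @ X.T
    return grad_W2, grad_W1


def euler_gradient_flow(W2, W1, X, Y, p, step=1e-3, n_steps=200):
    """Euler discretization of the gradient flow; returns weights and loss trace."""
    losses = [dropout_loss(W2, W1, X, Y, p)]
    for _ in range(n_steps):
        g2, g1 = dropout_gradient(W2, W1, X, Y, p)
        W2, W1 = W2 - step * g2, W1 - step * g1
        losses.append(dropout_loss(W2, W1, X, Y, p))
    return W2, W1, losses
```

Plotting the returned loss trace on a logarithmic axis is one way to eyeball an empirical decay rate near a minimizer. The step size and iteration count are arbitrary choices, and the sketch makes no attempt to reproduce the paper's bound or its experiments.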

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Sparse and Compressive Sensing Techniques

Methods: Dropout