On the interplay of network structure and gradient convergence in deep learning
Vamsi K Ithapu, Sathya N Ravi, Vikas Singh

TL;DR
This paper investigates how network structure, data statistics, and regularization techniques influence the convergence behavior of backpropagation in deep learning, providing a framework for guiding parameter and architecture choices.
Contribution
It introduces a theoretical framework linking network structure, data properties, and convergence rates, with insights on feature denoising and dropout effects in deep networks.
Findings
Relationship between feature denoising and dropout elucidated
Guidelines for selecting learning parameters based on input data statistics
Experimental validation supports theoretical insights
Abstract
The regularization and output consistency behavior of dropout and layer-wise pretraining for learning deep networks have been fairly well studied. However, our understanding of how the asymptotic convergence of backpropagation in deep architectures is related to the structural properties of the network and other design choices (like denoising and dropout rate) is less clear at this time. An interesting question one may ask is whether the network architecture and input data statistics may guide the choices of learning parameters and vice versa. In this work, we explore the association between such structural, distributional and learnability aspects vis-à-vis their interaction with parameter convergence rates. We present a framework to address these questions based on convergence of backpropagation for general nonconvex objectives using first-order information. This analysis suggests an…
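The abstract frames convergence of backpropagation for nonconvex objectives in terms of first-order information. A common yardstick in that setting is the squared gradient norm at an iterate chosen uniformly at random from the SGD trajectory. The Python sketch below illustrates this yardstick on a hypothetical toy model (a single tanh unit with inverted dropout on its inputs). It is a generic illustration under stated assumptions, not the authors' framework or bounds. The model, the step size, and the names `forward_grad`, `sgd_with_dropout`, `expected_grad_sq_norm` and `randomized_output` are all assumptions introduced here.

```python
import math
import random


def forward_grad(w, v, x, y, mask, keep):
    """Squared loss and its gradient for one example under a fixed dropout mask.

    Hypothetical toy model (an assumption, not from the paper):
    y_hat = v * tanh(sum_i w_i * m_i * x_i / keep), i.e. inverted dropout on
    the inputs. The loss is nonconvex in (w, v).
    """
    z = sum(wi * mi * xi for wi, mi, xi in zip(w, mask, x)) / keep
    h = math.tanh(z)
    r = v * h - y                      # residual
    loss = 0.5 * r * r
    dv = r * h                         # d loss / d v
    dz = r * v * (1.0 - h * h)         # d loss / d z, using tanh' = 1 - tanh^2
    dw = [dz * mi * xi / keep for mi, xi in zip(mask, x)]
    return loss, dw, dv


def sample_mask(d, keep, rng):
    """Bernoulli(keep) mask: each input is dropped with probability 1 - keep."""
    return [1.0 if rng.random() < keep else 0.0 for _ in range(d)]


def sgd_with_dropout(data, p, gamma, iters, seed=0):
    """Constant-step SGD with input dropout rate p.

    Returns the iterate at which each stochastic gradient was evaluated,
    together with that stochastic gradient's squared norm.
    """
    rng = random.Random(seed)
    d = len(data[0][0])
    keep = 1.0 - p
    w = [rng.gauss(0.0, 0.1) for _ in range(d)]
    v = rng.gauss(0.0, 0.1)
    iterates, sq_norms = [], []
    for _ in range(iters):
        x, y = rng.choice(data)
        mask = sample_mask(d, keep, rng)
        _, dw, dv = forward_grad(w, v, x, y, mask, keep)
        iterates.append((list(w), v))
        sq_norms.append(sum(g * g for g in dw) + dv * dv)
        w = [wi - gamma * g for wi, g in zip(w, dw)]
        v -= gamma * dv
    return iterates, sq_norms


def expected_grad_sq_norm(w, v, data, p, n_masks=50, seed=1):
    """Rough Monte Carlo estimate of ||grad f(w, v)||^2, where f averages the
    loss over the data and over dropout masks (biased slightly upward)."""
    rng = random.Random(seed)
    keep = 1.0 - p
    gw, gv, count = [0.0] * len(w), 0.0, 0
    for x, y in data:
        for _ in range(n_masks):
            mask = sample_mask(len(w), keep, rng)
            _, dw, dv = forward_grad(w, v, x, y, mask, keep)
            gw = [a + b for a, b in zip(gw, dw)]
            gv += dv
            count += 1
    return sum((g / count) ** 2 for g in gw) + (gv / count) ** 2


def randomized_output(iterates, seed=2):
    """Return an iterate chosen uniformly at random (a randomized stopping rule)."""
    return random.Random(seed).choice(iterates)


if __name__ == "__main__":
    # Synthetic data from a hypothetical teacher unit.
    rng = random.Random(42)
    w_true = [1.0, -2.0, 0.5]
    data = []
    for _ in range(100):
        x = [rng.gauss(0.0, 1.0) for _ in range(3)]
        data.append((x, math.tanh(sum(a * b for a, b in zip(w_true, x)))))
    for p in (0.0, 0.2, 0.5):
        iterates, _ = sgd_with_dropout(data, p=p, gamma=0.05, iters=2000)
        w_R, v_R = randomized_output(iterates)
        print(p, expected_grad_sq_norm(w_R, v_R, data, p))
```

Varying `p` and `gamma` in the usage block gives only a rough, empirical feel for how dropout rate and step size interact with this convergence measure on a toy problem; it does not reproduce or substitute for the paper's analysis.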
Taxonomy
Methods: Dropout
