On the interplay of network structure and gradient convergence in deep learning

Vamsi K Ithapu; Sathya N Ravi; Vikas Singh

arXiv:1511.05297·cs.LG·February 23, 2017

TL;DR

This paper investigates how network structure, data statistics, and regularization techniques influence the convergence behavior of backpropagation in deep learning, providing a framework for guiding parameter and architecture choices.

Contribution

It introduces a theoretical framework linking network structure, data properties, and convergence rates, with insights on feature denoising and dropout effects in deep networks.

Findings

01 Relationship between feature denoising and dropout elucidated

02 Guidelines for selecting learning parameters based on input data statistics

03 Experimental validation supports theoretical insights

Abstract

The regularization and output consistency behavior of dropout and layer-wise pretraining for learning deep networks have been fairly well studied. However, our understanding of how the asymptotic convergence of backpropagation in deep architectures is related to the structural properties of the network and other design choices (like denoising and dropout rate) is less clear at this time. An interesting question one may ask is whether the network architecture and input data statistics may guide the choices of learning parameters and vice versa. In this work, we explore the association between such structural, distributional and learnability aspects vis-à-vis their interaction with parameter convergence rates. We present a framework to address these questions based on convergence of backpropagation for general nonconvex objectives using first-order information. This analysis suggests an…

Taxonomy

Methods: Dropout