On architectural choices in deep learning: From network structure to gradient convergence and parameter estimation

Vamsi K Ithapu; Sathya N Ravi; Vikas Singh

arXiv:1702.08670 · cs.LG · March 2, 2017 · 5 cites

Open Access

TL;DR

This paper investigates how deep network architecture, input data, and design choices like activation, denoising, and dropout influence convergence and learnability, providing a systematic framework for guiding network design and parameter selection.

Contribution

It introduces a theoretical framework linking network structure, data statistics, and convergence, offering systematic guidance for architecture and parameter choices in deep learning.
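For nonconvex objectives with first-order information, "convergence" is usually measured by the expected squared gradient norm at a randomly selected iterate rather than by distance to an optimum. The derivation below is a hedged illustration of that criterion. It is the standard bound for constant-step stochastic gradient descent, and the assumptions (an L-smooth objective f, unbiased stochastic gradients with variance at most σ², step size γ, N iterations, and an output index R drawn uniformly) are ours. The paper's own setup and constants may differ. The first term shrinks as N grows. The second is a noise floor set by σ² that shrinks only with the step size.

```latex
% Illustrative assumptions (ours): f is L-smooth with f^* = \inf_x f(x) > -\infty;
% G_k is an unbiased stochastic gradient with
% \mathbb{E}[G_k \mid x_k] = \nabla f(x_k) and \mathbb{E}\|G_k - \nabla f(x_k)\|^2 \le \sigma^2.
\begin{align}
x_{k+1} &= x_k - \gamma\, G_k, \qquad 0 < \gamma < 2/L,\\
\mathbb{E} f(x_{k+1}) &\le \mathbb{E} f(x_k)
  - \gamma\Bigl(1 - \tfrac{L\gamma}{2}\Bigr)\mathbb{E}\|\nabla f(x_k)\|^2
  + \tfrac{L\gamma^2\sigma^2}{2}.
\end{align}
% Summing over k = 1, ..., N, using \mathbb{E} f(x_{N+1}) \ge f^*, and drawing the
% output index R uniformly from {1, ..., N}:
\begin{equation}
\mathbb{E}\|\nabla f(x_R)\|^2
  \le \frac{2\,\bigl(f(x_1) - f^*\bigr)}{N\gamma\,(2 - L\gamma)}
  + \frac{L\gamma\,\sigma^2}{2 - L\gamma}.
\end{equation}
```

In a framework of the kind described above, network structure and data statistics would plausibly enter through constants such as L and σ², while γ and N are the learning parameters.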

Findings

1. Relationship between feature denoising and dropout, and its effect on convergence
2. Networks with different structures can achieve similar convergence levels
3. Workflow for selecting network sizes and learning parameters based on input statistics
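The workflow in the third finding is not spelled out on this page, so the following is only a hypothetical sketch of the general idea of choosing learning parameters from problem statistics. It assumes the smoothness constant `lipschitz` (L), the gradient-noise variance `sigma2` (σ²), and the initial optimality gap `gap` (f(x_1) − f*) have already been estimated. The abstract describes linking structure and data statistics to convergence rates, but that mapping is not reproduced here. The sketch plugs these values into the standard constant-step SGD bound E‖∇f(x_R)‖² ≤ 2·gap/(Nγ(2 − Lγ)) + Lγσ²/(2 − Lγ). The function names are illustrative and do not come from the paper.

```python
def sgd_gradient_bound(gamma, n_iters, lipschitz, sigma2, gap):
    """Standard upper bound on E||grad f(x_R)||^2 for constant-step SGD with a
    uniformly random output iterate; valid for 0 < gamma < 2 / lipschitz."""
    if not 0.0 < gamma < 2.0 / lipschitz:
        raise ValueError("step size must lie in (0, 2/L)")
    denom = 2.0 - lipschitz * gamma
    # Optimization term (shrinks with n_iters) + noise floor (shrinks with gamma).
    return 2.0 * gap / (n_iters * gamma * denom) + lipschitz * gamma * sigma2 / denom


def plan_learning_parameters(lipschitz, sigma2, gap, eps, max_iters=10**9, n_grid=200):
    """Hypothetical planner: return (N, gamma, bound) for the smallest power-of-two
    iteration count N whose best grid step size in (0, 1/L] gives bound <= eps.
    Returns None if no such N <= max_iters exists."""
    gammas = [(i / n_grid) / lipschitz for i in range(1, n_grid + 1)]
    n_iters = 1
    while n_iters <= max_iters:
        # Pick the step size that minimizes the bound for this iteration budget.
        best = min(gammas, key=lambda g: sgd_gradient_bound(g, n_iters, lipschitz, sigma2, gap))
        value = sgd_gradient_bound(best, n_iters, lipschitz, sigma2, gap)
        if value <= eps:
            return n_iters, best, value
        n_iters *= 2
    return None
```

For example, in the noiseless case `plan_learning_parameters(1.0, 0.0, 1.0, 0.1)` returns `(32, 1.0, 0.0625)`, because the bound reduces to 2/N at γ = 1/L. With `sigma2 > 0` the planner trades a smaller step size (lower noise floor) against more iterations.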

Abstract

We study mechanisms to characterize how the asymptotic convergence of backpropagation in deep architectures, in general, is related to the network structure, and how it may be influenced by other design choices including activation type, denoising and dropout rate. We seek to analyze whether network architecture and input data statistics may guide the choices of learning parameters and vice versa. Given the broad applicability of deep architectures, this issue is interesting from both a theoretical and a practical standpoint. Using properties of general nonconvex objectives (with first-order information), we first build the association between structural, distributional and learnability aspects of the network vis-à-vis their interaction with parameter convergence rates. We identify a nice relationship between feature denoising and dropout, and construct families of networks that achieve…
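The abstract mentions a relationship between feature denoising and dropout but does not state it here. As background only, the sketch below shows the mechanical kinship commonly noted between the two. Masking noise, as used in denoising autoencoders, and inverted dropout both multiply values by independent Bernoulli masks. Dropout additionally rescales the surviving units by 1/(1 − p). The function names and the parameters `zeta` and `p` are our own illustrative choices. This is not the paper's formulation of the relationship.

```python
import random


def masking_noise(x, zeta, rng):
    """Denoising-style input corruption: zero each entry independently with prob. zeta."""
    if not 0.0 <= zeta <= 1.0:
        raise ValueError("zeta must be in [0, 1]")
    return [0.0 if rng.random() < zeta else xi for xi in x]


def inverted_dropout(h, p, rng):
    """Training-time dropout on hidden units: drop each unit with prob. p and
    rescale survivors by 1/(1-p) so every unit keeps its expected value."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else hi * scale for hi in h]
```

With `rng = random.Random(0)` and a long vector of ones, masking noise with `zeta=0.3` leaves an empirical mean near 0.7, while inverted dropout with `p=0.3` keeps it near 1.0. Without rescaling, masking noise shrinks the expected input by a factor of (1 − zeta). That is one reason the two corruptions are analyzed side by side.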

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms

Methods: Affine Coupling · Normalizing Flows · Dropout