On architectural choices in deep learning: From network structure to gradient convergence and parameter estimation
Vamsi K Ithapu, Sathya N Ravi, Vikas Singh

TL;DR
This paper investigates how deep network architecture, input data statistics, and design choices such as activation type, denoising, and dropout rate influence convergence and learnability, and it provides a systematic framework for guiding network design and parameter selection.
Contribution
It introduces a theoretical framework linking network structure, data statistics, and convergence, offering systematic guidance for architecture and parameter choices in deep learning.
Findings
Relationship between feature denoising and dropout on convergence
Networks with different structures can achieve similar convergence levels
Workflow for selecting network sizes and learning parameters based on input statistics
Abstract
We study mechanisms to characterize how the asymptotic convergence of backpropagation in deep architectures, in general, is related to the network structure, and how it may be influenced by other design choices including activation type, denoising and dropout rate. We seek to analyze whether network architecture and input data statistics may guide the choices of learning parameters and vice versa. Given the broad applicability of deep architectures, this issue is interesting from both a theoretical and a practical standpoint. Using properties of general nonconvex objectives (with first-order information), we first build the association between structural, distributional and learnability aspects of the network vis-à-vis their interaction with parameter convergence rates. We identify a nice relationship between feature denoising and dropout, and construct families of networks that achieve…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms
Methods: Affine Coupling · Normalizing Flows · Dropout
