On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization
Sanjeev Arora, Nadav Cohen, Elad Hazan

TL;DR
This paper demonstrates that increasing depth in overparameterized linear neural networks can implicitly accelerate optimization, acting as a preconditioner and improving convergence rates, even in simple convex problems.
Contribution
It reveals that overparameterization through depth can accelerate optimization independently of expressiveness, supported by theoretical analysis and experiments.
Findings
Depth acts as a preconditioner for faster convergence.
Overparameterization benefits gradient descent beyond traditional acceleration methods.
Acceleration effects cannot be replicated by regularizer gradients.
Abstract
Conventional wisdom in deep learning states that increasing depth improves expressiveness but complicates optimization. This paper suggests that, sometimes, increasing depth can speed up optimization. The effect of depth on optimization is decoupled from expressiveness by focusing on settings where additional layers amount to overparameterization - linear neural networks, a well-studied model. Theoretical analysis, as well as experiments, show that here depth acts as a preconditioner which may accelerate convergence. Even on simple convex problems such as linear regression with loss, , gradient descent can benefit from transitioning to a non-convex overparameterized objective, more than it would from some common acceleration schemes. We also prove that it is mathematically impossible to obtain the acceleration effect of overparametrization via gradients of any regularizer.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsStochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Domain Adaptation and Few-Shot Learning
MethodsSPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Linear Regression
