On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization

Sanjeev Arora; Nadav Cohen; Elad Hazan

arXiv:1802.06509·cs.LG·June 12, 2018·136 cites

Open Access

TL;DR

This paper demonstrates that increasing depth in overparameterized linear neural networks can implicitly accelerate optimization, acting as a preconditioner and improving convergence rates, even in simple convex problems.

Contribution

It reveals that overparameterization through depth can accelerate optimization independently of expressiveness, supported by theoretical analysis and experiments.

Findings

01

Depth acts as a preconditioner for faster convergence.

02

Even on simple convex problems such as $\ell_p$ regression with $p > 2$, gradient descent on the overparameterized objective can outperform some common acceleration schemes.

03

The acceleration effect of overparameterization cannot be obtained via the gradients of any regularizer.

Abstract

Conventional wisdom in deep learning states that increasing depth improves expressiveness but complicates optimization. This paper suggests that, sometimes, increasing depth can speed up optimization. The effect of depth on optimization is decoupled from expressiveness by focusing on settings where additional layers amount to overparameterization - linear neural networks, a well-studied model. Theoretical analysis, as well as experiments, show that here depth acts as a preconditioner which may accelerate convergence. Even on simple convex problems such as linear regression with $\ell_p$ loss, $p > 2$, gradient descent can benefit from transitioning to a non-convex overparameterized objective, more than it would from some common acceleration schemes. We also prove that it is mathematically impossible to obtain the acceleration effect of overparameterization via gradients of any regularizer.
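
The preconditioning claim in the abstract can be illustrated with a minimal sketch, assuming the simplest overparameterization: a depth-2 linear model with end-to-end weights w = w1 * w2, where w1 is a vector and w2 a scalar. One gradient step on (w1, w2) moves w, to first order in the learning rate, by -lr * (w2^2 I + w1 w1^T) g, where g is the gradient of the loss at w. The function names, the toy data, and the 1/p scaling of the loss below are illustrative assumptions, not taken from the paper.

```python
# Sketch: depth-2 overparameterization of l_p linear regression (p = 4).
# All names and data here are hypothetical, chosen for illustration only.

def lp_loss_grad(w, xs, ys, p=4):
    """Gradient of L(w) = (1/m) * sum_i (1/p) * (x_i . w - y_i)**p, for even p."""
    grad = [0.0] * len(w)
    for x, y in zip(xs, ys):
        r = sum(xi * wi for xi, wi in zip(x, w)) - y  # residual of one sample
        for j, xj in enumerate(x):
            grad[j] += (r ** (p - 1)) * xj / len(xs)
    return grad

def overparam_step(w1, w2, xs, ys, lr, p=4):
    """One gradient descent step on (w1, w2) for the objective L(w1 * w2)."""
    w = [a * w2 for a in w1]
    g = lp_loss_grad(w, xs, ys, p)
    new_w1 = [a - lr * w2 * gj for a, gj in zip(w1, g)]     # dL/dw1 = w2 * g
    new_w2 = w2 - lr * sum(a * gj for a, gj in zip(w1, g))  # dL/dw2 = w1 . g
    return new_w1, new_w2, g

def preconditioned_step(w1, w2, g, lr):
    """First-order prediction for the end-to-end weights:
    w - lr * (w2**2 * I + w1 w1^T) g."""
    dot = sum(a * gj for a, gj in zip(w1, g))
    return [a * w2 - lr * (w2 * w2 * gj + a * dot) for a, gj in zip(w1, g)]

# Toy data (hypothetical).
xs = [[1.0, 2.0], [3.0, -1.0], [0.5, 0.5]]
ys = [1.0, 0.0, 2.0]
w1, w2, lr = [0.3, -0.2], 0.5, 1e-3

new_w1, new_w2, g = overparam_step(w1, w2, xs, ys, lr)
actual = [a * new_w2 for a in new_w1]           # end-to-end weights after the step
predicted = preconditioned_step(w1, w2, g, lr)  # preconditioned-gradient prediction
gap = max(abs(a - b) for a, b in zip(actual, predicted))
print("largest deviation from the preconditioned step:", gap)
```

The deviation between the two updates is exactly the second-order term lr^2 * w2 * (w1 . g) * g, so it vanishes faster than the step itself as the learning rate shrinks. The sketch covers only this two-layer, scalar-output case; it does not reproduce the paper's general analysis or its experiments.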

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Domain Adaptation and Few-Shot Learning

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Linear Regression