Implicit Bias of Gradient Descent on Linear Convolutional Networks

Suriya Gunasekar; Jason Lee; Daniel Soudry; Nathan Srebro

arXiv:1806.00468 · cs.LG · January 14, 2019 · 39 citations

Open Access

TL;DR

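This paper studies the implicit bias of gradient descent on linear convolutional networks of depth $L$: gradient descent converges to a linear predictor related to the $\ell_{2/L}$ bridge penalty on the predictor's Fourier coefficients, whereas on fully connected linear networks it converges to the hard-margin SVM solution.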

Contribution

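It shows that gradient descent on depth-$L$ linear convolutional networks converges to a linear predictor related to the $\ell_{2/L}$ bridge penalty in the frequency domain, in contrast to fully connected linear networks, where it converges to the hard-margin linear SVM solution.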

Findings

01

Gradient descent on linear convolutional networks converges to a predictor related to the $\ell_{2/L}$ bridge penalty in the frequency domain.

02

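The limit differs between architectures: for fully connected linear networks it is the hard-margin SVM solution regardless of depth, whereas for convolutional networks the frequency-domain penalty, and hence the limit, depends on the depth $L$.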

03

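The result shows that the implicit bias of gradient descent depends on the architecture: both network types compute linear predictors, yet gradient descent converges to different ones.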
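To make the frequency-domain penalty concrete, here is a minimal NumPy sketch (an illustration, not the authors' code) of the $\ell_{2/L}$ bridge penalty $\|\hat\beta\|_{2/L} = (\sum_d |\hat\beta_d|^{2/L})^{L/2}$ of a predictor's discrete Fourier transform $\hat\beta$. The function name `freq_bridge_penalty` is hypothetical, and the orthonormal DFT scaling is an assumption; because the penalty is positively homogeneous, rescaling the DFT by a constant does not change which predictor minimizes it. The example compares two unit-norm predictors: a spike, which is sparse in space but has a flat spectrum, and a cosine, which uses only two frequencies.

```python
import numpy as np

def freq_bridge_penalty(beta, L):
    """l_{2/L} bridge penalty of the DFT of a linear predictor beta.

    Assumption: NumPy's orthonormal DFT (norm="ortho"). The paper's DFT
    scaling may differ, but the penalty is positively homogeneous of
    degree 1, so a constant rescaling does not change its minimizer.
    """
    beta_hat = np.fft.fft(beta, norm="ortho")
    return np.sum(np.abs(beta_hat) ** (2.0 / L)) ** (L / 2.0)

D = 8
n = np.arange(D)
spike = np.zeros(D)
spike[0] = 1.0                              # sparse in space, flat spectrum
wave = np.cos(2 * np.pi * n / D) / 2.0      # unit l2 norm, two nonzero frequencies

for L in (1, 2, 3):
    print(L, freq_bridge_penalty(spike, L), freq_bridge_penalty(wave, L))
```

At $L = 1$ both penalties equal 1 (Parseval's identity), so the penalty reduces to the ordinary $\ell_2$ norm. For larger $L$ the spike costs $D^{(L-1)/2}$ while the cosine costs only $2^{(L-1)/2}$: as with $\ell_p$ penalties for $p \le 1$, greater depth favors predictors concentrated on a few frequencies.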

Abstract

We show that gradient descent on full-width linear convolutional networks of depth $L$ converges to a linear predictor related to the $\ell_{2/L}$ bridge penalty in the frequency domain. This is in contrast to linear fully connected networks, where gradient descent converges to the hard-margin linear support vector machine solution, regardless of depth.
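Why the frequency domain appears can be checked numerically. Below is a sketch assuming NumPy and a hypothetical layer convention: each of the first $L-1$ layers circularly convolves its input with a full-width filter $w_l \in \mathbb{R}^D$, and the last layer takes an inner product with $w_L$. The paper's exact convention may differ by normalization constants or index flips. Under this convention the network computes a linear predictor $\beta$, and by the convolution theorem the magnitudes of its DFT factor coordinatewise as $|\hat\beta| = \prod_{l=1}^{L} |\hat w_l|$. The helper names are illustrative.

```python
import numpy as np

def circ_conv(a, b):
    """Circular convolution (a * b)[n] = sum_k a[k] * b[(n - k) mod D], computed directly."""
    D = len(a)
    return np.array([sum(a[k] * b[(n - k) % D] for k in range(D)) for n in range(D)])

def conv_net_output(ws, x):
    """Hypothetical full-width linear conv net: L-1 circular-conv layers, then an inner product."""
    h = x
    for w in ws[:-1]:
        h = circ_conv(w, h)
    return np.dot(ws[-1], h)

def end_to_end_predictor(ws):
    """Linear predictor beta with conv_net_output(ws, x) == <beta, x> for every x."""
    D = len(ws[0])
    c = np.zeros(D)
    c[0] = 1.0                      # identity element for circular convolution
    for w in ws[:-1]:
        c = circ_conv(w, c)
    c_rev = np.roll(c[::-1], 1)     # c_rev[n] = c[(-n) mod D]
    return circ_conv(c_rev, ws[-1])

rng = np.random.default_rng(0)
D, L = 8, 3
ws = [rng.standard_normal(D) for _ in range(L)]
x = rng.standard_normal(D)
beta = end_to_end_predictor(ws)

# The network is linear in x, with predictor beta.
print(np.isclose(conv_net_output(ws, x), beta @ x))
# |DFT(beta)| is the coordinatewise product of the filters' |DFT|s.
print(np.allclose(np.abs(np.fft.fft(beta)),
                  np.prod([np.abs(np.fft.fft(w)) for w in ws], axis=0)))
```

Both checks print `True`. Per frequency, the network is a product of $L$ factors, which is the structure the depth-dependent $\ell_{2/L}$ penalty refers to; the sketch does not run gradient descent and does not reproduce the convergence result itself.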

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Matrix Theory and Algorithms