Implicit Bias of Gradient Descent on Linear Convolutional Networks
Suriya Gunasekar, Jason Lee, Daniel Soudry, Nathan Srebro

TL;DR
This paper studies the implicit bias of gradient descent on linear convolutional networks. It shows that gradient descent converges to a linear predictor related to the $\ell_{2/L}$ bridge penalty in the frequency domain, unlike linear fully connected networks, which converge to the hard margin SVM solution.
Contribution
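It demonstrates that gradient descent on full-width linear convolutional networks of depth $L$ converges to a linear predictor related to the $\ell_{2/L}$ bridge penalty in the frequency domain, in contrast to linear fully connected networks, where it converges to the hard margin linear SVM solution regardless of depth.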
Findings
Gradient descent on linear convolutional networks relates to the $\ell_{2/L}$ bridge penalty in the frequency domain.
Linear fully connected networks instead converge to the hard margin linear SVM solution regardless of depth, so the limiting predictor depends on the architecture.
The result provides insight into the implicit bias of convolutional architectures.
Abstract
We show that gradient descent on full-width linear convolutional networks of depth $L$ converges to a linear predictor related to the $\ell_{2/L}$ bridge penalty in the frequency domain. This is in contrast to linear fully connected networks, where gradient descent converges to the hard margin linear support vector machine solution, regardless of depth.
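To make the frequency-domain penalty concrete, the following is a minimal NumPy sketch, not the authors' code. It assumes the penalty takes the form $\sum_k |\hat{\beta}_k|^{2/L}$, where $\hat{\beta}$ is the unitary discrete Fourier transform of the linear predictor $\beta$. The function name `bridge_penalty_freq` and the two example predictors are hypothetical, chosen only for illustration.

```python
import numpy as np

def bridge_penalty_freq(beta, depth):
    """Assumed form of the penalty: sum_k |beta_hat[k]|^(2/depth),
    with beta_hat the unitary DFT of beta."""
    beta_hat = np.fft.fft(beta, norm="ortho")  # unitary DFT preserves the l2 norm
    return float(np.sum(np.abs(beta_hat) ** (2.0 / depth)))

d = 8

# Predictor that is sparse in frequency: a single cosine, scaled to unit l2 norm.
cosine = np.cos(2 * np.pi * np.arange(d) / d)
cosine /= np.linalg.norm(cosine)

# Predictor that is sparse in the signal domain: a unit spike (dense in frequency).
spike = np.zeros(d)
spike[0] = 1.0

for depth in (1, 2, 3):
    print(depth, bridge_penalty_freq(cosine, depth), bridge_penalty_freq(spike, depth))
```

At depth 1 the exponent is 2, so both unit-norm predictors receive the same penalty (the squared $\ell_2$ norm). At depth 2 and beyond the exponent drops to 1 or below, and the frequency-sparse cosine is penalized less than the spike. Under the assumed form, this is how deeper linear convolutional networks come to favor predictors that are sparse in the frequency domain.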
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Matrix Theory and Algorithms
