On the Lipschitz Constant of Deep Networks and Double Descent
Matteo Gamba, Hossein Azizpour, Mårten Björkman

TL;DR
This paper investigates how the Lipschitz constant of deep networks relates to double descent phenomena, revealing non-monotonic trends and factors like loss landscape curvature and parameter distance that influence generalization.
Contribution
It provides an extensive experimental analysis linking empirical Lipschitz constants with double descent, and introduces a connection between parameter-space and input-space gradients.
Findings
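The empirical Lipschitz constant shows non-monotonic trends that correlate strongly with the test error.
Loss landscape curvature controls optimization dynamics around a critical point.
Distance of parameters from initialization bounds model function complexity, even beyond the training data.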
Abstract
Existing bounds on the generalization error of deep networks assume some form of smooth or bounded dependence on the input variable, falling short of investigating the mechanisms controlling such factors in practice. In this work, we present an extensive experimental study of the empirical Lipschitz constant of deep networks undergoing double descent, and highlight non-monotonic trends strongly correlating with the test error. Building a connection between parameter-space and input-space gradients for SGD around a critical point, we isolate two important factors -- namely loss landscape curvature and distance of parameters from initialization -- respectively controlling optimization dynamics around a critical point and bounding model function complexity, even beyond the training data. Our study presents novel insights on implicit regularization via overparameterization, and effective…
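To make the notion of an empirical Lipschitz constant concrete, the sketch below estimates it for a toy two-layer ReLU network as the spectral norm of the input-output Jacobian, aggregated over a set of sample points. This is an illustration under stated assumptions, not the authors' estimator: the abstract does not specify how the quantity is computed, and the network, the random weights, and the helper names `mlp_jacobian` and `empirical_lipschitz` are all hypothetical.

```python
import numpy as np

def mlp_jacobian(x, W1, b1, W2):
    """Input-output Jacobian of f(x) = W2 @ relu(W1 @ x + b1) at one point x."""
    pre = W1 @ x + b1
    mask = (pre > 0).astype(x.dtype)   # derivative of ReLU at the pre-activations
    return W2 @ (mask[:, None] * W1)   # shape (out_dim, in_dim)

def empirical_lipschitz(X, W1, b1, W2):
    """Mean and max spectral norm of the input Jacobian over the rows of X."""
    norms = np.array(
        [np.linalg.norm(mlp_jacobian(x, W1, b1, W2), ord=2) for x in X]
    )
    return norms.mean(), norms.max()

# Hypothetical toy setup: random weights and Gaussian inputs (assumptions).
rng = np.random.default_rng(0)
in_dim, hidden, out_dim, n = 8, 32, 3, 100
W1 = rng.normal(size=(hidden, in_dim)) / np.sqrt(in_dim)
b1 = np.zeros(hidden)
W2 = rng.normal(size=(out_dim, hidden)) / np.sqrt(hidden)
X = rng.normal(size=(n, in_dim))

mean_lip, max_lip = empirical_lipschitz(X, W1, b1, W2)

# Product of layer spectral norms: a (usually loose) global upper bound.
upper = np.linalg.norm(W2, 2) * np.linalg.norm(W1, 2)
print(f"mean={mean_lip:.3f}  max={max_lip:.3f}  upper bound={upper:.3f}")
```

The data-dependent estimate can never exceed the product of layer norms, which is why an empirical quantity of this kind can track test error more closely than a worst-case bound. In a double descent study one would recompute such an estimate for each model width or training epoch.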
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Gaussian Processes and Bayesian Inference
Methods: Test · Stochastic Gradient Descent
