Non-Asymptotic Optimization and Generalization Bounds for Stochastic Gauss-Newton in Overparameterized Models
Semih Cayci

TL;DR
This paper provides finite-time convergence and non-asymptotic generalization bounds for stochastic Gauss-Newton methods in overparameterized deep neural networks, highlighting the influence of curvature, batch size, and network size.
Contribution
The paper offers the first finite-time convergence and generalization bounds for SGN in overparameterized models, with explicit dependencies on batch size, network width, and depth.
Findings
A larger minimum eigenvalue of the Gauss-Newton matrix improves generalization.
The bounds depend explicitly on batch size, network width, and depth.
The analysis identifies a favorable regime in which SGN generalizes well.
Abstract
An important question in deep learning is how higher-order optimization methods affect generalization. In this work, we analyze a stochastic Gauss-Newton (SGN) method with Levenberg-Marquardt damping and mini-batch sampling for training overparameterized deep neural networks with smooth activations in a regression setting. Our theoretical contributions are twofold. First, we establish finite-time convergence bounds via a variable-metric analysis in parameter space, with explicit dependencies on the batch size, network width and depth. Second, we derive non-asymptotic generalization bounds for SGN using uniform stability in the overparameterized regime, characterizing the impact of curvature, batch size, and overparameterization on generalization performance. Our theoretical results identify a favorable generalization regime for SGN in which a larger minimum eigenvalue of the…
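To make the update described in the abstract concrete, here is a minimal NumPy sketch of one stochastic Gauss-Newton step with Levenberg-Marquardt damping and mini-batch sampling. It is an illustration under stated assumptions, not the paper's exact algorithm: the model is a hypothetical two-layer tanh network in which only the hidden weights are trained, the squared loss is left unnormalized, and all function names (`init_params`, `predict`, `jacobian`, `sgn_step`, `run_demo`) and hyperparameter values are invented for this example. Because the batch size is much smaller than the number of parameters, the sketch solves the damped system in the batch-sized form, using the identity (JᵀJ + λI)⁻¹Jᵀr = Jᵀ(JJᵀ + λI)⁻¹r.

```python
import numpy as np

def init_params(rng, d, m):
    # Assumed model: f(x) = a^T tanh(W x) / sqrt(m); only W is trained.
    W = rng.normal(size=(m, d))
    a = rng.choice([-1.0, 1.0], size=m)
    return W, a

def predict(W, a, X):
    # X has shape (n, d); returns predictions of shape (n,).
    return np.tanh(X @ W.T) @ a / np.sqrt(W.shape[0])

def jacobian(W, a, X):
    # d f(x_i) / d W[j, k] = a_j * (1 - tanh^2(w_j . x_i)) * x_i[k] / sqrt(m)
    m = W.shape[0]
    H = np.tanh(X @ W.T)                    # (n, m)
    G = (1.0 - H**2) * a / np.sqrt(m)       # (n, m)
    J = G[:, :, None] * X[:, None, :]       # (n, m, d)
    return J.reshape(X.shape[0], -1)        # (n, m * d), row-major like W

def sgn_step(W, a, X, y, batch_size, damping, lr, rng):
    # Sample a mini-batch without replacement.
    idx = rng.choice(X.shape[0], size=batch_size, replace=False)
    Xb, yb = X[idx], y[idx]
    r = predict(W, a, Xb) - yb              # batch residuals, shape (B,)
    J = jacobian(W, a, Xb)                  # batch Jacobian, shape (B, p)
    # Damped Gauss-Newton direction, solved as a B x B system:
    # (J^T J + damping * I)^{-1} J^T r = J^T (J J^T + damping * I)^{-1} r
    K = J @ J.T + damping * np.eye(batch_size)
    step = J.T @ np.linalg.solve(K, r)
    return W - lr * step.reshape(W.shape)

def run_demo(seed=0, n=64, d=5, m=256, steps=200):
    # Synthetic regression task; all sizes here are illustrative choices.
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d)) / np.sqrt(d)
    y = np.sin(X @ np.ones(d))
    W, a = init_params(rng, d, m)
    loss_before = 0.5 * np.mean((predict(W, a, X) - y) ** 2)
    for _ in range(steps):
        W = sgn_step(W, a, X, y, batch_size=16, damping=1.0, lr=0.5, rng=rng)
    loss_after = 0.5 * np.mean((predict(W, a, X) - y) ** 2)
    return loss_before, loss_after
```

Calling `run_demo()` returns the training loss before and after the updates. In this sketch the damping parameter keeps the batch-sized system well conditioned, and the batch size sets both the cost of the linear solve and the noise in each step; these are the same quantities the abstract says appear explicitly in the bounds.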
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Gaussian Processes and Bayesian Inference · Tensor decomposition and applications
