Learning Non-Vacuous Generalization Bounds from Optimization
Chengli Tan, Jiangshe Zhang, Junmin Liu

TL;DR
This paper derives non-vacuous generalization bounds for deep neural networks by modeling the training process with a continuous-time stochastic differential equation driven by fractional Brownian motion, yielding plausible guarantees for large-scale models.
Contribution
It presents a novel method leveraging fractal-like hypothesis sets and continuous-time stochastic modeling to obtain meaningful generalization bounds for modern neural networks.
Findings
Provides plausible generalization guarantees for ResNet and Vision Transformer.
Derives a tighter bound on the algorithm-dependent Rademacher complexity.
Demonstrates effectiveness on large-scale datasets such as ImageNet-1K.
Abstract
A fundamental challenge in deep learning is to understand theoretically how well a deep neural network generalizes to unseen data. However, current approaches often yield generalization bounds that are either too loose to be informative about the true generalization error or valid only for compressed networks. In this study, we present a simple yet non-vacuous generalization bound from the optimization perspective. We achieve this by exploiting the fact that the hypothesis set accessed by stochastic gradient algorithms is essentially fractal-like, which lets us derive a tighter bound on the algorithm-dependent Rademacher complexity. The main argument rests on modeling the discrete-time recursion process via a continuous-time stochastic differential equation driven by fractional Brownian motion. Numerical studies demonstrate that our approach is able to yield plausible…
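To make the modeling idea concrete, the following is a minimal sketch of a continuous-time process driven by fractional Brownian motion, discretized with an Euler scheme. It is an illustration only, not the paper's construction: the one-dimensional quadratic loss, the Cholesky-based sampler, the Hurst value, and all function names (`fbm_covariance`, `sample_fbm`, `euler_fbm_sde`) are assumptions introduced here.

```python
import math
import random


def fbm_covariance(times, hurst):
    """Covariance of fractional Brownian motion B_H at the given times:
    Cov(B_H(s), B_H(t)) = 0.5 * (s^{2H} + t^{2H} - |t - s|^{2H})."""
    h2 = 2.0 * hurst
    return [
        [0.5 * (s ** h2 + t ** h2 - abs(t - s) ** h2) for t in times]
        for s in times
    ]


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - acc)
            else:
                lower[i][j] = (matrix[i][j] - acc) / lower[j][j]
    return lower


def sample_fbm(n_steps, horizon, hurst, rng):
    """Sample B_H on an equally spaced grid over [0, horizon], with B_H(0) = 0."""
    dt = horizon / n_steps
    times = [dt * (k + 1) for k in range(n_steps)]
    lower = cholesky(fbm_covariance(times, hurst))
    z = [rng.gauss(0.0, 1.0) for _ in range(n_steps)]
    path = [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(n_steps)]
    return [0.0] + path


def euler_fbm_sde(w0, grad, sigma, n_steps, horizon, hurst, rng):
    """Euler scheme for dW = -grad(W) dt + sigma dB_H (hypothetical toy model)."""
    dt = horizon / n_steps
    noise = sample_fbm(n_steps, horizon, hurst, rng)
    w = [w0]
    for k in range(n_steps):
        increment = noise[k + 1] - noise[k]
        w.append(w[-1] - grad(w[-1]) * dt + sigma * increment)
    return w


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy loss f(w) = 0.5 * w^2, so grad f(w) = w (an assumption for illustration).
    trajectory = euler_fbm_sde(
        w0=1.0, grad=lambda w: w, sigma=0.1,
        n_steps=50, horizon=1.0, hurst=0.7, rng=rng,
    )
    print(len(trajectory), trajectory[-1])
```

With `hurst=0.5` the covariance reduces to `min(s, t)`, i.e. standard Brownian motion; other Hurst values give correlated increments. The Cholesky sampler costs cubic time in the number of steps, so it is only suitable for short toy trajectories like this one.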
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Model Reduction and Neural Networks
Methods: Stochastic Gradient Descent
