A Survey on Data-Dependent Worst-Case Generalization Bounds
Hubert Leroux, Jean Marcus, Julien Roger

TL;DR
This survey reviews recent advances in data-dependent generalization bounds for deep neural networks, focusing on trajectory-based complexity measures and stability assumptions that help explain why these networks generalize well despite overparameterization.
Contribution
It organizes and compares recent theoretical developments in data-dependent bounds, unifying them through a common framework and highlighting their differences.
Findings
Trajectory-based complexity measures improve bounds.
Stability assumptions can replace information-theoretic terms.
A unified template inequality connects the various approaches.
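Trajectory-based complexity measures are computed on the finite set of iterates an optimizer visits rather than on the whole parameter space. The sketch below shows one such quantity, an alpha-weighted lifetime sum E_alpha. It rests on several assumptions that are ours and not taken from the survey. Distances are Euclidean. Persistence uses a Vietoris-Rips filtration indexed by pairwise distance, where the finite 0-dimensional death times coincide with minimum-spanning-tree edge lengths. The "trajectory" is a toy noisy gradient descent on a quadratic, not a real network.

```python
import numpy as np


def pairwise_distances(points: np.ndarray) -> np.ndarray:
    """Euclidean distance matrix between the rows of `points` (assumed metric)."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def mst_edge_lengths(dist: np.ndarray) -> np.ndarray:
    """Edge lengths of a minimum spanning tree, via Prim's algorithm in O(n^2)."""
    n = dist.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = dist[0].copy()  # cheapest known connection of each vertex to the tree
    lengths = []
    for _ in range(n - 1):
        candidates = np.where(in_tree, np.inf, best)
        j = int(np.argmin(candidates))
        lengths.append(candidates[j])
        in_tree[j] = True
        best = np.minimum(best, dist[j])
    return np.array(lengths, dtype=float)


def alpha_weighted_lifetime_sum(points: np.ndarray, alpha: float = 1.0) -> float:
    """E_alpha = sum over finite 0-dim persistence lifetimes of lifetime**alpha.

    With a Vietoris-Rips filtration indexed by distance, every point is born
    at 0 and the finite death times are the MST edge lengths, so the
    lifetimes are exactly those edge lengths.
    """
    return float((mst_edge_lengths(pairwise_distances(points)) ** alpha).sum())


# Hypothetical usage: noisy gradient descent on 0.5 * ||w||^2 in 10 dimensions.
rng = np.random.default_rng(0)
w = rng.normal(size=10)
trajectory = []
for _ in range(200):
    w = w - 0.1 * w + 0.01 * rng.normal(size=10)  # gradient of 0.5*||w||^2 is w
    trajectory.append(w.copy())
trajectory = np.stack(trajectory)
print(alpha_weighted_lifetime_sum(trajectory, alpha=1.0))
```

The O(n^2) Prim variant is used only to avoid extra dependencies; for long trajectories a dedicated persistent-homology or MST library would be the more practical choice.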
Abstract
Deep neural networks generalize well despite being heavily overparameterized, in apparent contradiction with classical learning theory based on uniform convergence over fixed hypothesis spaces. Uniform bounds over the entire parameter space are vacuous in this regime, and recent work has shown that non-vacuous guarantees can be recovered by restricting attention to the part of parameter space that the algorithm actually visits. This survey paper organizes this line of work around three steps: extending PAC-Bayesian theory to random, data-dependent hypothesis sets (arXiv:2404.17442); refining the complexity term with geometric and topological descriptors of the optimization trajectory, including fractal dimensions, alpha-weighted lifetime sums, and positive magnitude (arXiv:2006.09313, arXiv:2302.02766, arXiv:2407.08723); and replacing the resulting information-theoretic terms by…
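The abstract also mentions positive magnitude as a descriptor of the optimization trajectory. The following is a minimal sketch under stated assumptions. The magnitude of a finite metric space at scale t is the sum of the weights w that solve Z w = 1, where Z_ij = exp(-t d_ij). We take "positive magnitude" to mean the sum of the positive parts of those weights, which is our reading of arXiv:2407.08723 and should be checked against that paper. Euclidean distances and the helper names are hypothetical choices made for illustration.

```python
import numpy as np


def magnitude_weights(points: np.ndarray, scale: float = 1.0) -> np.ndarray:
    """Solve Z w = 1 with Z_ij = exp(-scale * d_ij) (Euclidean d, an assumption).

    For distinct points in Euclidean space Z is positive definite, so the
    linear system has a unique solution.
    """
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    zeta = np.exp(-scale * dist)
    return np.linalg.solve(zeta, np.ones(len(points)))


def magnitude(points: np.ndarray, scale: float = 1.0) -> float:
    """Magnitude of the scaled point cloud: the sum of all weights."""
    return float(magnitude_weights(points, scale).sum())


def positive_magnitude(points: np.ndarray, scale: float = 1.0) -> float:
    """Assumed definition: the sum of the positive parts of the magnitude weights."""
    return float(np.clip(magnitude_weights(points, scale), 0.0, None).sum())


# Hypothetical usage on a toy random-walk "trajectory" of 100 iterates in 5-D.
rng = np.random.default_rng(1)
trajectory = np.cumsum(0.05 * rng.normal(size=(100, 5)), axis=0)
for t in (0.1, 1.0, 10.0):
    print(t, magnitude(trajectory, t), positive_magnitude(trajectory, t))
```

As the scale grows, the magnitude of n distinct points approaches n, and as it shrinks it approaches 1, so the scale parameter controls how finely the trajectory is resolved.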
