Learning Non-Vacuous Generalization Bounds from Optimization

Chengli Tan; Jiangshe Zhang; Junmin Liu

arXiv:2206.04359·cs.LG·July 23, 2024

TL;DR

This paper introduces a new approach to derive non-vacuous, tight generalization bounds for deep neural networks by modeling the training process with stochastic differential equations, providing plausible guarantees for large-scale models.

Contribution

It presents a novel method leveraging fractal-like hypothesis sets and continuous-time stochastic modeling to obtain meaningful generalization bounds for modern neural networks.

Findings

01 Provides plausible generalization guarantees for ResNet and Vision Transformer.

02 Achieves a tighter bound on the algorithm-dependent Rademacher complexity.

03 Demonstrates effectiveness on large-scale datasets such as ImageNet-1K.

Abstract

One of the fundamental challenges in the deep learning community is to theoretically understand how well a deep neural network generalizes to unseen data. However, current approaches often yield generalization bounds that are either too loose to be informative about the true generalization error or valid only for compressed networks. In this study, we present a simple yet non-vacuous generalization bound from the optimization perspective. We achieve this by leveraging the fact that the hypothesis set accessed by stochastic gradient algorithms is essentially fractal-like, which lets us derive a tighter bound on the algorithm-dependent Rademacher complexity. The main argument rests on modeling the discrete-time recursion process via a continuous-time stochastic differential equation driven by fractional Brownian motion. Numerical studies demonstrate that our approach is able to yield plausible…
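For readers unfamiliar with fractional Brownian motion (fBm), the driving noise named in the abstract, the following is a minimal sketch of how one sample path can be drawn. It is not the authors' code and does not reproduce their bound. It only uses the standard fBm covariance, Cov(B_H(s), B_H(t)) = ½(s^{2H} + t^{2H} − |t − s|^{2H}), together with a Cholesky factorization. The function names, the grid, and the Hurst value in the usage note are assumptions chosen for illustration.

```python
import math
import random


def fbm_covariance(times, hurst):
    """Covariance matrix of fBm with Hurst index `hurst` at the given times."""
    h2 = 2.0 * hurst
    return [
        [0.5 * (s ** h2 + t ** h2 - abs(t - s) ** h2) for s in times]
        for t in times
    ]


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def sample_fbm(n_steps, hurst, horizon=1.0, seed=0):
    """Draw one fBm path on an even grid over [0, horizon], starting at 0."""
    rng = random.Random(seed)
    # Exclude t = 0 from the covariance matrix (B_H(0) = 0 is deterministic).
    times = [horizon * (k + 1) / n_steps for k in range(n_steps)]
    lower = cholesky(fbm_covariance(times, hurst))
    noise = [rng.gauss(0.0, 1.0) for _ in range(n_steps)]
    # Correlate the independent Gaussians: path = L @ noise.
    path = [
        sum(lower[i][k] * noise[k] for k in range(i + 1))
        for i in range(n_steps)
    ]
    return [0.0] + path
```

Calling `sample_fbm(32, hurst=0.7)` returns 33 values, the first being 0. With `hurst=0.5` the covariance reduces to min(s, t), which is ordinary Brownian motion. The exact Cholesky approach costs O(n³) and is only practical for short grids.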

Code & Models

Repositories

hirsch-lab/cyminiball
none · Official

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Model Reduction and Neural Networks

Methods: Stochastic Gradient Descent