Generalization Bounds using Lower Tail Exponents in Stochastic Optimizers
Liam Hodgkinson, Umut Şimşekli, Rajiv Khanna, Michael W. Mahoney

TL;DR
This paper establishes new theoretical bounds linking the generalization performance of stochastic optimizers to the lower tail exponents of their transition kernels, supported by empirical neural network results.
Contribution
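It introduces the first rigorous bounds connecting generalization to the lower tail exponent of an optimizer's transition kernel around a local minimum, covering both discrete- and continuous-time stochastic optimization. The route is a new data- and algorithm-dependent generalization bound stated in terms of the Fernique-Talagrand functional of the optimizer's trajectory.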
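For reference, the standard definition of Talagrand's γ_α functional is given below for a set T with metric d. This is the quantity behind the Fernique-Talagrand majorizing-measure theorem. In the bound described above, T would be the optimizer's trajectory. The particular α and metric the paper uses are not stated in this summary.

```latex
\gamma_\alpha(T, d) \;=\; \inf_{(T_n)_{n \ge 0}} \; \sup_{t \in T} \; \sum_{n \ge 0} 2^{n/\alpha}\, d(t, T_n),
\qquad T_n \subseteq T,\quad |T_0| = 1,\quad |T_n| \le 2^{2^n},
\qquad d(t, T_n) = \inf_{s \in T_n} d(t, s).
```

For a centered Gaussian process indexed by T with its canonical metric d, the majorizing-measure theorem states that E sup_{t∈T} X_t is comparable to γ_2(T, d) up to universal constants. This is why the functional acts as a measure of the "size" of a set such as a trajectory.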
Findings
Lower tail exponents correlate with generalization error in neural networks.
The derived bounds apply to both discrete- and continuous-time stochastic optimizers.
Empirical results support the theoretical relationship between tail behavior and generalization.
Abstract
Despite the ubiquitous use of stochastic optimization algorithms in machine learning, the precise impact of these algorithms and their dynamics on generalization performance in realistic non-convex settings is still poorly understood. While recent work has revealed connections between generalization and heavy-tailed behavior in stochastic optimization, this work mainly relied on continuous-time approximations; and a rigorous treatment for the original discrete-time iterations is yet to be performed. To bridge this gap, we present novel bounds linking generalization to the lower tail exponent of the transition kernel associated with the optimizer around a local minimum, in both discrete- and continuous-time settings. To achieve this, we first prove a data- and algorithm-dependent generalization bound in terms of the celebrated Fernique-Talagrand functional applied to the trajectory of…
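The following minimal Python/NumPy sketch makes the notion of a lower tail exponent concrete. It rests on an assumption, not on the paper's stated definition, which this summary does not give. The assumption is that the lower tail exponent of a nonnegative quantity X is the α in P(X ≤ r) ≈ C r^α as r → 0. An example of such a quantity is the distance between consecutive iterates near a local minimum. The sketch estimates α with a log-log least-squares fit of the empirical CDF over a band of small radii. The function name `lower_tail_exponent` and its parameters are hypothetical.

```python
import numpy as np


def lower_tail_exponent(samples, r_min, r_max, num_radii=20):
    """Hypothetical helper: estimate alpha in P(X <= r) ~ C * r**alpha for small r.

    Assumes `samples` are nonnegative draws of X (e.g. step norms) and that the
    power law holds on [r_min, r_max]; fits a line to log CDF vs. log r.
    """
    x = np.sort(np.asarray(samples, dtype=float))
    radii = np.geomspace(r_min, r_max, num_radii)
    # Empirical CDF: fraction of samples <= each radius (x is sorted).
    cdf = np.searchsorted(x, radii, side="right") / x.size
    keep = cdf > 0  # log(0) is undefined; drop radii with no samples below them
    if keep.sum() < 2:
        raise ValueError("too few samples below r_max to fit a slope")
    # The slope of the least-squares line in log-log space is the exponent estimate.
    slope, _intercept = np.polyfit(np.log(radii[keep]), np.log(cdf[keep]), 1)
    return slope


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 3
    # Norms of points uniform in the unit ball of R^d satisfy P(R <= r) = r**d,
    # so the true lower tail exponent of R is d.
    norms = rng.uniform(size=200_000) ** (1.0 / d)
    print(lower_tail_exponent(norms, r_min=0.1, r_max=0.5))  # should be close to 3
```

The fit is only meaningful over a band where the power law plausibly holds. If r_min is pushed too small, few samples fall below it and the empirical CDF becomes noisy. This is the familiar bias-variance tradeoff of tail-index estimators.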
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Sparse and Compressive Sensing Techniques
