Stability, Complexity and Data-Dependent Worst-Case Generalization Bounds
Mario Tuci, Lennart Bastian, Benjamin Dupuis, Nassir Navab, Tolga Birdal, Umut Şimşekli

TL;DR
This paper introduces a new framework combining data-dependent complexity measures and stability concepts to derive practical, tight worst-case generalization bounds for stochastic optimization algorithms, improving upon previous intractable approaches.
Contribution
The authors develop the concept of random set stability and integrate it with empirical complexity measures to obtain computable, data-dependent generalization bounds that surpass prior mutual information-based methods.
Findings
Bounds are tighter and more practical to compute.
The framework recovers and improves existing topological bounds.
Experimental results validate the theoretical predictions.
Abstract
Providing generalization guarantees for stochastic optimization algorithms remains a key challenge in learning theory. Recently, numerous works demonstrated the impact of the geometric properties of optimization trajectories on generalization performance. These works propose worst-case generalization bounds in terms of various notions of intrinsic dimension and/or topological complexity, which were found to empirically correlate with the generalization error. However, most of these approaches involve intractable mutual information terms, which limit a full understanding of the bounds. In contrast, some authors built on algorithmic stability to obtain worst-case bounds involving geometric quantities of a combinatorial nature, which are impractical to compute. In this paper, we address these limitations by combining empirically relevant complexity measures with a framework that avoids…
Peer Reviews
Decision: Submitted to ICLR 2026
1. This paper introduces the novel concept of random set stability, creatively integrates algorithmic stability with a data-dependent random set framework, successfully avoids intractable mutual information terms in existing methods, and offers a fresh approach to deriving generalization bounds.
2. Through rigorous mathematical deductions, this paper completes theorem proofs and the recovery of classical bounds, while conducting systematic experiments on real datasets using ViT and GraphSage mod
1. This paper only provides expected generalization bounds instead of high-probability bounds, which limits the reliability of the framework in practical scenarios where probabilistic guarantees with strict confidence levels are needed.
2. Estimating the stability parameter requires replacing part of the training samples and retraining, leading to high computational costs in large-sample scenarios without proposing efficient optimization schemes.
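The cost concern in the second weakness above can be made concrete with a toy sketch. Everything here is an assumption for illustration: the data model, the 1-D ridge regression standing in for the learning algorithm, and the helper names `sample`, `train`, `loss`, and `estimate_stability`. It measures a classical replace-`J`-samples stability gap, not the paper's random set stability parameter, and only shows why such estimates need two training runs per trial.

```python
import random

def sample(rng, n):
    """Draw n points (x, y) from a toy model y = 2x + noise (an assumption)."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 2.0 * x + rng.gauss(0.0, 0.1)) for x in xs]

def train(data, reg=1e-3):
    """Closed-form 1-D ridge regression: w = sum(x*y) / (sum(x^2) + reg)."""
    sxy = sum(x * y for x, y in data)
    sxx = sum(x * x for x, _ in data)
    return sxy / (sxx + reg)

def loss(w, z):
    """Squared error of the linear predictor w on one point z = (x, y)."""
    x, y = z
    return (w * x - y) ** 2

def estimate_stability(n, J, trials=20, seed=0):
    """Average, over trials, of the largest loss gap between a model trained
    on S and one trained on S' (S with J samples replaced by fresh draws)."""
    rng = random.Random(seed)
    probes = sample(rng, 200)  # held-out points where the gap is measured
    gaps = []
    for _ in range(trials):
        S = sample(rng, n)
        S_prime = list(S)
        for i in rng.sample(range(n), J):  # J distinct indices to replace
            S_prime[i] = sample(rng, 1)[0]
        # Two full training runs per trial: this is the expensive part.
        w, w_prime = train(S), train(S_prime)
        gaps.append(max(abs(loss(w, z) - loss(w_prime, z)) for z in probes))
    return sum(gaps) / trials

if __name__ == "__main__":
    for n in (50, 500, 5000):
        print(n, estimate_stability(n, J=5))
```

Each trial calls `train` twice, so for a large model the estimate costs `2 * trials` full training runs, which is the overhead the reviewer points to.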
* Novel stability framework that unifies point-wise bounds and uniform convergence bounds (by interpolating $J$)
* Computable components
* Bound has convergence rate of $O(n^{-1/3})$ rather than the usual $O(n^{-1/2})$
* The empirical validation relies on a seemingly very optimistic estimate of the $\beta_n$ parameter
* Empirical results seem to suggest a bound > 1 on a 0-1 loss
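To give a feel for the rate gap raised in the weaknesses above, the following minimal sketch compares $n^{-1/3}$ with $n^{-1/2}$, ignoring all constants and logarithmic factors (an assumption; the actual bounds carry both). The helpers `rate` and `samples_needed` are hypothetical names introduced here.

```python
def rate(n: int, exponent: float) -> float:
    """Value of n^(-exponent), the leading term of a bound with that rate."""
    return n ** (-exponent)

def samples_needed(target: float, exponent: float) -> float:
    """Real-valued n at which n^(-exponent) equals the target value."""
    return target ** (-1.0 / exponent)

if __name__ == "__main__":
    for n in (10**3, 10**4, 10**5, 10**6):
        slow, fast = rate(n, 1 / 3), rate(n, 1 / 2)
        print(f"n={n:>8}  n^(-1/3)={slow:.4f}  n^(-1/2)={fast:.4f}")
    # Sample sizes needed for the leading term to reach 0.01.
    print(samples_needed(0.01, 1 / 2), samples_needed(0.01, 1 / 3))
```

Under these simplifications, driving the leading term down to $10^{-2}$ takes about $10^4$ samples at rate $n^{-1/2}$ but about $10^6$ at rate $n^{-1/3}$.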
This paper studies an important and fundamental problem in generalization theory.
The main weakness of the paper is the writing: it is very dense, and the authors don't spend time giving intuitions, etc. Consider the introduction: the summary of contributions discusses $\beta_n$ without precisely defining it. Another example is Definition 3.1: what is G_S'(w)? The definitions need to be self-contained! Lemma 3.2 talks about trajectory-stability, which hasn't been defined clearly. How can one understand the statements? In this version of the paper, it is very difficult to
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Constraint Satisfaction and Optimization · Advanced Graph Neural Networks
