Position: Many generalization measures for deep learning are fragile
Shuofeng Zhang, Ard Louis

TL;DR
This paper highlights the fragility of many post-mortem generalization measures in deep learning, showing they are sensitive to minor training changes and may not reliably reflect true model generalization.
Contribution
The paper critically evaluates the robustness of existing generalization measures and emphasizes the need for developers to assess their fragility.
Findings
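Many measures are sensitive to small hyperparameter changes, such as learning rate adjustments or switching between SGD variants.
The PAC-Bayes origin measure is less sensitive to hyperparameter changes but fails to capture differences in data complexity across learning curves.
The function-based PAC-Bayes bound captures data complexity but is not a post-mortem measure.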
Abstract
In this position paper, we argue that many post-mortem generalization measures (those computed on trained networks) are fragile: small training modifications that barely affect the performance of the underlying deep neural network can substantially change a measure's value, trend, or scaling behavior. For example, minor hyperparameter changes, such as learning rate adjustments or switching between SGD variants, can reverse the slope of a learning curve in widely used generalization measures such as the path norm. We also identify subtler forms of fragility. For instance, the PAC-Bayes origin measure is regarded as one of the most reliable, and is indeed less sensitive to hyperparameter tweaks than many other measures. However, it completely fails to capture differences in data complexity across learning curves. This data fragility contrasts with the function-based…
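As an illustration of the slope-reversal fragility described in the abstract, the following is a minimal sketch and not the authors' code. It assumes the standard L2 path norm of a bias-free feed-forward ReLU network and a least-squares fit in log-log space as the learning-curve slope. The function names `path_norm`, `loglog_slope`, and `slope_reverses` are hypothetical and do not come from the paper.

```python
import numpy as np


def path_norm(weights):
    """L2 path norm of a bias-free feed-forward ReLU network.

    weights: list of 2-D arrays, weights[i] has shape (fan_out, fan_in).
    Computed by pushing an all-ones vector through the squared weights,
    which sums the squared weight products over all input-output paths.
    """
    v = np.ones(weights[0].shape[1])
    for w in weights:
        v = (w ** 2) @ v
    return float(np.sqrt(v.sum()))


def loglog_slope(train_sizes, values):
    """Least-squares slope of log(measure) against log(training-set size)."""
    slope, _intercept = np.polyfit(np.log(train_sizes), np.log(values), 1)
    return float(slope)


def slope_reverses(train_sizes, values_a, values_b):
    """True if the learning-curve slope has opposite sign in the two runs.

    values_a and values_b are the measure evaluated at each training-set
    size under two training configurations (for example, two learning rates).
    """
    slope_a = loglog_slope(train_sizes, values_a)
    slope_b = loglog_slope(train_sizes, values_b)
    return slope_a * slope_b < 0
```

In use, one would train the same architecture at several training-set sizes under two nearby hyperparameter settings, evaluate `path_norm` on each trained network's weight matrices, and pass the two resulting curves to `slope_reverses`. A `True` result for runs with near-identical test accuracy would be an instance of the fragility the paper describes.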
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning
