A Note on "Assessing Generalization of SGD via Disagreement"
Andreas Kirsch, Yarin Gal

TL;DR
This paper critically examines the theory linking model disagreement to generalization error in deep neural networks, revealing limitations in the original calibration assumptions and providing simplified, probabilistic proofs.
Contribution
The paper challenges the practicality of the Generalization Disagreement Equality theory by showing that a deep ensemble's calibration deteriorates as disagreement grows, and it offers a simplified theoretical analysis within a probabilistic framework.
Findings
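A deep ensemble's calibration can deteriorate as prediction disagreement increases, which is the regime where the coupling of test error and disagreement is of most interest.
The original theory may be impractical because estimating calibration on a new dataset requires labels, which disagreement-based estimation is meant to avoid.
The theoretical statements and proofs simplify considerably within a probabilistic framework.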
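To make the calibration finding concrete, the following is a minimal sketch of a binned calibration gap that pools all (sample, class) pairs, in the spirit of class-aggregated calibration. The function name, the equal-width binning, and the normalization by the number of pairs are assumptions made for illustration and are not taken from the paper. Note that the estimate needs ground-truth labels, which is the practical obstacle the summary describes.

```python
import numpy as np

def class_aggregated_calibration_gap(probs, labels, n_bins=10):
    """Binned calibration gap pooled over all classes (illustrative sketch).

    probs:  array of shape (N, K), e.g. the mean softmax output of an ensemble.
    labels: integer array of shape (N,), the ground-truth classes.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    n_classes = probs.shape[1]

    # One-hot encode labels so every (sample, class) pair has a 0/1 outcome.
    onehot = np.eye(n_classes)[labels]
    conf = probs.ravel()
    hit = onehot.ravel()

    # Equal-width bins over [0, 1]; a probability of exactly 1.0 goes to the last bin.
    bins = np.minimum((conf * n_bins).astype(int), n_bins - 1)

    gap = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            # Weighted |observed frequency - mean predicted probability| in this bin.
            gap += mask.sum() * abs(hit[mask].mean() - conf[mask].mean())
    # Normalizing by the number of (sample, class) pairs is a choice made here.
    return gap / conf.size

# Hypothetical toy ensemble: two members, three inputs, two classes.
member_probs = np.array([
    [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]],
    [[0.7, 0.3], [0.4, 0.6], [0.2, 0.8]],
])
ensemble_probs = member_probs.mean(axis=0)
toy_labels = np.array([0, 1, 1])
toy_gap = class_aggregated_calibration_gap(ensemble_probs, toy_labels)
```

Computing this gap separately on low-disagreement and high-disagreement subsets of a labeled dataset would be one way to probe whether calibration degrades as disagreement grows.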
Abstract
Several recent works find empirically that the average test error of deep neural networks can be estimated via the prediction disagreement of models, which does not require labels. In particular, Jiang et al. (2022) show for the disagreement between two separately trained networks that this 'Generalization Disagreement Equality' follows from the well-calibrated nature of deep ensembles under the notion of a proposed 'class-aggregated calibration.' In this reproduction, we show that the suggested theory might be impractical because a deep ensemble's calibration can deteriorate as prediction disagreement increases, which is precisely when the coupling of test error and disagreement is of interest, while labels are needed to estimate the calibration on new datasets. Further, we simplify the theoretical statements and proofs, showing them to be straightforward within a probabilistic…
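The quantities the abstract compares can be sketched in a few lines. The example below computes the disagreement rate between two models' predictions, which needs no labels, and the test error of each model, which does. The function names and the toy arrays are hypothetical and chosen only for illustration; they are not data from the paper.

```python
import numpy as np

def disagreement_rate(preds_a, preds_b):
    """Fraction of inputs on which two models predict different classes."""
    return float(np.mean(np.asarray(preds_a) != np.asarray(preds_b)))

def error_rate(preds, labels):
    """Fraction of inputs on which a model's prediction differs from the label."""
    return float(np.mean(np.asarray(preds) != np.asarray(labels)))

# Hypothetical predictions of two separately trained models on eight test inputs.
labels = np.array([0, 1, 2, 1, 0, 2, 1, 0])
preds_a = np.array([0, 1, 2, 0, 0, 2, 1, 1])
preds_b = np.array([0, 1, 1, 1, 0, 2, 1, 1])

dis = disagreement_rate(preds_a, preds_b)   # label-free
err_a = error_rate(preds_a, labels)         # requires labels
err_b = error_rate(preds_b, labels)         # requires labels
mean_err = (err_a + err_b) / 2
```

In this toy case the disagreement rate and the mean test error coincide by construction. On real data any such agreement is an empirical observation, and the note argues that it can weaken in the high-disagreement regime.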
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification
Methods: Deep Ensembles
