Theoretical characterization of uncertainty in high-dimensional linear classification
Lucas Clarté, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová

TL;DR
This paper provides a theoretical analysis of uncertainty in high-dimensional linear classification, deriving formulas for Bayesian uncertainty and classifier calibration, especially in limited data scenarios, using approximate message passing.
Contribution
It introduces a closed-form formula linking Bayesian uncertainty, classifier predictions, and ground-truth uncertainty in high-dimensional Gaussian data, advancing understanding of model calibration.
Findings
Bayesian uncertainty can be approximated via AMP in high dimensions.
The derived formulas enable analysis of classifier calibration and over-confidence.
Regularization can mitigate over-confidence in limited data settings.
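The calibration and regularization findings above can be illustrated numerically. The following is a minimal sketch, not the authors' AMP-based method: it assumes a probit teacher with a hypothetical noise level `sigma`, fits L2-regularised logistic regression by plain gradient descent, and measures calibration at a confidence level p as p minus the average ground-truth label probability over test points whose predicted probability is close to p. The dimensions, the values of `lam`, the bin width, and all function names are illustrative assumptions.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)
d, n_train, n_test, sigma = 100, 200, 2000, 0.5  # hypothetical sizes and noise level

def gaussian_cdf(z):
    """Standard normal CDF, applied elementwise."""
    return 0.5 * (1.0 + np.vectorize(erf)(np.asarray(z) / sqrt(2.0)))

def sigmoid(z):
    """Numerically stable logistic function."""
    return np.exp(-np.logaddexp(0.0, -z))

def sample(n, w_star):
    """Gaussian inputs with probit labels y = sign(w*.x/sqrt(d) + sigma*noise)."""
    X = rng.standard_normal((n, w_star.size))
    y = np.sign(X @ w_star / np.sqrt(w_star.size) + sigma * rng.standard_normal(n))
    return X, y

def fit_logistic(X, y, lam, steps=500):
    """Gradient descent on the L2-regularised logistic loss."""
    dim = X.shape[1]
    w = np.zeros(dim)
    # Step size from an upper bound on the curvature of the objective.
    lr = 1.0 / (0.25 * np.linalg.norm(X, 2) ** 2 / dim + lam)
    for _ in range(steps):
        z = X @ w / np.sqrt(dim)
        grad = -(X.T @ (y * sigmoid(-y * z))) / np.sqrt(dim) + lam * w
        w -= lr * grad
    return w

def calibration(p_hat, p_true, level, width=0.05):
    """level minus the mean true probability where the classifier predicts ~level."""
    mask = np.abs(p_hat - level) < width
    return float(level - p_true[mask].mean()) if mask.any() else float("nan")

w_star = rng.standard_normal(d)
X_train, y_train = sample(n_train, w_star)
X_test, _ = sample(n_test, w_star)
# Ground-truth probability that the label is +1 under the probit teacher.
p_true = gaussian_cdf(X_test @ w_star / (np.sqrt(d) * sigma))

results = {}
for lam in (1e-4, 0.1, 10.0):
    w_hat = fit_logistic(X_train, y_train, lam)
    p_hat = sigmoid(X_test @ w_hat / np.sqrt(d))
    results[lam] = (float(np.linalg.norm(w_hat)), calibration(p_hat, p_true, level=0.75))
    print(f"lambda={lam:g}  |w|={results[lam][0]:.2f}  gap at 0.75={results[lam][1]:+.3f}")
```

A positive gap means the classifier is over-confident at that level, and `nan` means no test point fell in the bin. Comparing the printed rows across values of `lam` gives a rough empirical counterpart to the regularization finding; the numbers depend on the seed and on the assumed sizes.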
Abstract
Being able to reliably assess not only the accuracy but also the uncertainty of models' predictions is an important endeavour in modern machine learning. Even if the model generating the data and labels is known, computing the intrinsic uncertainty after learning the model from a limited number of samples amounts to sampling the corresponding posterior probability measure. Such sampling is computationally challenging in high-dimensional problems, and theoretical results on heuristic uncertainty estimators in high dimensions are thus scarce. In this manuscript, we characterise uncertainty for learning from a limited number of samples of high-dimensional Gaussian input data and labels generated by the probit model. In this setting, the Bayesian uncertainty (i.e. the posterior marginals) can be asymptotically obtained by the approximate message passing algorithm, bypassing the…
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Machine Learning and Algorithms · Machine Learning and Data Classification
