Measuring Stochastic Data Complexity with Boltzmann Influence Functions
Nathan Ng, Roger Grosse, Marzyeh Ghassemi

TL;DR
This paper introduces IF-COMP, a scalable approximation of the predictive normalized maximum likelihood (pNML) distribution based on Boltzmann influence functions, which improves uncertainty estimation and complexity measurement for machine learning models.
Contribution
It proposes an efficient method for uncertainty quantification and complexity measurement that combines influence functions with pNML and applies to both labelled and unlabelled data.
Findings
IF-COMP achieves well-calibrated predictions.
It outperforms baseline methods in uncertainty calibration.
It is effective at mislabel detection and out-of-distribution (OOD) detection.
Abstract
Estimating the uncertainty of a model's prediction on a test point is a crucial part of ensuring reliability and calibration under distribution shifts. A minimum description length approach to this problem uses the predictive normalized maximum likelihood (pNML) distribution, which considers every possible label for a data point, and decreases confidence in a prediction if other labels are also consistent with the model and training data. In this work we propose IF-COMP, a scalable and efficient approximation of the pNML distribution that linearizes the model with a temperature-scaled Boltzmann influence function. IF-COMP can be used to produce well-calibrated predictions on test points as well as measure complexity in both labelled and unlabelled settings. We experimentally validate IF-COMP on uncertainty calibration, mislabel detection, and OOD detection tasks, where it consistently…
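To make the pNML idea in the abstract concrete, here is a minimal sketch of the brute-force version on a toy problem. It is not IF-COMP and not the authors' code: it assumes a tabular count-based classifier so that "retraining" is exact and cheap, and the names fit_counts, predict, and pnml are hypothetical. For each candidate label, the model is refit with the test point given that label, and the resulting probabilities are normalized. The log of the normalizer serves as the complexity measure.

```python
import math
from collections import Counter


def fit_counts(data):
    """Maximum-likelihood 'training' for a toy tabular model: count (x, y) pairs."""
    return Counter(data)


def predict(counts, x, y, labels):
    """p(y | x) under the count-based maximum-likelihood model."""
    total = sum(counts[(x, label)] for label in labels)
    return counts[(x, y)] / total


def pnml(data, x_test, labels):
    """Brute-force pNML: refit once per candidate label, then normalize.

    Returns the pNML distribution over labels and the log-normalizer,
    which is larger when several labels are consistent with the data.
    """
    scores = {}
    for label in labels:
        # Pretend the test point carries this label and refit from scratch.
        counts = fit_counts(data + [(x_test, label)])
        scores[label] = predict(counts, x_test, label, labels)
    normalizer = sum(scores.values())
    probs = {label: score / normalizer for label, score in scores.items()}
    return probs, math.log(normalizer)


if __name__ == "__main__":
    # Hypothetical training set: input "a" is common, "b" is rare, "c" is unseen.
    train = [("a", 0)] * 8 + [("a", 1)] * 2 + [("b", 1)]
    for x in ("a", "b", "c"):
        probs, complexity = pnml(train, x, labels=[0, 1])
        print(x, probs, round(complexity, 3))
```

In this toy setting the frequently seen input "a" gets a confident prediction and a small complexity, while the unseen input "c" gets a uniform distribution and a complexity of log 2, because either label can be fit perfectly after refitting. Refitting once per label is what becomes impractical for large models, and that is the cost the abstract says IF-COMP avoids by linearizing the model.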
Taxonomy
Topics: Data Stream Mining Techniques · Neural Networks and Applications · Machine Learning in Materials Science
