Measuring Stochastic Data Complexity with Boltzmann Influence Functions

Nathan Ng; Roger Grosse; Marzyeh Ghassemi

arXiv:2406.02745 · cs.LG · July 22, 2024

TL;DR

This paper introduces IF-COMP, a scalable approximation of the predictive normalized maximum likelihood (pNML) distribution that uses a temperature-scaled Boltzmann influence function to improve uncertainty estimation and complexity measurement for machine learning models.

Contribution

The paper proposes an efficient method for uncertainty quantification and complexity measurement, based on influence functions and pNML, that applies to both labeled and unlabeled data.
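As a rough illustration of what pNML and its complexity measure compute, the sketch below brute-forces the standard pNML definition for a hypothetical feature-free categorical model, where "training" just means counting label frequencies. For each candidate label it refits the model on the training data plus that label, records the probability the refit model gives to that label, and normalizes. The log of the normalizer is the complexity (the pNML regret). This is not IF-COMP itself, and the function names and toy labels are illustrative assumptions.

```python
import math
from collections import Counter

def mle(labels, num_classes):
    """Maximum-likelihood categorical distribution: empirical label frequencies."""
    counts = Counter(labels)
    n = len(labels)
    return [counts[k] / n for k in range(num_classes)]

def pnml(train_labels, num_classes):
    """Brute-force pNML for a feature-free categorical model (hypothetical toy).

    For each candidate label y, refit the MLE on train + [y] and keep the
    probability the refit model assigns to y. Normalizing these gives the pNML
    distribution; the log of the normalizer is the pNML complexity.
    """
    refit_probs = []
    for y in range(num_classes):
        theta = mle(train_labels + [y], num_classes)  # retrain with (y) appended
        refit_probs.append(theta[y])
    normalizer = sum(refit_probs)
    dist = [p / normalizer for p in refit_probs]
    complexity = math.log(normalizer)
    return dist, complexity

train = [0, 0, 0, 1]  # hypothetical toy labels, K = 3 classes
dist, comp = pnml(train, num_classes=3)
```

For this model the result works out to (n_y + 1)/(N + K), i.e. add-one smoothing, and the complexity is log((N + K)/(N + 1)), which shrinks toward zero as the training set grows. Labels that are rare in training still receive probability mass because refitting on them is always possible, which is the "other labels are also consistent" effect described in the abstract.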

Findings

01. IF-COMP produces well-calibrated predictions.

02. It outperforms baseline methods on uncertainty calibration.

03. It is effective at mislabel detection and out-of-distribution (OOD) detection.

Abstract

Estimating the uncertainty of a model's prediction on a test point is a crucial part of ensuring reliability and calibration under distribution shifts. A minimum description length approach to this problem uses the predictive normalized maximum likelihood (pNML) distribution, which considers every possible label for a data point, and decreases confidence in a prediction if other labels are also consistent with the model and training data. In this work we propose IF-COMP, a scalable and efficient approximation of the pNML distribution that linearizes the model with a temperature-scaled Boltzmann influence function. IF-COMP can be used to produce well-calibrated predictions on test points as well as measure complexity in both labelled and unlabelled settings. We experimentally validate IF-COMP on uncertainty calibration, mislabel detection, and OOD detection tasks, where it consistently…
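The central idea, replacing the one-retraining-per-label cost of pNML with a linearized update, can be illustrated on a model small enough to also refit exactly. The sketch below assumes a binary L2-regularized logistic regression and compares exact per-label refitting against the classical influence-function step (a single Hessian-inverse update from the trained parameters). It does not implement IF-COMP's temperature-scaled Boltzmann influence function; all function names, the toy data, and the regularization strength are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_hess(theta, X, y, lam):
    """Gradient and Hessian of the L2-regularized logistic negative log-likelihood."""
    p = sigmoid(X @ theta)
    g = X.T @ (p - y) + lam * theta
    H = (X * (p * (1 - p))[:, None]).T @ X + lam * np.eye(X.shape[1])
    return g, H

def fit(X, y, lam, theta0=None, iters=50):
    """Plain Newton's method on the strongly convex regularized objective."""
    theta = np.zeros(X.shape[1]) if theta0 is None else theta0.copy()
    for _ in range(iters):
        g, H = grad_hess(theta, X, y, lam)
        theta -= np.linalg.solve(H, g)
    return theta

def label_prob(theta, x, label):
    p1 = sigmoid(x @ theta)
    return p1 if label == 1 else 1.0 - p1

def normalize(q):
    q = np.asarray(q)
    return q / q.sum(), float(np.log(q.sum()))  # (pNML distribution, complexity)

def pnml_exact(X, y, x, lam=1.0):
    """Refit the model once per candidate label (one full fit per label)."""
    theta_hat = fit(X, y, lam)
    q = [label_prob(fit(np.vstack([X, x]), np.append(y, lab), lam, theta0=theta_hat), x, lab)
         for lab in (0, 1)]
    return normalize(q)

def pnml_influence(X, y, x, lam=1.0):
    """Replace each refit with a classical influence-function step from theta_hat."""
    theta_hat = fit(X, y, lam)
    _, H = grad_hess(theta_hat, X, y, lam)
    p_hat = sigmoid(x @ theta_hat)
    q = []
    for lab in (0, 1):
        g_new = (p_hat - lab) * x                      # gradient of the new point's loss
        theta = theta_hat - np.linalg.solve(H, g_new)  # upweight (x, lab) by one unit
        q.append(label_prob(theta, x, lab))
    return normalize(q)

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([rng.normal(size=(n, 2)), np.ones(n)])  # two features + bias
y = (rng.random(n) < sigmoid(X @ np.array([2.0, -1.0, 0.5]))).astype(float)
x_test = np.array([0.5, 0.5, 1.0])

dist_exact, comp_exact = pnml_exact(X, y, x_test)
dist_if, comp_if = pnml_influence(X, y, x_test)
```

For a low-leverage test point like this one, the two estimates should agree closely, while the influence version needs only one linear solve per label instead of a full refit. Forming and inverting H is itself infeasible for large networks, so practical methods need further approximations; this sketch does not reproduce the ones used by IF-COMP.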

Taxonomy

Topics: Data Stream Mining Techniques · Neural Networks and Applications · Machine Learning in Materials Science