A calibration test for evaluating set-based epistemic uncertainty representations

Mira Jürgens; Thomas Mortier; Eyke Hüllermeier; Viktor Bengs; Willem Waegeman

arXiv:2502.16299 · cs.LG · July 30, 2025

TL;DR

This paper introduces a statistical calibration test for set-based epistemic uncertainty representations. The test asks whether some convex combination of the set's predictions is calibrated, lets that combination vary per instance, and uses proper scoring rules to assess calibration.

Contribution

The paper proposes a new nonparametric calibration test that assesses the validity of credal sets while allowing instance-dependent convex combinations of their predictions.

Findings

1. The test effectively detects calibration issues in synthetic and real-world datasets.

2. Instance-dependent convex combinations improve calibration accuracy.

3. The method outperforms previous calibration assessment techniques.

Abstract

The accurate representation of epistemic uncertainty is a challenging yet essential task in machine learning. A widely used representation corresponds to convex sets of probabilistic predictors, also known as credal sets. One popular way of constructing these credal sets is via ensembling or specialized supervised learning methods, where the epistemic uncertainty can be quantified through measures such as the set size or the disagreement among members. In principle, these sets should contain the true data-generating distribution. As a necessary condition for this validity, we adopt the strongest notion of calibration as a proxy. Concretely, we propose a novel statistical test to determine whether there is a convex combination of the set's predictions that is calibrated in distribution. In contrast to previous methods, our framework allows the convex combination to be instance dependent,…
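To make the central object of the abstract concrete, the following is a minimal sketch of an instance-dependent convex combination of ensemble predictions, scored with the Brier score as one example of a proper scoring rule. It is an illustration under stated assumptions, not the authors' test and not the API of the linked repository: the function names `convex_combination` and `brier_score`, the array shapes, and the toy numbers are all hypothetical.

```python
import numpy as np


def convex_combination(preds, weights):
    """Mix ensemble member predictions with per-instance weights.

    preds:   (n_instances, n_members, n_classes) member probabilities
    weights: (n_instances, n_members), nonnegative, each row sums to 1
    Hypothetical helper for illustration only.
    """
    preds = np.asarray(preds, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Each weight row must lie on the probability simplex.
    if np.any(weights < 0) or not np.allclose(weights.sum(axis=1), 1.0):
        raise ValueError("each weight row must be nonnegative and sum to 1")
    # Weighted sum over the member axis, separately for every instance.
    return np.einsum("nm,nmk->nk", weights, preds)


def brier_score(probs, labels):
    """Mean multiclass Brier score (a proper scoring rule)."""
    onehot = np.eye(probs.shape[1])[labels]
    return float(np.mean(np.sum((probs - onehot) ** 2, axis=1)))


# Toy data (assumed): 2 instances, 2 ensemble members, 2 classes.
preds = np.array([
    [[0.9, 0.1], [0.5, 0.5]],
    [[0.2, 0.8], [0.6, 0.4]],
])
# Instance-dependent weights: a different mixture for each instance.
weights = np.array([
    [0.5, 0.5],
    [1.0, 0.0],
])
labels = np.array([0, 1])

mixed = convex_combination(preds, weights)
score = brier_score(mixed, labels)
```

A test in the spirit of the abstract would ask whether any admissible choice of `weights` yields calibrated predictions. The sketch only shows how one such choice is formed and scored; it does not search over weights or compute a test statistic.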

Code & Models

Repositories

mkjuergens/ensemblecalibration (PyTorch, official)

Taxonomy

Topics: Probabilistic and Robust Engineering Design

Methods: Sparse Evolutionary Training · ADaptive gradient method with the OPTimal convergence rate