When to Trust Confidence Thresholding: Calibration Diagnostics for Pseudo-Labelled Regression

Marcell T. Kurbucz

arXiv:2605.12780·stat.ME·May 14, 2026

TL;DR

This paper develops a calibration-aware diagnostic for pseudo-labelling in regression. It predicts the attenuation bias induced by confidence thresholding from the residual score variance, indicating when thresholding is safe.

Contribution

The paper introduces an operational decision rule based on residual score variance and calibration drift, which lets practitioners assess whether confidence thresholding is safe.

Findings

01 The attenuation bias from confidence thresholding can be predicted from the residual score variance before any inference is run.

02 A sharp sensitivity bound is derived under bounded calibration drift.

03 The decision rule is validated in simulations and on a real dataset.

Abstract

Calibrated probability outputs of trained classifiers are increasingly used as inputs to downstream regression estimands such as effects, prevalences, or disparities for a latent group observed only on a small labelled subset. A standard practice is to threshold the calibrated score at a confidence cutoff and treat the hard label as the truth. Building on a recent identification result for the underlying moment equation, we develop a calibration-aware diagnostic apparatus for pseudo-labelling pipelines. We derive a closed-form expression for the attenuation bias that confidence thresholding induces in the downstream regression coefficient, and show that the bias can be predicted, before any inference is run, from the residual score variance $V^{*} = \mathbb{E}[\operatorname{Var}(p \mid X)]$ on the unlabelled set after partialling out the downstream controls $X$. We further obtain a sharp…
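As a rough illustration of the quantity $V^{*}$ named in the abstract, the sketch below estimates a residual score variance by regressing the calibrated scores on the controls and averaging the squared residuals. It assumes that "partialling out" means a linear projection of $p$ on $X$ with an intercept, which the abstract does not confirm. The function name `residual_score_variance` and the toy data are hypothetical, and the paper's own estimator, bias formula, and decision rule are not reproduced here.

```python
import numpy as np


def residual_score_variance(p, X):
    """Plug-in estimate of V* = E[Var(p | X)].

    Assumption (not from the paper): E[p | X] is approximated by an
    ordinary least-squares projection of p on X plus an intercept.
    """
    p = np.asarray(p, dtype=float)
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat a single control as one column

    # Design matrix with an intercept column.
    design = np.column_stack([np.ones(len(p)), X])

    # Least-squares fit of the scores on the controls.
    coef, *_ = np.linalg.lstsq(design, p, rcond=None)

    # Residual scores after partialling out X.
    resid = p - design @ coef

    # Mean squared residual (no degrees-of-freedom correction).
    return float(np.mean(resid ** 2))


# Toy unlabelled set: scores that vary within each level of one control.
x_toy = np.array([0.0, 0.0, 1.0, 1.0])
p_toy = np.array([0.0, 1.0, 0.0, 1.0])
v_star = residual_score_variance(p_toy, x_toy)
```

In the toy data the control explains none of the variation in the scores, so all of the score variance remains in the residual. A score that is an exact linear function of the controls would instead give a value near zero.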
