When to Trust Confidence Thresholding: Calibration Diagnostics for Pseudo-Labelled Regression
Marcell T. Kurbucz

TL;DR
This paper develops a calibration-aware diagnostic tool for pseudo-labelling in regression. It predicts the attenuation bias of confidence thresholding from residual score variance, which tells practitioners when thresholding is safe.
Contribution
It introduces an operational decision rule, based on residual score variance and calibration drift, that lets practitioners assess whether confidence thresholding is safe.
Findings
The attenuation bias can be predicted from the residual score variance before any inference is run.
A sharp sensitivity bound is derived under bounded calibration drift.
The decision rule is validated in simulations and on a real dataset.
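To make the thresholding problem concrete, here is a toy simulation. It is not the paper's simulation design, and it does not reproduce the paper's closed-form bias expression. Every name and parameter in it (`simulate_threshold_attenuation`, `cutoff`, the logistic data-generating process) is a hypothetical choice for illustration.

In the toy model the calibrated score is the true membership probability. When the regression uses the latent group itself, the coefficient is unbiased. When it uses the hard pseudo-label `p >= cutoff`, the coefficient is pulled towards zero.

```python
import numpy as np


def simulate_threshold_attenuation(n=200_000, beta=1.0, cutoff=0.5, seed=0):
    """Toy illustration (hypothetical DGP, not the paper's design).

    Returns the downstream coefficient on group membership when the
    regression uses (a) the true latent group and (b) a thresholded
    pseudo-label built from a perfectly calibrated score.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                      # downstream control
    z = rng.normal(size=n)                      # feature seen only by the classifier
    # Perfectly calibrated score: p = P(G = 1 | x, z) under a logistic model
    p = 1.0 / (1.0 + np.exp(-(0.5 * x + 1.5 * z)))
    g = rng.binomial(1, p)                      # latent group membership
    y = beta * g + 0.8 * x + rng.normal(size=n) # downstream outcome
    d = (p >= cutoff).astype(float)             # hard pseudo-label from thresholding

    def group_coef(group):
        # OLS of y on an intercept, the group indicator and the control x
        design = np.column_stack([np.ones(n), group, x])
        return np.linalg.lstsq(design, y, rcond=None)[0][1]

    return group_coef(g), group_coef(d)


coef_true, coef_thresholded = simulate_threshold_attenuation()
print(f"true-group coefficient:   {coef_true:.3f}")
print(f"pseudo-label coefficient: {coef_thresholded:.3f}")
```

The second coefficient comes out noticeably smaller than the first, even though the score is perfectly calibrated. In this toy model, the hard label discards the within-class variation in `p`.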
Abstract
Calibrated probability outputs of trained classifiers are increasingly used as inputs to downstream regression estimands such as effects, prevalences, or disparities for a latent group observed only on a small labelled subset. A standard practice is to threshold the calibrated score at a confidence cutoff and treat the hard label as the truth. Building on a recent identification result for the underlying moment equation, we develop a calibration-aware diagnostic apparatus for pseudo-labelling pipelines. We derive a closed-form expression for the attenuation bias that confidence thresholding induces in the downstream regression coefficient, and show that the bias can be predicted, before any inference is run, from the residual score variance on the unlabelled set after partialling out the downstream controls. We further obtain a sharp…
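The diagnostic quantity named in the abstract is the residual score variance on the unlabelled set after partialling out the downstream controls. A minimal sketch of one plausible reading follows. It linearly regresses the calibrated scores on an intercept plus the controls and reports the variance of the residuals, in the Frisch-Waugh-Lovell sense.

The helper name `residual_score_variance` is hypothetical. The linear partialling-out and the population variance (`ddof=0`) are assumptions. The paper's mapping from this quantity to the bias is not reproduced here.

```python
import numpy as np


def residual_score_variance(scores, controls):
    """Variance of calibrated scores after linearly partialling out controls.

    Hypothetical helper; assumes linear partialling-out with an intercept,
    evaluated on the unlabelled set.
    """
    scores = np.asarray(scores, dtype=float)
    controls = np.asarray(controls, dtype=float)
    if controls.ndim == 1:
        controls = controls[:, None]            # treat a 1-D input as one control
    # Intercept plus controls, mirroring the downstream regression design
    design = np.column_stack([np.ones(len(scores)), controls])
    coef, *_ = np.linalg.lstsq(design, scores, rcond=None)
    residuals = scores - design @ coef          # score variation not explained by controls
    return float(residuals.var())               # population variance (ddof=0)


# Usage on a hypothetical unlabelled set: scores from a calibrated classifier,
# controls as they enter the downstream regression.
rng = np.random.default_rng(1)
controls_u = rng.normal(size=(1_000, 2))
scores_u = 1.0 / (1.0 + np.exp(-(controls_u[:, 0] + rng.normal(size=1_000))))
print(residual_score_variance(scores_u, controls_u))
```

A variance near zero means the controls explain almost all of the score variation. In that case the pseudo-label carries little information beyond the controls.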
