A Confidence-Variance Theory for Pseudo-Label Selection in Semi-Supervised Learning
Jinshi Liu, Pan Liu, Lei He

TL;DR
This paper introduces a Confidence-Variance (CoVar) framework for pseudo-label selection in semi-supervised learning, combining confidence and residual-class variance to improve reliability over traditional fixed-threshold methods.
Contribution
The paper derives a reliability measure that combines maximum confidence with residual-class variance and integrates it into semi-supervised methods, where it outperforms fixed-threshold strategies across multiple datasets.
Findings
Consistent improvement over strong baselines on multiple datasets.
A threshold-free pseudo-label selection mechanism based on confidence-variance.
Enhanced reliability in pseudo-label selection leading to better semi-supervised learning performance.
Abstract
Most pseudo-label selection strategies in semi-supervised learning rely on fixed confidence thresholds, implicitly assuming that prediction confidence reliably indicates correctness. In practice, deep networks are often overconfident: high-confidence predictions can still be wrong, while informative low-confidence samples near decision boundaries are discarded. This paper introduces a Confidence-Variance (CoVar) theory framework that provides a principled joint reliability criterion for pseudo-label selection. Starting from the entropy minimization principle, we derive a reliability measure that combines maximum confidence (MC) with residual-class variance (RCV), which characterizes how probability mass is distributed over non-maximum classes. The derivation shows that reliable pseudo-labels should have both high MC and low RCV, and that the influence of RCV increases as confidence…
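To make the two quantities named in the abstract concrete, here is a minimal sketch in Python. The abstract text available here is truncated and does not give the exact formulas, so the definition of RCV used below (the population variance of the non-maximum class probabilities) is an assumption, as are the function names and the example probability vectors. The paper's combined reliability measure is not reproduced.

```python
from statistics import pvariance


def max_confidence(probs):
    """Maximum confidence (MC): the largest predicted class probability."""
    return max(probs)


def residual_class_variance(probs):
    """Residual-class variance (RCV) under an ASSUMED definition:
    the population variance of the probabilities assigned to every
    class except the top-scoring one. The paper may normalize or
    weight these terms differently."""
    top = probs.index(max(probs))
    residual = [p for i, p in enumerate(probs) if i != top]
    return pvariance(residual)


# Two hypothetical 4-class softmax outputs with identical MC.
# A fixed confidence threshold cannot tell them apart.
even_residual = [0.70, 0.10, 0.10, 0.10]    # leftover mass spread evenly
strong_rival = [0.70, 0.28, 0.01, 0.01]     # one competing class dominates

for name, probs in [("even_residual", even_residual),
                    ("strong_rival", strong_rival)]:
    mc = max_confidence(probs)
    rcv = residual_class_variance(probs)
    print(f"{name}: MC={mc:.2f}, RCV={rcv:.4f}")
```

Under this assumed definition, both predictions have MC = 0.70, but the second has a larger RCV because a single rival class holds most of the remaining probability mass. By the abstract's criterion (high MC and low RCV), the first prediction would be the more reliable pseudo-label.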
Taxonomy
Topics: Machine Learning and Data Classification · Text and Document Classification Technologies · Advanced Neural Network Applications
