Leveraging Ensemble Diversity for Robust Self-Training in the Presence of Sample Selection Bias
Ambroise Odonnat, Vasilii Feofanov, Ievgen Redko

TL;DR
This paper introduces a new confidence measure based on ensemble diversity, called T-similarity, to improve self-training robustness under sample selection bias in semi-supervised learning.
Contribution
It proposes T-similarity, a confidence measure that leverages ensemble diversity, and supports it with a theoretical analysis and empirical validation across multiple datasets and pseudo-labeling policies.
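Below is a minimal sketch of how a diversity-based confidence score could be computed from an ensemble of M heads. The formula is an assumption: this page does not define T-similarity, so the sketch takes it to be the average inner product between the softmax outputs of every pair of distinct heads. Under that definition the score is 1 when all heads make the same one-hot prediction and drops as they disagree. `softmax` and `t_similarity` are hypothetical helper names, not the authors' API.

```python
import math
from itertools import permutations


def softmax(logits):
    """Numerically stable softmax of a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def t_similarity(head_probs):
    """Average pairwise dot product between softmax outputs of distinct heads.

    head_probs: list of M probability vectors (one per ensemble head),
    each of length C (number of classes).
    ASSUMED definition: s(x) = 1 / (M (M - 1)) * sum_{i != j} <p_i(x), p_j(x)>.
    """
    m = len(head_probs)
    if m < 2:
        raise ValueError("T-similarity needs at least two heads")
    total = 0.0
    # permutations(..., 2) yields every ordered pair (i, j) with i != j.
    for p_i, p_j in permutations(head_probs, 2):
        total += sum(a * b for a, b in zip(p_i, p_j))
    return total / (m * (m - 1))
```

For two heads that output [0.99, 0.01] and [0.01, 0.99], each head alone has a max-softmax confidence of 0.99, yet their T-similarity under this assumed definition is only 0.0198. That is the kind of disagreement a single softmax score cannot reveal.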
Findings
T-similarity improves pseudo-labeling accuracy in biased data scenarios.
Theoretical analysis links diversity to classifier performance.
Empirical results show enhanced robustness in semi-supervised learning.
Abstract
Self-training is a well-known approach for semi-supervised learning. It consists of iteratively assigning pseudo-labels to unlabeled data for which the model is confident and treating them as labeled examples. For neural networks, softmax prediction probabilities are often used as a confidence measure, although they are known to be overconfident, even for wrong predictions. This phenomenon is particularly intensified in the presence of sample selection bias, i.e., when data labeling is subject to some constraint. To address this issue, we propose a novel confidence measure, called T-similarity, built upon the prediction diversity of an ensemble of linear classifiers. We provide a theoretical analysis of our approach by studying stationary points and describing the relationship between the diversity of the individual members and their performance. We empirically demonstrate…
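The self-training loop described in the abstract can be sketched as follows. This is a toy illustration, not the authors' code: `fit_centroids` is a hypothetical 1-D nearest-centroid classifier standing in for a neural network, and `threshold`, `max_iter`, and the default max-softmax `confidence` are assumed values. Points whose confidence clears the threshold are pseudo-labeled with the predicted class and moved into the labeled set; the loop stops when no new point qualifies.

```python
import math


def softmax(scores):
    """Numerically stable softmax of a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def fit_centroids(xs, ys):
    """HYPOTHETICAL toy model: 1-D nearest-centroid classifier with softmax output."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    classes = sorted(sums)
    centroids = [sums[c] / counts[c] for c in classes]

    def predict_proba(x):
        # Softmax over negative squared distances to each class centroid.
        return classes, softmax([-(x - mu) ** 2 for mu in centroids])

    return predict_proba


def self_train(xs_l, ys_l, xs_u, threshold=0.9, max_iter=10, confidence=max):
    """Iteratively pseudo-label unlabeled points whose confidence >= threshold."""
    xs_l, ys_l, pool = list(xs_l), list(ys_l), list(xs_u)
    for _ in range(max_iter):
        model = fit_centroids(xs_l, ys_l)  # retrain on labeled + pseudo-labeled data
        remaining, added = [], 0
        for x in pool:
            classes, probs = model(x)
            if confidence(probs) >= threshold:
                # Pseudo-label with the predicted (argmax) class.
                xs_l.append(x)
                ys_l.append(classes[probs.index(max(probs))])
                added += 1
            else:
                remaining.append(x)
        pool = remaining
        if added == 0:  # nothing confident enough: stop early
            break
    return xs_l, ys_l, pool
```

With labeled points 0.0 (class 0) and 10.0 (class 1), the unlabeled points 1.0 and 9.0 are pseudo-labeled in the first round, while 5.0 sits equidistant from both centroids, gets probability 0.5 for each class, and stays unlabeled. The paper's proposal amounts to replacing the max-softmax `confidence` with a diversity-based score computed over an ensemble of heads, which this single-model toy does not have.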
Taxonomy
Topics: Machine Learning and Data Classification · Anomaly Detection Techniques and Applications · Adversarial Robustness in Machine Learning
Methods: Softmax
