Leveraging Ensemble Diversity for Robust Self-Training in the Presence of Sample Selection Bias

Ambroise Odonnat, Vasilii Feofanov, Ievgen Redko

arXiv:2310.14814 · cs.LG · April 4, 2024 · 1 citation

TL;DR

This paper introduces a new confidence measure based on ensemble diversity, called T-similarity, to improve self-training robustness under sample selection bias in semi-supervised learning.

Contribution

It proposes the T-similarity confidence measure leveraging ensemble diversity, with theoretical analysis and empirical validation across multiple datasets and pseudo-labeling policies.
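To make the confidence measure concrete, here is a minimal Python sketch. It assumes that the T-similarity of an input x is the average pairwise dot product between the softmax outputs of the M ensemble members, s_T(x) = 1/(M(M-1)) Σ_{i≠j} p_i(x)ᵀ p_j(x). The function name `t_similarity` is hypothetical, and the official ambroiseodt/tsim repository remains the reference implementation.

```python
from itertools import permutations
from typing import Sequence


def t_similarity(member_probs: Sequence[Sequence[float]]) -> float:
    """Average pairwise dot product of ensemble members' probability vectors.

    member_probs[m] is the softmax output of ensemble member m for ONE input x.
    Assumed definition: s_T(x) = 1/(M(M-1)) * sum_{i != j} <p_i(x), p_j(x)>.
    """
    m = len(member_probs)
    if m < 2:
        raise ValueError("T-similarity needs at least two ensemble members")
    total = 0.0
    # Ordered pairs (i, j) with i != j, matching the 1/(M(M-1)) normalization.
    for i, j in permutations(range(m), 2):
        total += sum(a * b for a, b in zip(member_probs[i], member_probs[j]))
    return total / (m * (m - 1))


# Three members that agree on a confident prediction vs. three that disagree.
agree = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]]
disagree = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]
print(round(t_similarity(agree), 2))     # 0.82
print(round(t_similarity(disagree), 2))  # 0.39
```

Since the dot product of two probability vectors lies in [0, 1], so does the score: members that agree on a confident prediction push it up, while diverse (disagreeing) members pull it down.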

Findings

01

T-similarity improves pseudo-labeling accuracy in biased data scenarios.

02

Theoretical analysis links diversity to classifier performance.

03

Empirical results show improved robustness of self-training under sample selection bias.

Abstract

Self-training is a well-known approach for semi-supervised learning. It consists of iteratively assigning pseudo-labels to unlabeled data for which the model is confident and treating them as labeled examples. For neural networks, softmax prediction probabilities are often used as a confidence measure, although they are known to be overconfident, even for wrong predictions. This phenomenon is particularly intensified in the presence of sample selection bias, i.e., when data labeling is subject to some constraint. To address this issue, we propose a novel confidence measure, called $T$-similarity, built upon the prediction diversity of an ensemble of linear classifiers. We provide the theoretical analysis of our approach by studying stationary points and describing the relationship between the diversity of the individual members and their performance. We empirically demonstrate…
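The pseudo-labeling step described above can be sketched as one self-training round that uses the baseline softmax confidence with a fixed threshold. This is a minimal illustration under stated assumptions: the function names and the fixed-threshold policy are hypothetical choices, and retraining the model on the newly pseudo-labeled points is left out.

```python
from typing import List, Sequence, Tuple


def max_softmax_confidence(probs: Sequence[float]) -> float:
    """Baseline confidence: the largest predicted class probability."""
    return max(probs)


def pseudo_label_round(
    unlabeled_probs: Sequence[Sequence[float]], threshold: float
) -> Tuple[List[Tuple[int, int]], List[int]]:
    """One self-training round over the softmax outputs of unlabeled points.

    Returns (selected, remaining): `selected` holds (index, pseudo_label) pairs
    for points whose confidence reaches the threshold; `remaining` holds the
    indices that stay unlabeled for the next round.
    """
    selected: List[Tuple[int, int]] = []
    remaining: List[int] = []
    for idx, probs in enumerate(unlabeled_probs):
        if max_softmax_confidence(probs) >= threshold:
            # Pseudo-label = predicted class (argmax of the probabilities).
            label = max(range(len(probs)), key=lambda k: probs[k])
            selected.append((idx, label))
        else:
            remaining.append(idx)
    return selected, remaining


probs = [[0.95, 0.05], [0.6, 0.4], [0.2, 0.8]]
selected, remaining = pseudo_label_round(probs, threshold=0.7)
print(selected, remaining)  # [(0, 0), (2, 1)] [1]
```

Because an overconfident network can assign a high maximum probability to a wrong prediction, such points pass this check just as easily as correct ones; the paper's proposal is to replace this softmax score with the ensemble-based $T$-similarity.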

Code & Models

Repositories

ambroiseodt/tsim
PyTorch · Official

Taxonomy

Topics: Machine Learning and Data Classification · Anomaly Detection Techniques and Applications · Adversarial Robustness in Machine Learning

Methods: Softmax