On Pseudo-Labeling for Class-Mismatch Semi-Supervised Learning
Lu Han, Han-Jia Ye, De-Chuan Zhan

TL;DR
This paper analyzes how out-of-distribution (OOD) data affect pseudo-labeling in class-mismatched semi-supervised learning. It proposes methods that improve performance by re-balancing pseudo-labels and exploring semantic clusters.
Contribution
It identifies the pseudo-label imbalance caused by OOD data and introduces two components, RPL and SEC, to mitigate this problem and improve SSL performance.
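The imbalance issue can be made concrete by counting how often each class wins the argmax over the unlabeled pool. The sketch below assumes `probs` is an (N, C) array of softmax outputs. The helper names and the max/min ratio used as an imbalance measure are illustrative choices, not the paper's metric.

```python
import numpy as np

def pseudo_label_histogram(probs: np.ndarray) -> np.ndarray:
    """Count how many unlabeled samples receive each class as their argmax pseudo-label."""
    labels = probs.argmax(axis=1)
    return np.bincount(labels, minlength=probs.shape[1])

def imbalance_ratio(counts: np.ndarray) -> float:
    """Ratio of the most to the least frequent pseudo-label (inf if some class gets none)."""
    smallest = counts.min()
    return float("inf") if smallest == 0 else float(counts.max() / smallest)

# Toy pool: three samples (e.g. OOD images) all collapse onto class 0.
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
])
counts = pseudo_label_histogram(probs)
print(counts, imbalance_ratio(counts))
```

In this toy pool, class 0 absorbs three of the five pseudo-labels. This is the kind of skew that OOD samples can introduce when every one of them is forced into some ID class.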
Findings
RPL filters out OOD data and balances pseudo-labels.
SEC creates pseudo-labels for extra classes via clustering.
The proposed methods outperform baselines across benchmarks.
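The two findings about RPL and SEC can be illustrated with a rough sketch. It makes several assumptions that do not come from this summary. Re-balancing is approximated by keeping at most `k` of the most confident samples per predicted class, and unselected samples are treated as likely OOD. Clustering is plain k-means on feature vectors, with cluster `j` mapped to extra class `num_classes + j`. The function names `rebalanced_pseudo_labels`, `kmeans`, and `extra_class_labels` are hypothetical, and the authors' actual criteria may differ.

```python
import numpy as np

def rebalanced_pseudo_labels(probs: np.ndarray, k: int):
    """Hypothetical RPL-style rule: keep at most k most confident samples per predicted class.

    Returns (indices, labels) of the selected samples; the rest are treated as likely OOD.
    """
    preds = probs.argmax(axis=1)
    conf = probs.max(axis=1)
    selected = []
    for c in range(probs.shape[1]):
        members = np.flatnonzero(preds == c)
        # Sort this class's members by descending confidence and keep the top k.
        selected.append(members[np.argsort(-conf[members])[:k]])
    idx = np.sort(np.concatenate(selected))
    return idx, preds[idx]

def kmeans(x: np.ndarray, n_clusters: int, n_iter: int = 50, seed: int = 0) -> np.ndarray:
    """Plain Lloyd's k-means; returns a cluster id per row of x."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), n_clusters, replace=False)]  # copy, not a view
    for _ in range(n_iter):
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        assign = dists.argmin(axis=1)
        for j in range(n_clusters):
            if np.any(assign == j):  # leave empty clusters' centers unchanged
                centers[j] = x[assign == j].mean(axis=0)
    return assign

def extra_class_labels(features, rest_idx, num_classes: int, n_clusters: int, seed: int = 0):
    """Hypothetical SEC-style step: cluster leftover samples and label them as extra classes."""
    assign = kmeans(features[rest_idx], n_clusters, seed=seed)
    return num_classes + assign

probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.6, 0.4],
                  [0.3, 0.7], [0.05, 0.95], [0.45, 0.55]])
idx, labels = rebalanced_pseudo_labels(probs, k=2)
rest = np.setdiff1d(np.arange(len(probs)), idx)
feats = np.array([[0, 0], [0, 0], [0, 0], [5, 5], [0, 0], [9, 9]], dtype=float)
print(idx, labels, extra_class_labels(feats, rest, num_classes=2, n_clusters=2))
```

The per-class cap makes every ID class contribute at most the same number of pseudo-labels, so a class that attracts OOD samples cannot dominate. Samples left over from that cap are the ones handed to clustering.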
Abstract
When unlabeled Out-Of-Distribution (OOD) data come from other classes, Semi-Supervised Learning (SSL) methods suffer severe performance degradation and can even perform worse than training on labeled data alone. In this paper, we empirically analyze Pseudo-Labeling (PL) in class-mismatched SSL. PL is a simple and representative SSL method that turns SSL into supervised learning by creating pseudo-labels for unlabeled data from the model's predictions. We aim to answer two main questions: (1) How do OOD data influence PL? (2) What is the proper way to use OOD data with PL? First, we show that the major problem of PL is imbalanced pseudo-labels on OOD data. Second, we find that OOD data can help classify In-Distribution (ID) data when their OOD ground-truth labels are given. Based on these findings, we propose to improve PL in class-mismatched SSL with two components…
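The abstract describes PL as turning SSL into supervised learning by pseudo-labeling unlabeled data from the model's predictions. A minimal sketch of that step follows. It assumes the common variant that keeps only predictions above a confidence `threshold` (an assumed hyperparameter, not stated in the abstract). The selected pseudo-labeled samples are then merged with the labeled set for ordinary supervised training. `build_training_set` is a hypothetical helper name.

```python
import numpy as np

def build_training_set(x_lab, y_lab, x_unl, probs_unl, threshold=0.95):
    """Merge labeled data with confidently pseudo-labeled unlabeled data.

    probs_unl holds the current model's class probabilities for x_unl;
    samples whose top probability is below `threshold` are left out.
    """
    confidence = probs_unl.max(axis=1)
    mask = confidence >= threshold
    pseudo = probs_unl.argmax(axis=1)[mask]
    x = np.concatenate([x_lab, x_unl[mask]])
    y = np.concatenate([y_lab, pseudo])
    return x, y

x_lab, y_lab = np.zeros((2, 2)), np.array([0, 1])
x_unl = np.ones((3, 2))
probs_unl = np.array([[0.97, 0.03], [0.5, 0.5], [0.1, 0.9]])
x, y = build_training_set(x_lab, y_lab, x_unl, probs_unl, threshold=0.85)
print(x.shape, y)
```

In practice this step alternates with retraining: the model is fit on `(x, y)`, re-predicts `probs_unl`, and the pseudo-labels are rebuilt. OOD samples in `x_unl` still receive an ID-class pseudo-label here whenever they clear the threshold.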
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Text and Document Classification Technologies · Machine Learning and Data Classification
