On Pseudo-Labeling for Class-Mismatch Semi-Supervised Learning

Lu Han; Han-Jia Ye; De-Chuan Zhan

arXiv:2301.06010 · cs.LG · January 18, 2023 · 6 cites

Open Access

TL;DR

This paper analyzes how out-of-distribution (OOD) data affect pseudo-labeling in class-mismatched semi-supervised learning (SSL), and proposes improving performance by re-balancing pseudo-labels and by clustering to create pseudo-labels for extra classes.

Contribution

The paper identifies the pseudo-label imbalance that OOD data cause in pseudo-labeling and introduces two components, Re-balanced Pseudo-Labeling (RPL) and Semantic Exploration Clustering (SEC), to mitigate it and improve SSL performance.

Findings

01. RPL filters out OOD data and balances pseudo-labels.

02. SEC creates pseudo-labels for extra classes via clustering.

03. The proposed methods outperform baselines across benchmarks.
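
To make the first finding concrete, the following is a minimal sketch of one simple way to balance pseudo-labels: keep the same number of most-confident samples for every predicted class. It is an illustration under assumptions, not the paper's exact RPL procedure. The function name `balanced_pseudo_labels`, the `per_class` parameter, and the probability values are all hypothetical.

```python
def balanced_pseudo_labels(probs, per_class):
    """Keep at most `per_class` of the most confident samples per predicted class.

    `probs` is a list of per-sample class-probability lists (hypothetical input).
    Returns a sorted list of (sample_index, predicted_class) pairs.
    """
    by_class = {}
    for i, p in enumerate(probs):
        k = max(range(len(p)), key=p.__getitem__)  # argmax over classes
        by_class.setdefault(k, []).append((p[k], i))

    selected = []
    for k in sorted(by_class):
        # Highest-confidence samples first, truncated to the per-class budget.
        top = sorted(by_class[k], reverse=True)[:per_class]
        selected.extend((i, k) for _, i in top)
    return sorted(selected)


# Hypothetical predictions: three samples lean to class 0, one each to classes 1 and 2.
probs = [
    [0.97, 0.02, 0.01],
    [0.90, 0.06, 0.04],
    [0.85, 0.10, 0.05],
    [0.05, 0.92, 0.03],
    [0.10, 0.15, 0.75],
]
balanced = balanced_pseudo_labels(probs, per_class=1)
print(balanced)
```

With a budget of one sample per class, the over-represented class 0 contributes only its most confident sample, so each class ends up with the same number of pseudo-labels.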

Abstract

When there are unlabeled Out-Of-Distribution (OOD) data from other classes, Semi-Supervised Learning (SSL) methods suffer from severe performance degradation and even get worse than merely training on labeled data. In this paper, we empirically analyze Pseudo-Labeling (PL) in class-mismatched SSL. PL is a simple and representative SSL method that transforms SSL problems into supervised learning by creating pseudo-labels for unlabeled data according to the model's prediction. We aim to answer two main questions: (1) How do OOD data influence PL? (2) What is the proper usage of OOD data with PL? First, we show that the major problem of PL is imbalanced pseudo-labels on OOD data. Second, we find that OOD data can help classify In-Distribution (ID) data given their OOD ground truth labels. Based on the findings, we propose to improve PL in class-mismatched SSL with two components --…
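
The pseudo-labeling step described in the abstract can be sketched as follows. This is a minimal illustration, assuming the common confidence-thresholded variant of PL. The function name `pseudo_label`, the threshold of 0.95, and the probability values are assumptions, not taken from the paper. Counting the resulting labels per class is one way to see the kind of imbalance the abstract refers to.

```python
from collections import Counter


def pseudo_label(probs, threshold=0.95):
    """Assign each unlabeled sample its predicted class if the model is confident.

    `probs` is a list of per-sample class-probability lists (hypothetical input).
    Returns (sample_index, predicted_class) pairs for samples whose maximum
    probability is at least `threshold` (an assumed value).
    """
    selected = []
    for i, p in enumerate(probs):
        k = max(range(len(p)), key=p.__getitem__)  # argmax over classes
        if p[k] >= threshold:
            selected.append((i, k))
    return selected


# Hypothetical softmax outputs for five unlabeled samples over three ID classes.
probs = [
    [0.97, 0.02, 0.01],
    [0.96, 0.03, 0.01],
    [0.40, 0.35, 0.25],
    [0.98, 0.01, 0.01],
    [0.02, 0.96, 0.02],
]
labels = pseudo_label(probs, threshold=0.95)
counts = Counter(k for _, k in labels)  # pseudo-label count per class
print(labels)
print(counts)
```

In this toy input, three of the four confident samples receive class 0 and none receive class 2, so a model trained on these pseudo-labels would see a skewed class distribution.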

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Text and Document Classification Technologies · Machine Learning and Data Classification