Learning Label Refinement and Threshold Adjustment for Imbalanced Semi-Supervised Learning

Zeju Li; Ying-Qiu Zheng; Chen Chen; Saad Jbabdi

arXiv:2407.05370·cs.LG·September 18, 2024

TL;DR

This paper introduces SEVAL, a novel pseudo-label refinement and threshold adjustment method for imbalanced semi-supervised learning, improving pseudo-label quality and outperforming existing SSL techniques.

Contribution

The paper proposes SEVAL, a class-balanced pseudo-label optimization approach that enhances pseudo-label accuracy and robustness in imbalanced SSL scenarios.

Findings

01 SEVAL outperforms state-of-the-art SSL methods in imbalanced settings.

02 SEVAL improves pseudo-label accuracy and class-wise correctness.

03 The method is simple, flexible, and applicable to various SSL techniques.

Abstract

Semi-supervised learning (SSL) algorithms struggle to perform well when exposed to imbalanced training data. In this scenario, the generated pseudo-labels can exhibit a bias towards the majority class, and models that employ these pseudo-labels can further amplify this bias. Here we investigate pseudo-labeling strategies for imbalanced SSL, including pseudo-label refinement and threshold adjustment, through the lens of statistical analysis. We find that existing SSL algorithms which generate pseudo-labels using heuristic strategies or uncalibrated model confidence are unreliable when imbalanced class distributions bias pseudo-labels. To address this, we introduce SEmi-supervised learning with pseudo-label optimization based on VALidation data (SEVAL) to enhance the quality of pseudo-labeling for imbalanced SSL. We propose to learn refinement and thresholding parameters from a partition…
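
To make the two ingredients named in the abstract concrete, here is a minimal sketch of generic pseudo-label refinement followed by class-wise thresholding. It is not the SEVAL implementation: the function names (`softmax`, `pseudo_label`), the additive per-class `offsets`, and the hand-picked `thresholds` are all assumptions made for illustration, and how SEVAL actually learns such parameters is not shown here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_label(logits, offsets, thresholds):
    """Return (class_index, confidence) if the sample is kept, else None.

    offsets    -- hypothetical per-class additive corrections to the logits
                  (the "refinement" step in this sketch)
    thresholds -- hypothetical per-class confidence cut-offs
                  (the "threshold adjustment" step in this sketch)
    """
    # Refinement: shift each class logit before normalising.
    refined = [z + b for z, b in zip(logits, offsets)]
    probs = softmax(refined)
    # Pick the most probable class after refinement.
    k = max(range(len(probs)), key=probs.__getitem__)
    # Thresholding: keep the pseudo-label only if it clears the
    # cut-off assigned to the predicted class.
    if probs[k] >= thresholds[k]:
        return k, probs[k]
    return None


# Class 0 plays the majority class, class 1 the minority class.
logits = [2.0, 1.5]

# Single global threshold, no refinement: the sample is discarded.
print(pseudo_label(logits, [0.0, 0.0], [0.95, 0.95]))

# Offsets favouring the minority class plus a lower minority threshold:
# the sample is kept and labelled as class 1.
print(pseudo_label(logits, [-1.0, 1.0], [0.95, 0.7]))
```

In this toy setup, a single fixed threshold rejects the sample, while per-class offsets and thresholds let a minority-class pseudo-label through. That is the kind of class-dependent behaviour the abstract argues heuristic or uncalibrated strategies fail to provide.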

Code & Models

Repositories

zerojumpline/seval (PyTorch, official)

Taxonomy

Topics: Imbalanced Data Classification Techniques