USE: Uncertainty Structure Estimation for Robust Semi-Supervised Learning
Tsao-Lun Chen, Chien-Liang Liu, Tzu-Ming Harry Hsu, Tai-Hsien Wu, Chi-Cheng Fu, Han-Yi E. Chou, Shun-Feng Su

TL;DR
This paper introduces USE, a lightweight method for assessing unlabeled data quality in semi-supervised learning, improving robustness and accuracy by filtering out uninformative or harmful samples, especially under out-of-distribution contamination.
Contribution
USE provides a principled, model-agnostic approach to evaluate and curate unlabeled data quality, enhancing SSL performance in contaminated environments.
Findings
USE improves accuracy on CIFAR-100 and Yelp Review datasets.
USE enhances robustness against out-of-distribution contamination.
USE effectively filters uninformative unlabeled samples before training.
Abstract
This study introduces Uncertainty Structure Estimation (USE), a lightweight, algorithm-agnostic procedure for semi-supervised learning (SSL) that emphasizes the often-overlooked role of unlabeled data quality. SSL has achieved impressive progress, but its reliability in deployment is limited by the quality of the unlabeled pool. In practice, unlabeled data are almost always contaminated by out-of-distribution (OOD) samples, and both near-OOD and far-OOD samples can degrade performance in different ways. We argue that the bottleneck lies not in algorithmic design but in the absence of principled mechanisms to assess and curate the quality of unlabeled data. USE trains a proxy model on the labeled set to compute entropy scores for unlabeled samples, and then derives a threshold, via statistical comparison against a reference distribution,…
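The entropy-scoring step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy logits, the fixed `threshold=0.5`, and the choice to keep low-entropy samples are all assumptions made for illustration. The abstract says the threshold is derived by statistical comparison against a reference distribution, but the excerpt is truncated before giving details, so a hand-picked constant stands in for that step here.

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise max for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def predictive_entropy(probs, eps=1e-12):
    # Shannon entropy (in nats) of each row of class probabilities.
    return -(probs * np.log(probs + eps)).sum(axis=1)

def filter_by_entropy(logits, threshold):
    # Hypothetical helper: score each unlabeled sample by the proxy model's
    # predictive entropy and keep those at or below the threshold.
    # ASSUMPTION: low entropy means "keep"; the paper derives its threshold
    # statistically, whereas here it is simply passed in.
    scores = predictive_entropy(softmax(logits))
    keep = scores <= threshold
    return keep, scores

# Toy proxy-model logits for three unlabeled samples over three classes.
proxy_logits = np.array([
    [8.0, 0.0, 0.0],   # confident prediction, low entropy
    [0.0, 0.0, 0.0],   # uniform prediction, maximal entropy ln(3)
    [3.0, 2.5, 0.0],   # ambiguous between two classes
])

keep_mask, entropy_scores = filter_by_entropy(proxy_logits, threshold=0.5)
print(keep_mask)
print(entropy_scores)
```

With these toy logits only the first sample falls below the placeholder threshold; the uniform and two-way-ambiguous predictions are filtered out.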
Peer Reviews
Decision·Submitted to ICLR 2026
1. The proposed USE method is simple and easy to implement. It functions as a plug-in stage prior to downstream SSL training, and therefore can be adopted by any SSL method. 2. Experiments on diverse datasets across the CV and NLP domains show good results for the proposed USE method.
My major concern is about the experiments. Since this is a data pre-processing method, its effectiveness should be demonstrated by comparison with other pre-processing methods for SSL training, but such experiments are lacking. Therefore, the effectiveness of the method is in doubt. In addition, the SSL methods tested in the experiments are old. The latest one is FlexMatch, published in 2021. It's unclear if the proposed data pre-processing method could also improve the performance of the state-of-the-a
1. The paper's primary strength is its conceptual shift in the field of semi-supervised learning (SSL). It moves the focus away from creating increasingly complex algorithms and toward the more fundamental and practical problem of unlabeled data quality, which is a critical bottleneck in real-world applications. 2. The effectiveness is demonstrated through extensive experiments showing that it consistently improves both the accuracy and robustness of a wide range of SSL algorithms.
1. The proposed method appears to depend on the proxy model. The paper improves both the accuracy and robustness of a wide range of SSL algorithms, but this hinges on the quality of the "proxy model". 2. The "first downward crossing point" rule is a heuristic. Its stability is heavily affected by the choice of the KDE bandwidth, and the rule lacks a theoretical basis to prove its optimality or robustness under various possible data distributions. 3. More complex datasets are required for val
* The writing is well-organized: smooth and easy to follow. * The motivation of the method design is intuitive and reasonable. * The experimental settings are cross-domain (CV/NLP) with detailed implementations. The results appear impressive.
* Lacks comparison to related fields: No discussion of open-world SSL (e.g., ORCA [R1]) or Generalized Category Discovery (GCD) [R2], which assume the unlabeled pool may contain both seen-class samples and novel-class samples. These works show that novel-class samples can boost seen-class performance. This work does not consider this realistic setting, and I highly suggest the authors compare with this line of work. * Narrow empirical scope: Results are reported only on CIFAR-100 and Yelp-250.
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification · Anomaly Detection Techniques and Applications
