Exploration and Exploitation of Unlabeled Data for Open-Set Semi-Supervised Learning
Ganlong Zhao, Guanbin Li, Yipeng Qin, Jinjin Zhang, Zhenhua Chai, Xiaolin Wei, Liang Lin, Yizhou Yu

TL;DR
This paper introduces an open-set semi-supervised learning (SSL) approach that uses both in-distribution (ID) and out-of-distribution (OOD) unlabeled data, via prototype-based clustering and importance-based sampling, to improve SSL performance.
Contribution
It proposes a prototype-based clustering algorithm and an importance sampling method to effectively utilize both ID and OOD samples in open-set SSL, surpassing previous methods.
Findings
Achieves state-of-the-art results on several benchmarks.
Improves SSL performance even without ID samples in unlabeled data.
Enhances feature learning through prototype-based clustering.
Abstract
In this paper, we address a complex but practical scenario in semi-supervised learning (SSL) named open-set SSL, where unlabeled data contain both in-distribution (ID) and out-of-distribution (OOD) samples. Unlike previous methods that only consider ID samples to be useful and aim to filter out OOD ones completely during training, we argue that the exploration and exploitation of both ID and OOD samples can benefit SSL. To support our claim, i) we propose a prototype-based clustering and identification algorithm that explores the inherent similarity and difference among samples at the feature level and effectively clusters them around several predefined ID and OOD prototypes, thereby enhancing feature learning and facilitating ID/OOD identification; ii) we propose an importance-based sampling method that exploits the difference in importance of each ID and OOD sample to SSL, thereby reducing…
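To make the two ideas in the abstract concrete, here is a minimal sketch, not the authors' implementation. It assumes features and prototypes are plain NumPy arrays, that the first `num_id` prototypes are the ID ones, that assignment uses cosine similarity, and that "importance" is just a non-negative weight per sample. The function names `assign_to_prototypes` and `importance_sample` and all values are hypothetical and chosen for illustration only.

```python
import numpy as np


def assign_to_prototypes(features, prototypes, num_id):
    """Assign each feature to its most similar prototype (cosine similarity).

    Assumption of this sketch: rows 0..num_id-1 of `prototypes` are ID
    prototypes and the remaining rows are OOD prototypes.
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    sim = f @ p.T                    # shape: (n_samples, n_prototypes)
    nearest = sim.argmax(axis=1)     # index of the closest prototype
    is_id = nearest < num_id         # True if the closest prototype is an ID one
    return nearest, is_id


def importance_sample(weights, k, seed=0):
    """Draw k distinct sample indices with probability proportional to weight.

    This is generic weighted sampling, used here only as a stand-in for the
    paper's importance-based sampling, whose details are not given above.
    """
    rng = np.random.default_rng(seed)
    probs = weights / weights.sum()
    return rng.choice(len(weights), size=k, replace=False, p=probs)


# Toy data: two ID prototypes and one OOD prototype in a 2-D feature space.
prototypes = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
features = np.array([[0.9, 0.1], [0.2, 0.8], [-1.0, 0.1]])

nearest, is_id = assign_to_prototypes(features, prototypes, num_id=2)

# Hypothetical importance weights, one per unlabeled sample.
weights = np.array([0.5, 0.3, 0.2])
picked = importance_sample(weights, k=2)
```

In this toy setup the first two features fall nearest to the ID prototypes and the third to the OOD prototype. In a real pipeline the features would come from the network being trained and the weights from whatever importance criterion the method defines.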
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification · Anomaly Detection Techniques and Applications
