TL;DR
This survey comprehensively reviews self-training methods in semi-supervised learning, highlighting their principles, variants, and applications across classification tasks, and discusses future research directions.
Contribution
It is the first thorough survey of self-training, covering methods, variants, related approaches, and their performance on various benchmarks.
Findings
Self-training effectively leverages unlabeled data for classification.
Which variant performs best depends on the data and the task.
Future research directions include improving confidence measures and combining self-training with other semi-supervised techniques.
Abstract
Semi-supervised algorithms aim to learn prediction functions from a small set of labeled observations and a large set of unlabeled observations. Because this framework is relevant in many applications, these algorithms have received a lot of interest in both academia and industry. Among the existing techniques, self-training methods have undoubtedly attracted greater attention in recent years. These models are designed to find the decision boundary in low-density regions without making additional assumptions about the data distribution, and use the unsigned output score of a learned classifier, or its margin, as an indicator of confidence. The working principle of self-training algorithms is to learn a classifier iteratively by assigning pseudo-labels to the set of unlabeled training samples with a margin greater than a certain threshold. The pseudo-labeled examples are then used to enrich the…
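The loop described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the survey's reference implementation: the base learner is a toy nearest-centroid classifier, the margin is taken to be the gap between the distances to the two nearest centroids, and the names `fit_centroids`, `predict_with_margin`, `self_train`, `threshold`, and `max_iter` are hypothetical.

```python
from math import dist


def fit_centroids(points, labels):
    """Fit a toy nearest-centroid classifier: return {label: centroid}."""
    sums, counts = {}, {}
    for p, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(p))
        for i, v in enumerate(p):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: tuple(v / counts[y] for v in acc) for y, acc in sums.items()}


def predict_with_margin(centroids, p):
    """Return (label, margin). The margin is an assumed confidence proxy:
    the gap between the distances to the two nearest centroids.
    Requires at least two classes."""
    ranked = sorted((dist(p, c), y) for y, c in centroids.items())
    (d1, y1), (d2, _) = ranked[0], ranked[1]
    return y1, d2 - d1


def self_train(labeled_x, labeled_y, unlabeled_x, threshold, max_iter=10):
    """Iteratively pseudo-label unlabeled points whose margin exceeds
    `threshold`, add them to the training set, and refit."""
    x, y = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    for _ in range(max_iter):
        centroids = fit_centroids(x, y)
        remaining, added = [], 0
        for p in pool:
            label, margin = predict_with_margin(centroids, p)
            if margin > threshold:
                # Confident prediction: treat it as a pseudo-label.
                x.append(p)
                y.append(label)
                added += 1
            else:
                remaining.append(p)
        pool = remaining
        if added == 0 or not pool:
            break  # nothing confident left to add, or pool exhausted
    return fit_centroids(x, y), pool


if __name__ == "__main__":
    labeled_x = [(0.0, 0.0), (4.0, 0.0)]
    labeled_y = [0, 1]
    unlabeled_x = [(0.5, 0.0), (3.5, 0.0), (1.5, 0.0), (2.0, 0.0)]
    model, leftover = self_train(labeled_x, labeled_y, unlabeled_x, threshold=1.5)
    print(model)     # final centroids per class
    print(leftover)  # points that never cleared the threshold
```

In this toy run, the two points close to a labeled example are pseudo-labeled in the first pass, while the two points near the midpoint between the classes never clear the threshold and stay unlabeled. A fixed threshold is only one possible choice; the summary above lists better confidence measures as an open research direction.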
