Why the pseudo label based semi-supervised learning algorithm is effective?
Zeping Min, Qian Ge, Cheng Tai

TL;DR
This paper provides a theoretical analysis of why pseudo label based semi-supervised learning is effective. It shows that, given a large amount of unlabeled data, the method can achieve generalization error bounds comparable to those of fully supervised training.
Contribution
The paper offers a theoretical framework that identifies the conditions under which pseudo label based semi-supervised learning approaches the optimal generalization error bound.
Findings
As the amount of unlabeled data increases, the upper bound on the model's generalization error approaches the bound obtained with fully supervised learning.
When enough unlabeled data is available, the generalization error upper bound converges linearly to the optimal bound.
The analysis establishes a lower bound on the sample complexity required to achieve linear convergence.
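For readers unfamiliar with the term, "converges linearly" conventionally means geometric convergence. The block below gives the standard textbook definition only, not the paper's specific bound; the sequence $\epsilon_t$, the limit $\epsilon^\ast$ and the rate $\rho$ are generic placeholders, and what the index $t$ ranges over in the paper's analysis is not stated in this summary.

```latex
% Generic definition of linear (geometric) convergence; all symbols are placeholders.
A sequence $\{\epsilon_t\}_{t \ge 0}$ converges linearly to $\epsilon^\ast$ if there exists
a rate $\rho \in (0, 1)$ such that
\[
  |\epsilon_{t+1} - \epsilon^\ast| \le \rho \, |\epsilon_t - \epsilon^\ast| \quad \text{for all } t \ge 0,
\]
which, applied repeatedly, gives
\[
  |\epsilon_t - \epsilon^\ast| \le \rho^{t} \, |\epsilon_0 - \epsilon^\ast|.
\]
```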
Abstract
Recently, pseudo label based semi-supervised learning has achieved great success in many fields. The core idea of the pseudo label based semi-supervised learning algorithm is to use a model trained on the labeled data to generate pseudo labels for the unlabeled data, and then train a model to fit those pseudo labels. In this paper, we give a theoretical analysis of why pseudo label based semi-supervised learning is effective. We mainly compare the generalization error of models trained under two settings: (1) there are N labeled data; (2) there are N unlabeled data and a suitable initial model. Our analysis shows that, firstly, when the amount of unlabeled data tends to infinity, the pseudo label based semi-supervised learning algorithm can obtain a model that has the same generalization error upper bound as a model obtained by normal training in the condition of the…
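The three-step procedure described in the abstract (train on labeled data, pseudo-label the unlabeled data, train a new model on the pseudo labels) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: it assumes a toy nearest-centroid classifier on 2-D points, and every name (`fit_centroids`, `predict`, `pseudo_label_train`, `make_data`, `accuracy`) is hypothetical. The "teacher" plays the role of the suitable initial model in setting (2); the model class analyzed in the paper may differ.

```python
import random
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Model = Dict[int, Point]  # hypothetical toy model: class label -> centroid


def fit_centroids(xs: List[Point], ys: List[int]) -> Model:
    """Fit a nearest-centroid classifier: the mean point of each class."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for (a, b), y in zip(xs, ys):
        s = sums.setdefault(y, [0.0, 0.0])
        s[0] += a
        s[1] += b
        counts[y] = counts.get(y, 0) + 1
    return {y: (s[0] / counts[y], s[1] / counts[y]) for y, s in sums.items()}


def predict(model: Model, x: Point) -> int:
    """Return the class whose centroid is closest in squared Euclidean distance."""
    return min(model, key=lambda y: (x[0] - model[y][0]) ** 2 + (x[1] - model[y][1]) ** 2)


def pseudo_label_train(labeled_x: List[Point], labeled_y: List[int], unlabeled_x: List[Point]):
    # Step 1: train an initial ("teacher") model on the labeled data only.
    teacher = fit_centroids(labeled_x, labeled_y)
    # Step 2: use the teacher to generate pseudo labels for the unlabeled data.
    pseudo_y = [predict(teacher, x) for x in unlabeled_x]
    # Step 3: train a new ("student") model to fit the pseudo labels.
    student = fit_centroids(unlabeled_x, pseudo_y)
    return teacher, pseudo_y, student


def make_data(n: int, rng: random.Random):
    """Two Gaussian blobs centred at (-2, 0) and (2, 0); labels alternate 0, 1."""
    xs, ys = [], []
    for i in range(n):
        y = i % 2
        cx = -2.0 if y == 0 else 2.0
        xs.append((rng.gauss(cx, 1.0), rng.gauss(0.0, 1.0)))
        ys.append(y)
    return xs, ys


def accuracy(model: Model, xs: List[Point], ys: List[int]) -> float:
    """Fraction of points whose predicted class matches the true class."""
    return sum(predict(model, x) == y for x, y in zip(xs, ys)) / len(ys)


if __name__ == "__main__":
    rng = random.Random(0)
    lx, ly = make_data(10, rng)    # small labeled set
    ux, _ = make_data(5000, rng)   # unlabeled set: true labels are discarded
    tx, ty = make_data(2000, rng)  # held-out test set
    teacher, _, student = pseudo_label_train(lx, ly, ux)
    print("teacher accuracy:", accuracy(teacher, tx, ty))
    print("student accuracy:", accuracy(student, tx, ty))
```

Running the script compares a teacher fitted on 10 labeled points with a student fitted only on 5000 pseudo-labeled points; no particular accuracies are claimed here. A nearest-centroid model was chosen only so that each step fits in a few lines; it says nothing about the bounds derived in the paper.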
Taxonomy
Topics: Educational Technology and Assessment · Machine Learning and ELM · Advanced Computing and Algorithms
