How does unlabeled data improve generalization in self-training? A one-hidden-layer theoretical analysis
Shuai Zhang, Meng Wang, Sijia Liu, Pin-Yu Chen, Jinjun Xiong

TL;DR
This paper gives what the authors believe to be the first theoretical analysis of iterative self-training, restricted to one-hidden-layer neural networks. It shows how unlabeled data improves both training convergence and generalization, and supports the theory with experiments on shallow and deep networks.
Contribution
It establishes a theoretical framework for understanding self-training in shallow neural networks, highlighting the benefits of unlabeled data on convergence and generalization.
Findings
Iterative self-training converges linearly.
Both the convergence rate and the generalization accuracy improve on the order of 1/√M, where M is the number of unlabeled samples.
Experiments on both shallow and deep neural networks are consistent with the theoretical results.
Abstract
Self-training, a semi-supervised learning algorithm, leverages a large amount of unlabeled data to improve learning when the labeled data are limited. Despite empirical successes, its theoretical characterization remains elusive. To the best of our knowledge, this work establishes the first theoretical analysis for the known iterative self-training paradigm and proves the benefits of unlabeled data in both training convergence and generalization ability. To make our theoretical analysis feasible, we focus on the case of one-hidden-layer neural networks. However, theoretical understanding of iterative self-training is non-trivial even for a shallow neural network. One of the key challenges is that existing neural network landscape analysis built upon supervised learning no longer holds in the (semi-supervised) self-training paradigm. We address this challenge and prove that iterative…
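The iterative self-training loop described in the abstract can be illustrated with a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' algorithm: the regression setting, the ReLU network with fixed averaging output weights, the weighting parameter `lam`, and every function name and default value below are hypothetical choices made for this example.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def predict(W, X):
    # One-hidden-layer network: average of K ReLU units (output weights fixed).
    # W has shape (K, d), X has shape (n, d); returns shape (n,).
    return relu(X @ W.T).mean(axis=1)

def grad(W, X, y):
    # Gradient of the squared loss 1/(2n) * sum_i (predict(W, x_i) - y_i)^2 w.r.t. W.
    n, K = X.shape[0], W.shape[0]
    pre = X @ W.T                        # pre-activations, shape (n, K)
    err = relu(pre).mean(axis=1) - y     # residuals, shape (n,)
    return ((err[:, None] * (pre > 0)).T @ X) / (n * K)

def self_train(X_lab, y_lab, X_unl, W0, lam=0.5, lr=0.1,
               outer_iters=10, inner_steps=50):
    # Assumed scheme: each outer iteration pseudo-labels the unlabeled data
    # with the current model, then runs gradient descent on a convex
    # combination of the labeled loss and the pseudo-labeled loss.
    W = W0.copy()
    for _ in range(outer_iters):
        y_pseudo = predict(W, X_unl)     # pseudo-labels from the current model
        for _ in range(inner_steps):
            g = lam * grad(W, X_lab, y_lab) + (1.0 - lam) * grad(W, X_unl, y_pseudo)
            W -= lr * g
    return W

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, K, N, M = 5, 3, 20, 2000          # few labeled samples, many unlabeled
    W_star = rng.normal(size=(K, d))     # hypothetical ground-truth weights
    X_lab = rng.normal(size=(N, d))
    y_lab = predict(W_star, X_lab)
    X_unl = rng.normal(size=(M, d))
    W0 = W_star + 0.1 * rng.normal(size=(K, d))
    W_hat = self_train(X_lab, y_lab, X_unl, W0)
    print("labeled MSE after self-training:",
          np.mean((predict(W_hat, X_lab) - y_lab) ** 2))
```

In this sketch `M` plays the role of the number of unlabeled samples; varying it while holding `N` fixed is one way to probe the dependence on unlabeled data empirically.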
Taxonomy
Topics: Machine Learning and ELM · Neural Networks and Applications · Stochastic Gradient Optimization Techniques
