Why Self-Training Helps and Hurts: Denoising vs. Signal Forgetting
Mingqi Wu, Archer Y. Yang, Qiang Sun

TL;DR
This paper analyzes self-training in overparameterized linear regression and identifies a trade-off between signal forgetting and denoising that shapes the test risk and determines the optimal stopping time; the analysis is supported by theory and experiments.
Contribution
It provides a theoretical framework for understanding self-training dynamics in linear models, including risk recursions, spectral filtering effects, and a data-driven stopping criterion.
Findings
Test risk exhibits a U-shaped curve over iterations.
Iteration acts as a spectral filter preserving strong eigendirections.
Proposed cross-validation method accurately estimates optimal stopping time.
Abstract
Iterative self-training (self-distillation) repeatedly refits a model on pseudo-labels generated by its own predictions. We study this procedure in overparameterized linear regression: an initial estimator is trained on noisy labels, and each subsequent iterate is trained on fresh covariates with noiseless pseudo-labels from the previous model. In the high-dimensional regime, we derive deterministic-equivalent recursions for the prediction risk and effective noise across iterations, and prove that the empirical quantities concentrate sharply around these limits. The recursion separates two competing forces: a systematic component that grows with iteration due to progressive signal forgetting, and a stochastic component that decays due to denoising via repeated data-dependent projections. Their interaction yields a U-shaped test-risk curve and an optimal early-stopping time. In spiked…
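The procedure described in the abstract can be simulated in a few lines. The sketch below is illustrative only and is not the authors' code. It assumes a minimum-norm interpolator (via the pseudoinverse) as the estimator at every step, Gaussian covariates with a diagonal covariance containing a single spiked eigenvalue, and a true signal aligned with that spike. The function names and all parameter values (`n`, `d`, `sigma`, `spike`) are hypothetical choices made for the example. Under these assumptions each refit on noiseless pseudo-labels is an orthogonal projection of the previous coefficient vector onto the row space of the fresh design matrix, which is one way to read the "repeated data-dependent projections" in the abstract.

```python
import numpy as np


def min_norm_fit(X, y):
    """Minimum-norm least-squares solution (assumed estimator, n < d)."""
    return np.linalg.pinv(X) @ y


def self_training_risks(n=100, d=300, sigma=1.0, n_iters=10, spike=25.0, seed=0):
    """Simulate iterative self-training and track the excess prediction risk.

    All settings here are illustrative assumptions, not values from the paper.
    Returns (risks, norms), each of length n_iters + 1.
    """
    rng = np.random.default_rng(seed)

    # Diagonal covariance with one strong eigendirection (assumed spiked model).
    evals = np.ones(d)
    evals[0] = spike

    # True signal placed on the spiked direction (an assumption for the demo).
    beta_star = np.zeros(d)
    beta_star[0] = 1.0

    def sample_covariates(m):
        return rng.standard_normal((m, d)) * np.sqrt(evals)

    def excess_risk(beta):
        # (beta - beta_star)^T Sigma (beta - beta_star) for diagonal Sigma.
        return float(np.sum(evals * (beta - beta_star) ** 2))

    # Step 0: fit on noisy labels.
    X0 = sample_covariates(n)
    y0 = X0 @ beta_star + sigma * rng.standard_normal(n)
    beta = min_norm_fit(X0, y0)

    risks = [excess_risk(beta)]
    norms = [float(np.linalg.norm(beta))]

    # Steps 1..n_iters: refit on fresh covariates with noiseless pseudo-labels.
    for _ in range(n_iters):
        X = sample_covariates(n)
        pseudo_labels = X @ beta
        beta = min_norm_fit(X, pseudo_labels)  # equals pinv(X) @ X @ beta
        risks.append(excess_risk(beta))
        norms.append(float(np.linalg.norm(beta)))

    return np.array(risks), np.array(norms)


if __name__ == "__main__":
    risks, norms = self_training_risks()
    for t, (r, b) in enumerate(zip(risks, norms)):
        print(f"iter {t:2d}  excess risk {r:8.4f}  ||beta|| {b:7.4f}")
```

Because every refit is a projection, the coefficient norm cannot increase from one iterate to the next; that shrinkage acts on both the noise carried over from step 0 and the signal itself. Whether the printed risk curve actually dips before rising depends on the chosen dimensions, noise level, and spike strength, so the numbers from a single run should be read as a qualitative illustration rather than a reproduction of the paper's results.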
Taxonomy
Topics: Machine Learning and Data Classification · Face and Expression Recognition · Image and Signal Denoising Methods
