Semi-Supervised Empirical Risk Minimization: Using unlabeled data to improve prediction
Oren Yuval, Saharon Rosset

TL;DR
This paper introduces an adaptive semi-supervised learning method for generalized linear regression that leverages unlabeled data to improve prediction accuracy, outperforming traditional supervised methods and null models in various settings.
Contribution
The paper proposes a novel adaptive SSL approach that uses unlabeled data to identify when SSL can outperform supervised learning and null models, with theoretical and empirical validation.
Findings
Adaptive SSL can significantly outperform supervised and null models in various scenarios.
In linear regression with Gaussian covariates, non-adaptive SSL cannot improve on both the supervised estimator and the null model simultaneously, beyond a negligible O(1/n) term.
Empirical results confirm the effectiveness of the adaptive SSL approach across different data distributions.
Abstract
We present a general methodology for using unlabeled data to design semi-supervised learning (SSL) variants of the Empirical Risk Minimization (ERM) learning process. Focusing on generalized linear regression, we analyze the effectiveness of our SSL approach in improving prediction performance. The key ideas are carefully considering the null model as a competitor, and utilizing the unlabeled data to determine signal-noise combinations where SSL outperforms both supervised learning and the null model. We then use SSL in an adaptive manner based on estimation of the signal and noise. In the special case of linear regression with Gaussian covariates, we prove that the non-adaptive SSL version is in fact not capable of improving on both the supervised estimator and the null model simultaneously, beyond a negligible O(1/n) term. On the other hand, the adaptive model presented in this…
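The three competitors named in the abstract (null model, supervised estimator, SSL variant) can be illustrated for plain linear regression with a minimal sketch. The truncated abstract does not give the authors' estimator, so the SSL variant below is an assumption made for illustration: it estimates the covariate mean and covariance from the pooled labeled and unlabeled covariates and takes the covariate-response cross-moment from the labeled data alone. All function names are hypothetical, and the adaptive rule based on estimated signal and noise is not reproduced here.

```python
import numpy as np


def null_model(y):
    """Null model: ignore the covariates and always predict the mean of y."""
    y_bar = float(np.mean(y))
    return lambda Z: np.full(Z.shape[0], y_bar)


def supervised_ols(X, y):
    """Least squares with an intercept, fitted on the labeled data only."""
    x_bar, y_bar = X.mean(axis=0), y.mean()
    Xc = X - x_bar
    cov_xx = Xc.T @ Xc / len(y)            # labeled-sample covariance of x
    cov_xy = Xc.T @ (y - y_bar) / len(y)   # labeled-sample cross-moment
    beta = np.linalg.solve(cov_xx, cov_xy)
    return lambda Z: y_bar + (Z - x_bar) @ beta


def semi_supervised_ols(X, y, X_unlabeled):
    """Illustrative SSL variant (an assumption, not the paper's estimator):
    the covariance of x is estimated from labeled plus unlabeled covariates,
    while the cross-moment with y still uses the labeled data only."""
    X_all = np.vstack([X, X_unlabeled])
    mu = X_all.mean(axis=0)
    cov_xx = (X_all - mu).T @ (X_all - mu) / X_all.shape[0]
    x_bar, y_bar = X.mean(axis=0), y.mean()
    cov_xy = (X - x_bar).T @ (y - y_bar) / len(y)
    beta = np.linalg.solve(cov_xx, cov_xy)
    return lambda Z: y_bar + (Z - x_bar) @ beta


def mean_squared_error(predict, X_test, y_test):
    """Average squared prediction error on held-out data."""
    return float(np.mean((y_test - predict(X_test)) ** 2))


def compare(n=30, m=3000, p=10, signal=0.5, noise=1.0, seed=0):
    """Simulate Gaussian covariates and report test error of each competitor."""
    rng = np.random.default_rng(seed)
    beta = signal * rng.standard_normal(p) / np.sqrt(p)
    X = rng.standard_normal((n, p))
    y = X @ beta + noise * rng.standard_normal(n)
    X_unlabeled = rng.standard_normal((m, p))
    X_test = rng.standard_normal((5000, p))
    y_test = X_test @ beta + noise * rng.standard_normal(5000)
    models = {
        "null": null_model(y),
        "supervised": supervised_ols(X, y),
        "ssl": semi_supervised_ols(X, y, X_unlabeled),
    }
    return {k: mean_squared_error(f, X_test, y_test) for k, f in models.items()}
```

Calling `compare()` with different `signal` and `noise` values gives a rough feel for the signal-noise combinations the abstract refers to: which of the three predictors has the lowest test error depends on those settings, which is the motivation for choosing among them adaptively.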
Taxonomy
Topics: Statistical Methods and Inference · Advanced Statistical Methods and Models · Advanced Causal Inference Techniques
Methods: Linear Regression
