Semi-supervised linear regression: enhancing efficiency and robustness in high dimensions
Kai Chen, Yuqian Zhang

TL;DR
This paper introduces robust semi-supervised linear regression methods that leverage unlabeled data to improve estimation accuracy and inference robustness in high-dimensional settings, not only under model misspecification but also when the true underlying model is linear.
Contribution
It presents new semi-supervised estimators that do not rely on sparsity and demonstrate improved efficiency and robustness in high-dimensional linear regression.
Findings
Unlabeled data reduces estimation bias in high dimensions.
Proposed methods outperform existing approaches in simulations.
Additional semi-supervised methods achieve further efficiency gains when the linear slope is sparse.
Abstract
In semi-supervised learning, the prevailing understanding suggests that observing additional unlabeled samples improves estimation accuracy for linear parameters only in the case of model misspecification. In this work, we challenge such a claim and show that additional unlabeled samples are beneficial in high-dimensional settings. Initially focusing on a dense scenario, we introduce robust semi-supervised estimators for the regression coefficient without relying on sparse structures in the population slope. Even when the true underlying model is linear, we show that leveraging information from large-scale unlabeled data helps reduce estimation bias, thereby improving both estimation accuracy and inference robustness. Moreover, we propose semi-supervised methods with further enhanced efficiency in scenarios with a sparse linear slope. The performance of the proposed methods is…
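As a rough illustration of how unlabeled covariates can enter a linear regression estimate, the Python sketch below compares ordinary least squares on the labeled sample with a simple plug-in estimator. The plug-in estimates the Gram matrix E[X Xᵀ] from all covariates (labeled and unlabeled) and the cross-moment E[X Y] from the labeled pairs only. This is a generic moment-based estimator chosen for illustration, not the estimators proposed in the paper. The function names, the Gaussian design, the dense unit-norm slope, the sample sizes, and the noise level are all assumptions.

```python
import numpy as np


def supervised_ols(X_lab, y_lab):
    """Ordinary least squares using the labeled sample only."""
    beta_hat, *_ = np.linalg.lstsq(X_lab, y_lab, rcond=None)
    return beta_hat


def semi_supervised_plugin(X_lab, y_lab, X_unlab):
    """Illustrative plug-in estimator (an assumption, not the paper's method).

    The Gram matrix E[X X^T] is estimated from all covariates,
    the cross-moment E[X Y] from the labeled pairs only.
    """
    X_all = np.vstack([X_lab, X_unlab])
    gram = X_all.T @ X_all / X_all.shape[0]
    cross = X_lab.T @ y_lab / X_lab.shape[0]
    return np.linalg.solve(gram, cross)


def simulate(n=200, p=150, m=20000, seed=0):
    """Return the l2 errors (supervised, semi-supervised) on one simulated data set.

    Hypothetical setup: n labeled samples, m unlabeled samples, p covariates,
    a well-specified linear model with a dense slope and unit-variance noise.
    """
    rng = np.random.default_rng(seed)
    beta = rng.normal(size=p)
    beta /= np.linalg.norm(beta)  # dense slope with unit l2 norm

    X_lab = rng.normal(size=(n, p))
    y_lab = X_lab @ beta + rng.normal(size=n)
    X_unlab = rng.normal(size=(m, p))

    err_sup = np.linalg.norm(supervised_ols(X_lab, y_lab) - beta)
    err_ss = np.linalg.norm(semi_supervised_plugin(X_lab, y_lab, X_unlab) - beta)
    return err_sup, err_ss


if __name__ == "__main__":
    err_sup, err_ss = simulate()
    print(f"supervised OLS error:       {err_sup:.3f}")
    print(f"semi-supervised plug-in:    {err_ss:.3f}")
```

In this assumed setup, where the dimension p is close to the labeled sample size n, the plug-in tends to be more accurate because the Gram matrix is estimated from far more samples. Which estimator wins depends on the signal strength and noise level, so the comparison should not be read as a general guarantee.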
Taxonomy
Topics: Statistical Methods and Inference · Advanced Statistical Methods and Models · Advanced Statistical Process Monitoring
