Semi-supervised linear regression: enhancing efficiency and robustness in high dimensions

Kai Chen; Yuqian Zhang

arXiv:2311.17685 · stat.ME · September 3, 2025 · 1 citation

Open Access

TL;DR

This paper introduces robust semi-supervised linear regression methods that leverage unlabeled data to improve estimation accuracy and robustness in high-dimensional settings, even under model misspecification.

Contribution

The paper presents new semi-supervised estimators that do not rely on sparsity and that improve efficiency and robustness in high-dimensional linear regression.

Findings

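01 Unlabeled data reduces estimation bias in high dimensions, even when the true underlying model is linear.

02 The proposed methods outperform existing approaches in simulations.

03 Additional semi-supervised methods achieve further efficiency gains when the linear slope is sparse.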

Abstract

In semi-supervised learning, the prevailing understanding suggests that observing additional unlabeled samples improves estimation accuracy for linear parameters only in the case of model misspecification. In this work, we challenge such a claim and show that additional unlabeled samples are beneficial in high-dimensional settings. Initially focusing on a dense scenario, we introduce robust semi-supervised estimators for the regression coefficient without relying on sparse structures in the population slope. Even when the true underlying model is linear, we show that leveraging information from large-scale unlabeled data helps reduce estimation bias, thereby improving both estimation accuracy and inference robustness. Moreover, we propose semi-supervised methods with further enhanced efficiency in scenarios with a sparse linear slope. The performance of the proposed methods is…
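To make the idea of borrowing information from unlabeled covariates concrete, here is a minimal sketch of a generic moment-based plug-in estimator. It is not the estimator proposed in the paper. It assumes zero-mean covariates and no intercept, and it simply pools labeled and unlabeled covariates to estimate the Gram matrix E[XXᵀ] while estimating the cross-moment E[Xy] from the labeled data alone. The function names `semi_supervised_slope` and `simulate`, the Gaussian design, and all sample sizes are hypothetical choices made for illustration.

```python
import numpy as np


def semi_supervised_slope(X_lab, y_lab, X_unlab):
    """Generic plug-in slope (illustrative, not the paper's method).

    Assumes zero-mean covariates and no intercept.
    """
    # Pool every available covariate row, labeled or not.
    X_all = np.vstack([X_lab, X_unlab])
    # Gram matrix estimate of E[X X^T] from the pooled sample.
    gram = X_all.T @ X_all / X_all.shape[0]
    # Cross-moment estimate of E[X y]; only labeled rows carry y.
    cross = X_lab.T @ y_lab / X_lab.shape[0]
    # Solve gram @ beta = cross rather than forming an explicit inverse.
    return np.linalg.solve(gram, cross)


def simulate(n=100, N=5000, p=150, seed=0):
    """Draw a dense linear model with standard Gaussian covariates (an assumption)."""
    rng = np.random.default_rng(seed)
    beta = rng.normal(size=p) / np.sqrt(p)  # dense slope, no sparsity imposed
    X_lab = rng.normal(size=(n, p))
    X_unlab = rng.normal(size=(N, p))
    y_lab = X_lab @ beta + rng.normal(size=n)
    return X_lab, y_lab, X_unlab, beta


if __name__ == "__main__":
    # Here p > n, so the labeled-only Gram matrix is singular,
    # while the pooled Gram matrix is still invertible.
    X_lab, y_lab, X_unlab, beta = simulate()
    beta_hat = semi_supervised_slope(X_lab, y_lab, X_unlab)
    print("estimation error:", np.linalg.norm(beta_hat - beta))
```

In this sketch the unlabeled rows enter only through the Gram matrix, which is why the estimate remains computable when the number of covariates exceeds the number of labeled samples. How much accuracy this gains over a supervised fit depends on the signal and noise levels, and the sketch makes no claim about that.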

Taxonomy

Topics: Statistical Methods and Inference · Advanced Statistical Methods and Models · Advanced Statistical Process Monitoring