Optimal and Safe Estimation for High-Dimensional Semi-Supervised Learning
Siyi Deng, Yang Ning, Jiwei Zhao, Heping Zhang

TL;DR
This paper develops optimal and safe semi-supervised estimators for high-dimensional linear models, leveraging unlabeled data to improve estimation accuracy even under model misspecification.
Contribution
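It introduces the first minimax lower bound for parameter estimation in semi-supervised linear regression, shows that estimators using only the labeled data cannot attain it, and proposes an optimal estimator that does, together with a safe estimator that is never worse than the supervised estimators.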
Findings
The optimal semi-supervised estimator attains the minimax lower bound, provided the conditional mean function can be consistently estimated at a suitable rate.
The safe estimator is guaranteed to perform at least as well as supervised estimators.
Numerical simulations and real data analysis validate the theoretical advantages.
Abstract
We consider the estimation problem in high-dimensional semi-supervised learning. Our goal is to investigate when and how the unlabeled data can be exploited to improve the estimation of the regression parameters of a linear model, given that such linear models may be misspecified in data analysis. We first establish the minimax lower bound for parameter estimation in the semi-supervised setting, and show that this lower bound cannot be achieved by supervised estimators that use the labeled data only. We propose an optimal semi-supervised estimator that attains this lower bound and therefore improves on the supervised estimators, provided that the conditional mean function can be consistently estimated at a proper rate. We further propose a safe semi-supervised estimator. We regard it as safe because this estimator is always at least as good as the supervised estimators. We also…
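To make the role of the unlabeled data concrete, here is a minimal low-dimensional sketch of a generic "impute, then correct" construction. It is not the estimator proposed in the paper, and it omits the regularization a high-dimensional setting would need. The function names, the argument f_hat (a hypothetical user-supplied estimate of the conditional mean of Y given X), and the exact form of the estimating equation are all assumptions made for illustration.

```python
import numpy as np


def supervised_ols(X_lab, y_lab):
    """Ordinary least squares using the labeled data only."""
    return np.linalg.solve(X_lab.T @ X_lab, X_lab.T @ y_lab)


def semi_supervised_sketch(X_lab, y_lab, X_unlab, f_hat):
    """Illustrative semi-supervised estimate of the linear projection parameter.

    Assumed form (not taken from the paper):
      - the second-moment matrix of X is estimated from labeled + unlabeled rows;
      - E[X * f(X)] is estimated from all rows using the imputed values f_hat(X);
      - a residual correction on the labeled rows accounts for error in f_hat.
    f_hat is a hypothetical callable mapping an (m, p) array to an (m,) array.
    """
    X_all = np.vstack([X_lab, X_unlab])
    n, n_all = X_lab.shape[0], X_all.shape[0]

    sigma_all = X_all.T @ X_all / n_all               # pooled second-moment matrix
    imputed = X_all.T @ f_hat(X_all) / n_all          # uses unlabeled covariates
    correction = X_lab.T @ (y_lab - f_hat(X_lab)) / n  # labeled residual term

    return np.linalg.solve(sigma_all, imputed + correction)
```

With no unlabeled rows and f_hat identically zero, the sketch reduces to ordinary least squares on the labeled data, so any difference from the supervised fit comes from the pooled covariate information and the quality of f_hat.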
Taxonomy
Topics: Advanced Statistical Methods and Models · Distributed Sensor Networks and Detection Algorithms · Statistical Methods and Inference
Methods: Linear Regression
