Semi-supervised Inference for Explained Variance in High-dimensional Linear Regression and Its Applications
T. Tony Cai, Zijian Guo

TL;DR
This paper develops a semi-supervised inference method for explained variance in high-dimensional linear regression, leveraging both labelled and unlabelled data to improve estimation accuracy and shorten confidence intervals.
Contribution
It introduces a calibrated estimator that achieves minimax optimal convergence rates and extends to quadratic functional inference, demonstrating practical benefits in various high-dimensional applications.
Findings
Estimator achieves minimax optimal rate
Unlabelled data reduces confidence interval length
Method improves inference in high-dimensional problems
Abstract
This paper considers statistical inference for the explained variance $\beta^{\top}\Sigma\beta$ under the high-dimensional linear model $Y = X\beta + \epsilon$ in the semi-supervised setting, where $\beta$ is the regression vector and $\Sigma$ is the design covariance matrix. A calibrated estimator, which efficiently integrates both labelled and unlabelled data, is proposed. It is shown that the estimator achieves the minimax optimal rate of convergence in the general semi-supervised framework. The optimality result characterizes how the unlabelled data contributes to the estimation accuracy. Moreover, the limiting distribution for the proposed estimator is established, and the unlabelled data is also shown to reduce the length of the confidence interval for the explained variance. The proposed method is extended to the semi-supervised inference for the unweighted quadratic…
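To make the role of the unlabelled covariates concrete, here is a minimal Python sketch of a naive plug-in estimate of the explained variance. It is not the calibrated estimator proposed in the paper: it assumes a low-dimensional setting (more labelled samples than covariates), centred covariates, and ordinary least squares, and the function names `plug_in_explained_variance` and `simulate` are hypothetical. It only illustrates that the covariance matrix can be estimated from the pooled labelled and unlabelled covariates, while the regression vector needs the labelled sample.

```python
import numpy as np


def plug_in_explained_variance(X_lab, y_lab, X_unlab=None):
    """Naive plug-in estimate of beta^T Sigma beta (illustration only).

    Assumes centred covariates and that least squares is well defined;
    this is NOT the calibrated high-dimensional estimator from the paper.
    """
    # Least-squares estimate of beta from the labelled sample only.
    beta_hat, *_ = np.linalg.lstsq(X_lab, y_lab, rcond=None)
    # Pool labelled and unlabelled covariates to estimate Sigma.
    X_all = X_lab if X_unlab is None else np.vstack([X_lab, X_unlab])
    sigma_hat = X_all.T @ X_all / X_all.shape[0]
    return float(beta_hat @ sigma_hat @ beta_hat)


def simulate(n=200, N=2000, p=5, seed=0):
    """Draw toy data with Sigma = I, so the true value is beta^T beta."""
    rng = np.random.default_rng(seed)
    beta = np.linspace(0.5, 1.5, p)
    X_lab = rng.standard_normal((n, p))
    X_unlab = rng.standard_normal((N, p))
    y_lab = X_lab @ beta + rng.standard_normal(n)
    truth = float(beta @ beta)
    return X_lab, y_lab, X_unlab, truth


if __name__ == "__main__":
    X_lab, y_lab, X_unlab, truth = simulate()
    print("truth:          ", truth)
    print("labelled only:  ", plug_in_explained_variance(X_lab, y_lab))
    print("semi-supervised:", plug_in_explained_variance(X_lab, y_lab, X_unlab))
```

In this toy setup the only difference between the two calls is the sample used to estimate the covariance matrix. The high-dimensional regime studied in the paper requires a different construction, which this sketch does not attempt.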
