Variance Estimation Using Refitted Cross-validation in Ultrahigh Dimensional Regression
Jianqing Fan, Shaojun Guo, Ning Hao

TL;DR
This paper introduces refitted cross-validation (RCV), a method for accurate variance estimation in ultrahigh dimensional linear regression that reduces the bias caused by selecting irrelevant variables with high spurious correlations.
Contribution
It proposes a two-stage refitted procedure based on data splitting that improves variance estimation in ultrahigh dimensional settings and asymptotically matches the performance of the oracle estimator.
Findings
The RCV method achieves asymptotic performance comparable to the oracle estimator.
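Simulation results demonstrate the effectiveness of RCV over naive methods, which underestimate the noise level when extra irrelevant variables are selected.
Refitting on data that was not used for variable selection attenuates the influence of spuriously correlated predictors on the variance estimate.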
Abstract
Variance estimation is a fundamental problem in statistical modeling. In ultrahigh dimensional linear regressions, where the dimensionality is much larger than the sample size, traditional variance estimation techniques are not applicable. Recent advances in variable selection for ultrahigh dimensional linear regressions make this problem accessible. One of the major problems in ultrahigh dimensional regression is the high spurious correlation between the unobserved realized noise and some of the predictors. As a result, the realized noise is actually predicted when extra irrelevant variables are selected, leading to serious underestimation of the noise level. In this paper, we propose a two-stage refitted procedure via a data splitting technique, called refitted cross-validation (RCV), to attenuate the influence of irrelevant variables with high spurious correlations. Our asymptotic results…
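The two-stage procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the selector `top_k_by_correlation` (a simple marginal-correlation screen), the fixed model size `k`, and all function names are assumptions made for the example, and any variable selection method could be substituted. Each half of the data selects variables, the other half refits ordinary least squares on that selection, and the two residual variance estimates are averaged.

```python
import numpy as np

def top_k_by_correlation(X, y, k):
    """Hypothetical selector: indices of the k columns most correlated with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    score = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) + 1e-12)
    return np.argsort(score)[-k:]

def refit_variance(X, y, selected):
    """Least-squares refit on the selected columns (no intercept, an assumption);
    returns the residual sum of squares divided by (rows - number selected)."""
    design = X[:, selected]
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid) / (X.shape[0] - design.shape[1])

def rcv_variance(X, y, k, rng):
    """Refitted cross-validation: select on one half, refit on the other, average."""
    n = X.shape[0]
    perm = rng.permutation(n)
    a, b = perm[: n // 2], perm[n // 2:]
    sel_a = top_k_by_correlation(X[a], y[a], k)  # stage 1 on half a
    sel_b = top_k_by_correlation(X[b], y[b], k)  # stage 1 on half b
    var_1 = refit_variance(X[b], y[b], sel_a)    # stage 2 on the opposite half
    var_2 = refit_variance(X[a], y[a], sel_b)
    return 0.5 * (var_1 + var_2)

def naive_variance(X, y, k):
    """Naive two-stage estimate: select and refit on the same data."""
    return refit_variance(X, y, top_k_by_correlation(X, y, k))

if __name__ == "__main__":
    # Null model: y is pure noise with variance 1, so every selected
    # predictor is only spuriously correlated with the response.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 2000))
    y = rng.standard_normal(200)
    print("naive:", naive_variance(X, y, 10))
    print("RCV:  ", rcv_variance(X, y, 10, rng))
```

In a pure-noise setting like the demo, the naive estimate tends to fall well below the true noise variance because the selected predictors partly fit the realized noise, while the RCV estimate refits on observations that played no role in the selection.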
Taxonomy
Topics: Statistical Methods and Inference · Advanced Statistical Methods and Models · Control Systems and Identification
