Semi-supervised Regression Analysis with Model Misspecification and High-dimensional Data
Ye Tian, Peng Wu, Zhiqiang Tan

TL;DR
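This paper introduces a robust semi-supervised regression framework that handles model misspecification and high-dimensional data. When the propensity score model is correctly specified, it yields consistent estimates and valid confidence intervals for regression coefficients in both semi-supervised learning (SSL) and covariate shift transfer learning (CSTL) settings.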
Contribution
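It develops an augmented inverse probability weighted (AIPW) method that uses regularized calibrated estimators for the propensity score (PS) and outcome regression (OR) nuisance models. The method unifies previous approaches and ensures valid inference when the PS model is correctly specified, even if the OR model is misspecified and the data are high-dimensional.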
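To make the AIPW idea concrete, here is a minimal sketch for the SSL case. It assumes a single covariate plus intercept, labels missing at random given x, and user-supplied functions for the PS and OR models. It is plain AIPW, not the authors' regularized calibrated estimators, and all function names (`aipw_ssl`, `true_ps`, `simulate_ssl`) are hypothetical. Each unit contributes a pseudo-outcome m(x) + T/π(x)·(y − m(x)). The linear working model is then fitted to these pseudo-outcomes over all labeled and unlabeled units.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical helper names. This is plain AIPW with user-supplied nuisance
# functions, not the authors' regularized calibrated estimators.


def solve_2x2(a: List[List[float]], b: List[float]) -> Tuple[float, float]:
    """Solve the 2x2 system a @ beta = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if abs(det) < 1e-12:
        raise ValueError("singular design matrix")
    return ((b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - b[0] * a[1][0]) / det)


def aipw_ssl(
    x: Sequence[float],
    y: Sequence[Optional[float]],
    labeled: Sequence[int],
    ps: Callable[[float], float],
    outcome: Callable[[float], float],
) -> Tuple[float, float]:
    """AIPW estimate of (b0, b1) in the working model b0 + b1*x over the
    full (labeled + unlabeled) population.

    Solves sum_i X_i * (pseudo_i - X_i' beta) = 0 with X_i = (1, x_i) and
    pseudo_i = m(x_i) + T_i / pi(x_i) * (y_i - m(x_i)).
    """
    a = [[0.0, 0.0], [0.0, 0.0]]  # sum_i X_i X_i' (uses every unit)
    b = [0.0, 0.0]                # sum_i X_i * pseudo_i
    for xi, yi, ti in zip(x, y, labeled):
        m = outcome(xi)           # OR prediction, available for all units
        pseudo = m
        if ti:                    # labeled: add inverse-probability-weighted residual
            pseudo += (yi - m) / ps(xi)
        feats = (1.0, xi)
        for j in range(2):
            b[j] += feats[j] * pseudo
            for k in range(2):
                a[j][k] += feats[j] * feats[k]
    return solve_2x2(a, b)


def true_ps(v: float) -> float:
    """Hypothetical labeling probability P(T=1 | x), bounded in (0.2, 0.8)."""
    return 0.2 + 0.6 / (1.0 + math.exp(-v))


def simulate_ssl(n: int, seed: int = 0):
    """Toy data: labeling depends on x and E[Y|x] = 1 + 2x + 0.5x^2 is nonlinear.
    With x ~ N(0, 1) the population best linear fit is (1.5, 2.0)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    t = [1 if rng.random() < true_ps(v) else 0 for v in x]
    y = [1 + 2 * v + 0.5 * v * v + rng.gauss(0.0, 1.0) if ti else None
         for v, ti in zip(x, t)]
    return x, y, t


if __name__ == "__main__":
    xs, ys, ts = simulate_ssl(20000)
    # Correct PS, deliberately misspecified linear OR (drops the x^2 term).
    print(aipw_ssl(xs, ys, ts, true_ps, lambda v: 1 + 2 * v))
```

In the demo the OR model omits the quadratic term, but the PS is correct. The inverse-weighted residual correction therefore still targets the population best linear fit, which mirrors the "correct PS, possibly misspecified OR" case discussed in the paper.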
Findings
The proposed estimator is consistent and asymptotically normal when the propensity score model is correctly specified.
Its confidence intervals remain valid even when the outcome regression model is misspecified.
Results on simulated and real data support the effectiveness of the approach.
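The confidence-interval finding can be illustrated with a Wald interval built from a sandwich variance of the AIPW estimating equation. This is a minimal sketch under strong simplifying assumptions. It uses one covariate plus intercept and treats the PS and OR functions as fixed and known, so it ignores nuisance-estimation error. The paper's variance estimator under regularized calibrated estimation may differ. The helper names (`aipw_slope_ci`, `inv_2x2`, `matmul_2x2`) are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Callable, List, Optional, Sequence, Tuple

Matrix = List[List[float]]


def inv_2x2(a: Matrix) -> Matrix:
    """Inverse of a 2x2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if abs(det) < 1e-12:
        raise ValueError("singular design matrix")
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def matmul_2x2(p: Matrix, q: Matrix) -> Matrix:
    """Product of two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def aipw_slope_ci(
    x: Sequence[float],
    y: Sequence[Optional[float]],
    labeled: Sequence[int],
    ps: Callable[[float], float],
    outcome: Callable[[float], float],
    level: float = 0.95,
) -> Tuple[float, Tuple[float, float]]:
    """Hypothetical helper: AIPW slope estimate and Wald interval from a
    sandwich variance, treating ps and outcome as fixed, known functions."""
    feats = [(1.0, xi) for xi in x]
    pseudo = []
    for xi, yi, ti in zip(x, y, labeled):
        m = outcome(xi)
        # Pseudo-outcome m + T/pi * (y - m); unlabeled units contribute m only.
        pseudo.append(m + (yi - m) / ps(xi) if ti else m)
    bread = [[0.0, 0.0], [0.0, 0.0]]  # sum_i X_i X_i'
    rhs = [0.0, 0.0]                  # sum_i X_i * pseudo_i
    for f, p in zip(feats, pseudo):
        for j in range(2):
            rhs[j] += f[j] * p
            for k in range(2):
                bread[j][k] += f[j] * f[k]
    bread_inv = inv_2x2(bread)
    beta = [sum(bread_inv[j][k] * rhs[k] for k in range(2)) for j in range(2)]
    # Meat: sum_i psi_i psi_i' with psi_i = X_i * (pseudo_i - X_i' beta).
    meat = [[0.0, 0.0], [0.0, 0.0]]
    for f, p in zip(feats, pseudo):
        r = p - (f[0] * beta[0] + f[1] * beta[1])
        for j in range(2):
            for k in range(2):
                meat[j][k] += f[j] * f[k] * r * r
    # Sandwich: (sum X X')^{-1} (sum psi psi') (sum X X')^{-1}.
    cov = matmul_2x2(matmul_2x2(bread_inv, meat), bread_inv)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    se = math.sqrt(max(cov[1][1], 0.0))
    return beta[1], (beta[1] - z * se, beta[1] + z * se)
```

Because the sandwich form uses the empirical spread of the estimating-function terms rather than a model-based residual variance, it does not rely on the linear working model being correct.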
Abstract
The accessibility of vast volumes of unlabeled data has sparked growing interest in semi-supervised learning (SSL) and covariate shift transfer learning (CSTL). In this paper, we present an inference framework for estimating regression coefficients in conditional mean models within both SSL and CSTL settings, while allowing for the misspecification of conditional mean models. We develop an augmented inverse probability weighted (AIPW) method, employing regularized calibrated estimators for both propensity score (PS) and outcome regression (OR) nuisance models, with PS and OR models being sequentially dependent. We show that when the PS model is correctly specified, the proposed estimator achieves consistency, asymptotic normality, and valid confidence intervals, even with possible OR model misspecification and high-dimensional data. Moreover, by suppressing detailed technical choices,…
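For the CSTL setting, one natural AIPW form targets the best linear fit on the unlabeled target population. Here π(x) = P(source | x) in the pooled sample, and source residuals are reweighted by the odds (1 − π)/π. Since the target covariate density is proportional to (1 − π(x)) times the pooled density, the resulting estimating equation is unbiased if either π or m is correct. This is a minimal sketch assuming one covariate plus intercept and known nuisance functions. The name `aipw_cstl` is hypothetical, and the sketch is not the authors' estimator.

```python
from typing import Callable, Optional, Sequence, Tuple


def aipw_cstl(
    x: Sequence[float],
    y: Sequence[Optional[float]],
    source: Sequence[int],
    ps: Callable[[float], float],
    outcome: Callable[[float], float],
) -> Tuple[float, float]:
    """Hypothetical AIPW estimate of (b0, b1) in the working model b0 + b1*x
    on the target (unlabeled) population under covariate shift.

    source[i] = 1 for labeled source units, 0 for unlabeled target units, and
    ps(x) = P(source = 1 | x) in the pooled sample. Solves
      sum_{target} X (m(x) - X' beta) + sum_{source} (1 - pi)/pi * X (y - m(x)) = 0.
    """
    a = [[0.0, 0.0], [0.0, 0.0]]  # sum over target units of X X'
    b = [0.0, 0.0]
    for xi, yi, si in zip(x, y, source):
        feats = (1.0, xi)
        m = outcome(xi)
        if si:
            p = ps(xi)
            # Odds (1 - pi)/pi reweights source residuals toward the target covariate law.
            w_resid = (1.0 - p) / p * (yi - m)
            for j in range(2):
                b[j] += feats[j] * w_resid
        else:
            # Target units contribute OR predictions and the design matrix.
            for j in range(2):
                b[j] += feats[j] * m
                for k in range(2):
                    a[j][k] += feats[j] * feats[k]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if abs(det) < 1e-12:
        raise ValueError("singular target design matrix")
    return ((b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - b[0] * a[1][0]) / det)
```

The main difference from an SSL-style AIPW fit is that the design matrix is accumulated over target units only. Labeled source units enter solely through the odds-weighted residual correction.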
Taxonomy
Topics: Advanced Statistical Methods and Models
