Augmented transfer regression learning for completely missing covariates
Huali Zhao, Tianying Wang

TL;DR
This paper introduces an augmented transfer regression learning method to address the challenge of completely missing covariates in large-scale population datasets, leveraging invariance assumptions and doubly robust estimation.
Contribution
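It proposes a novel transfer learning approach that combines importance weighting with imputation. It achieves consistency and efficiency under the assumption that the conditional distribution of the missing covariates, given the observed variables, is invariant across populations.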
Findings
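The estimator is doubly robust: it remains consistent if either the importance-weight model or the imputation model is correctly specified.
The method attains the semiparametric efficiency bound when both working models are correctly specified.
The estimator is $n^{1/2}$-consistent and asymptotically normal.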
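The generic augmented moment identity below shows where double robustness comes from. It is a sketch of the standard augmentation argument, not necessarily the paper's exact estimating equations. The notation is assumed for illustration: S and T are the source and target samples, $\omega(y,x) = p_T(y,x)/p_S(y,x)$ is the importance weight, $m_k(y,x)$ is a working model for $\mathbb{E}(Z^k \mid Y=y, X=x)$ with $k \in \{1,2\}$ (the first- and second-order moments of the missing covariate $Z$), and $g$ is any function of the observed $(Y, X)$.

```latex
\begin{aligned}
\widehat{\mathbb{E}}_T\{g Z^k\}
  &= \frac{1}{n_T}\sum_{i \in T} g(Y_i,X_i)\, m_k(Y_i,X_i)
   + \frac{1}{n_S}\sum_{j \in S} \omega(Y_j,X_j)\, g(Y_j,X_j)\,\{Z_j^k - m_k(Y_j,X_j)\},\\
\text{$\omega$ correct:}\quad
  \mathbb{E}_S[\omega\, g\,(Z^k - m_k)] &= \mathbb{E}_T[g\,Z^k] - \mathbb{E}_T[g\,m_k],\\
\text{$m_k$ correct:}\quad
  \mathbb{E}_S[\omega\, g\,(Z^k - m_k)] &= 0, \qquad \mathbb{E}_T[g\,m_k] = \mathbb{E}_T[g\,Z^k].
\end{aligned}
```

In either case the estimate converges to $\mathbb{E}_T\{g Z^k\}$. The second line uses $\mathbb{E}_S[\omega\, h(Y,X,Z)] = \mathbb{E}_T[h(Y,X,Z)]$, which holds because $Z \mid (Y, X)$ has the same law in both populations. The third line uses the same invariance, since $\mathbb{E}_S(Z^k \mid Y, X) = \mathbb{E}_T(Z^k \mid Y, X)$.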
Abstract
Large-scale population-level datasets, such as the UK Biobank and the All of Us Research Program, often lack covariates needed for a specific analysis, such as genetic or lifestyle measures, while related studies measure them. This creates a cross-population missing data problem in which covariates are completely unobserved in the target population, rather than partially missing within one dataset. We propose an augmented transfer regression learning method for this setting. The key identifying condition is a sub-population shift assumption: the joint distribution of the outcome and observed covariates may differ across source and target populations, but the conditional distribution of the missing covariates given observed variables is invariant. We combine importance-weighted estimating equations with imputation terms for first- and second-order moments of the missing covariates. The…
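A minimal simulation sketch of the idea follows. It is a generic augmented, importance-weighted estimator for a linear regression of Y on (1, X, Z) in the target, where Z is never observed, and is not the authors' implementation. The following are all assumptions made for illustration: the Gaussian data-generating laws, the known density ratio, the linear working imputation model with constant residual variance, and the helper names (`simulate`, `augmented_moment`, `augmented_beta`). The script fits the target coefficients twice. The first fit uses correct weights with a misspecified imputation model (Y omitted). The second uses flat, wrong weights with a correct imputation model. Both fits are compared with an oracle that sees the hidden target Z.

```python
import numpy as np

# Hypothetical illustration only: simulated data, density ratio assumed known.


def simulate(n, x_mean, y_shift, rng):
    """Draw (X, Y) from a population-specific law, then Z from a shared law for Z | (Y, X).

    Sharing Z | (Y, X) across populations is the invariance (sub-population shift) assumption.
    """
    x = rng.normal(x_mean, 1.0, n)
    y = 0.5 * x + y_shift + rng.normal(0.0, 1.0, n)
    z = 0.4 * x + 0.6 * y + rng.normal(0.0, 1.0, n)  # same conditional law in S and T
    return x, y, z


def ols(design, response):
    """Least-squares coefficients of response on the columns of design."""
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef


def log_density_ratio(x, y):
    """log p_T(x, y) - log p_S(x, y) for the simulated Gaussian laws (known by construction)."""
    return (-0.5 * ((x - 0.5) ** 2 - x ** 2)
            - 0.5 * ((y - 0.5 * x - 0.3) ** 2 - (y - 0.5 * x) ** 2))


def augmented_moment(g_t, m_t, g_s, zk_s, m_s, w_s):
    """Doubly robust estimate of E_T[g(Y, X) Z^k]: target imputation term plus a
    source-side, importance-weighted residual correction."""
    w = w_s / w_s.mean()  # self-normalised importance weights
    return np.mean(g_t * m_t) + np.mean(w * g_s * (zk_s - m_s))


def augmented_beta(x_t, y_t, x_s, y_s, z_s, w_s, impute_with_y):
    """Target OLS coefficients for Y ~ 1 + X + Z when Z is never observed in the target."""
    def feats(x, y):
        cols = [np.ones_like(x), x, y] if impute_with_y else [np.ones_like(x), x]
        return np.column_stack(cols)

    # Working imputation model fit on the source: linear mean, constant residual variance.
    coef = ols(feats(x_s, y_s), z_s)
    s2 = np.mean((z_s - feats(x_s, y_s) @ coef) ** 2)
    m1_t, m1_s = feats(x_t, y_t) @ coef, feats(x_s, y_s) @ coef
    m2_t, m2_s = m1_t ** 2 + s2, m1_s ** 2 + s2  # E[Z^2 | .] = mean^2 + variance

    ones_t, ones_s = np.ones_like(x_t), np.ones_like(x_s)
    ez = augmented_moment(ones_t, m1_t, ones_s, z_s, m1_s, w_s)
    exz = augmented_moment(x_t, m1_t, x_s, z_s, m1_s, w_s)
    eyz = augmented_moment(y_t, m1_t, y_s, z_s, m1_s, w_s)
    ezz = augmented_moment(ones_t, m2_t, ones_s, z_s ** 2, m2_s, w_s)

    # Moments involving only (Y, X) come straight from the target sample.
    ex, exx, ey, exy = x_t.mean(), np.mean(x_t ** 2), y_t.mean(), np.mean(x_t * y_t)
    gram = np.array([[1.0, ex, ez], [ex, exx, exz], [ez, exz, ezz]])
    return np.linalg.solve(gram, np.array([ey, exy, eyz]))


rng = np.random.default_rng(0)
n = 200_000
x_s, y_s, z_s = simulate(n, 0.0, 0.0, rng)         # source: Z observed
x_t, y_t, z_t_hidden = simulate(n, 0.5, 0.3, rng)  # target: Z used only for the oracle
oracle = ols(np.column_stack([np.ones(n), x_t, z_t_hidden]), y_t)

true_w = np.exp(log_density_ratio(x_s, y_s))
# Correct weights, misspecified imputation (Y left out of the imputation model).
beta_weights_ok = augmented_beta(x_t, y_t, x_s, y_s, z_s, true_w, impute_with_y=False)
# Wrong (flat) weights, correctly specified imputation model.
beta_imputation_ok = augmented_beta(x_t, y_t, x_s, y_s, z_s, np.ones(n), impute_with_y=True)
print("oracle            ", oracle.round(3))
print("weights correct   ", beta_weights_ok.round(3))
print("imputation correct", beta_imputation_ok.round(3))
```

Both augmented fits should land close to the oracle, because each has one of its two working models correct. In practice the density ratio is unknown and must be estimated. One common option is a probabilistic classifier that separates source from target records using (Y, X). Under that option, weighting accuracy then depends on how well the classifier is calibrated.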
