Scalable and Efficient Multiple Imputation for Case-Cohort Studies via Influence Function-Based Supersampling
Jooho Kim, Yei Eun Shin

TL;DR
This paper introduces an influence function-based supersampling method for multiple imputation in large-scale case-cohort studies, significantly reducing computational costs while maintaining high efficiency in estimating hazard ratios.
Contribution
The paper proposes a novel supersampling approach with weight calibration that achieves near-full-cohort imputation efficiency with less computational effort.
Findings
The method performs well in simulations, with efficiency close to that of full-cohort imputation.
It reduces computation time compared to existing supersampling methods.
It is effective in high-dimensional biomarker analysis.
Abstract
Two-phase sampling designs have been widely adopted in epidemiological studies to reduce costs when measuring certain biomarkers is prohibitively expensive. Under these designs, investigators commonly relate survival outcomes to risk factors using the Cox proportional hazards model. To fully utilize covariates collected in phase 1, multiple imputation (MI) methods have been developed to impute missing covariates for individuals not included in the phase 2 sample. However, MI becomes computationally intensive in large-scale cohorts, particularly when rejection sampling is employed to mitigate bias arising from nonlinear or interaction terms in the analysis model. To address this issue, Borgan et al. (2023) proposed a random supersampling (RSS) approach that randomly selects a subset of cohort members for imputation, albeit at the cost of reduced efficiency. In this study, we propose an…
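To make the random supersampling idea mentioned in the abstract concrete, here is a minimal sketch in Python. It assumes a classic case-cohort phase 2 sample (all cases plus a simple random subcohort) and then draws extra cohort members uniformly at random for imputation. It illustrates only the random selection step attributed to Borgan et al. (2023), not the influence function-based selection or weight calibration proposed in this paper. The function names, cohort size, and sample sizes are hypothetical and chosen only for illustration.

```python
import random


def draw_case_cohort(n, case_ids, subcohort_size, rng):
    """Phase 2 sample: all cases plus a simple random subcohort (assumed design)."""
    subcohort = set(rng.sample(range(n), subcohort_size))
    return subcohort | set(case_ids)


def random_supersample(n, phase2, supersample_size, rng):
    """Uniformly pick cohort members outside phase 2 whose covariates will be imputed."""
    candidates = [i for i in range(n) if i not in phase2]
    size = min(supersample_size, len(candidates))
    return set(rng.sample(candidates, size))


# Hypothetical numbers: 10,000 cohort members, 300 cases, subcohort of 500.
rng = random.Random(0)
n = 10_000
case_ids = rng.sample(range(n), 300)
phase2 = draw_case_cohort(n, case_ids, 500, rng)

# Only the supersample gets imputed, instead of all n - len(phase2) members.
supersample = random_supersample(n, phase2, 1_000, rng)
analysis_set = phase2 | supersample
```

In this sketch the imputation model would run on 1,000 supersampled members rather than on every cohort member outside phase 2, which is where the computational saving comes from. The efficiency loss noted in the abstract arises because the remaining cohort members are not used.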
Taxonomy
Topics: Statistical Methods and Bayesian Inference · Statistical Methods and Inference · Advanced Causal Inference Techniques
