Three-phase generalized raking and multiple imputation estimators to address error-prone data
Gustavo Amorim, Ran Tao, Sarah Lotspeich, Pamela A. Shaw, Thomas Lumley, Rena C. Patel, Bryan E. Shepherd

TL;DR
This paper introduces multiple imputation and generalized raking estimators that use all available validation data, including data from intermediate validation rounds, to improve efficiency when analyzing error-prone data. The methods are demonstrated in a large HIV study.
Contribution
It proposes two novel extensions of multiple imputation and generalized raking estimators that utilize intermediate validation data for improved estimation efficiency.
Findings
Incorporating intermediate validation steps enhances estimator efficiency.
Simulation results show substantial gains in efficiency.
Application to HIV data illustrates practical benefits.
Abstract
Validation studies are often used to obtain more reliable information in settings with error-prone data. Validated data on a subsample of subjects can be used together with error-prone data on all subjects to improve estimation. In practice, more than one round of data validation may be required, and direct application of standard approaches for combining validation data into analyses may lead to inefficient estimators since the information available from intermediate validation steps is only partially considered or even completely ignored. In this paper, we present two novel extensions of multiple imputation and generalized raking estimators that make full use of all available data. We show through simulations that incorporating information from intermediate steps can lead to substantial gains in efficiency. This work is motivated by and illustrated in a study of contraceptive…
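To make the idea of combining validated data on a subsample with error-prone data on all subjects concrete, here is a minimal sketch of ordinary two-phase raking, not the three-phase estimators the paper proposes. Everything in it is an assumption made for illustration: the simulated data, the function name rake_weights, the choice of the exponential distance, and the use of the raw error-prone measurement as the auxiliary variable.

```python
import numpy as np

def rake_weights(design_w, aux, totals, n_iter=50, tol=1e-10):
    """Exponential-distance raking (hypothetical helper).

    Finds weights w_i = d_i * exp(aux_i @ lam) such that
    sum_i w_i * aux_i equals `totals`, using Newton's method.
    """
    lam = np.zeros(aux.shape[1])
    for _ in range(n_iter):
        w = design_w * np.exp(aux @ lam)
        resid = totals - aux.T @ w          # calibration gap
        if np.max(np.abs(resid)) < tol:
            break
        jac = (aux * w[:, None]).T @ aux    # Jacobian of aux.T @ w in lam
        lam += np.linalg.solve(jac, resid)
    return design_w * np.exp(aux @ lam)

# Simulated data (assumed for illustration only).
rng = np.random.default_rng(0)
N, n = 2000, 200
x_true = rng.normal(1.0, 1.0, size=N)            # only known after validation
x_star = x_true + rng.normal(0.0, 0.5, size=N)   # error-prone, known for all

# Phase 2: simple random validation subsample with design weights N / n.
val = rng.choice(N, size=n, replace=False)
design_w = np.full(n, N / n)

# Auxiliary variables: intercept and error-prone value.
aux_val = np.column_stack([np.ones(n), x_star[val]])
totals = np.array([N, x_star.sum()])             # full-sample totals

w = rake_weights(design_w, aux_val, totals)

# Estimates of the mean of the true variable.
mean_design = np.sum(design_w * x_true[val]) / N   # validation data only
mean_raked = np.sum(w * x_true[val]) / N           # calibrated to phase 1
print(mean_design, mean_raked)
```

The raked weights force the validation subsample to reproduce the full-sample totals of the error-prone variable, which is how the unvalidated records contribute information. Whatever auxiliary variables the paper's estimators use, and how they incorporate intermediate validation rounds, is not shown here.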
Taxonomy
Topics: Statistical Methods and Bayesian Inference · Statistical Methods and Inference · Survey Methodology and Nonresponse
