Optimal Multi-Wave Validation of Secondary Use Data with Outcome and Exposure Misclassification
Sarah C. Lotspeich, Gustavo G. C. Amorim, Pamela A. Shaw, Ran Tao, Bryan E. Shepherd

TL;DR
This paper develops an optimal multi-wave validation design for secondary use data with outcome and exposure misclassification, improving efficiency in estimating odds ratios in observational studies.
Contribution
It introduces a novel adaptive grid search algorithm and a multi-wave sampling strategy to optimize the validation design under parameter uncertainty.
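The multi-wave idea can be illustrated with a toy sketch, assuming strata defined by the error-prone outcome and exposure. The optimal design depends on unknown parameters, so an early wave validates a pilot sample, the pilot is used to estimate those parameters, and later waves steer the remaining budget toward the design that is optimal under the estimates. Everything here is a hypothetical stand-in, not the paper's method: the "optimal design" is replaced by a simple Neyman allocation (n_k proportional to N_k times a stratum standard deviation), the wave-1 pilot is assumed balanced, and `validate`, `neyman_targets`, `round_to_total`, and `two_wave_allocation` are illustrative names.

```python
import random
import statistics

def neyman_targets(n_total, N, S):
    # Hypothetical stand-in for the optimal design: continuous Neyman
    # allocation, n_k proportional to N_k * S_k. NOT the paper's criterion.
    w = [Nk * Sk for Nk, Sk in zip(N, S)]
    return [n_total * wk / sum(w) for wk in w]

def round_to_total(x, total):
    # Largest-remainder rounding of nonnegative reals that sum to `total`.
    floors = [int(v) for v in x]
    short = total - sum(floors)
    order = sorted(range(len(x)), key=lambda k: x[k] - floors[k], reverse=True)
    for k in order[:short]:
        floors[k] += 1
    return floors

def two_wave_allocation(n_total, n_wave1, N, validate):
    # Wave 1: balanced pilot (an assumption). `validate(k, m)` is a hypothetical
    # callback returning m validated values from stratum k.
    # Requires n_wave1 >= 2 * len(N) so every stratum has >= 2 values for stdev.
    # Assumes every stratum is much larger than the validation budget.
    K = len(N)
    wave1 = round_to_total([n_wave1 / K] * K, n_wave1)
    S_hat = [statistics.stdev(validate(k, m)) for k, m in enumerate(wave1)]
    # Recompute the target design for the TOTAL budget using wave-1 estimates.
    target = neyman_targets(n_total, N, S_hat)
    # Wave 2: validated records cannot be "unvalidated", so only strata below
    # their target get more; deficits are rescaled to the remaining budget.
    deficit = [max(t - m, 0.0) for t, m in zip(target, wave1)]
    remaining = n_total - n_wave1
    wave2 = round_to_total([remaining * d / sum(deficit) for d in deficit], remaining)
    return wave1, wave2

# Usage with simulated data; true_S is a hypothetical truth used only to simulate.
rng = random.Random(2024)
true_S = [1.0, 2.0, 1.0, 3.0]
wave1, wave2 = two_wave_allocation(
    200, 40, [4000, 1000, 1500, 500],
    lambda k, m: [rng.gauss(0.0, true_S[k]) for _ in range(m)],
)
print(wave1, wave2)
```

The top-up rule (only under-sampled strata receive wave-2 records) reflects the practical constraint that earlier waves are fixed; with more waves, the estimate-then-allocate step would simply repeat.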
Findings
Optimal designs substantially reduce the variance of the odds ratio estimator compared to existing designs.
The multi-wave approach closely approximates the optimal design in practice, where the parameters that design depends on are unknown.
Simulations and real-data analyses demonstrate substantial efficiency gains.
Abstract
The growing availability of observational databases like electronic health records (EHR) provides unprecedented opportunities for secondary use of such data in biomedical research. However, these data can be error-prone and need to be validated before use. It is usually unrealistic to validate the whole database due to resource constraints. A cost-effective alternative is to implement a two-phase design that validates a subset of patient records that are enriched for information about the research question of interest. Herein, we consider odds ratio estimation under differential outcome and exposure misclassification. We propose optimal designs that minimize the variance of the maximum likelihood odds ratio estimator. We develop a novel adaptive grid search algorithm that can locate the optimal design in a computationally feasible and numerically accurate manner. Because the optimal…
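A coarse-to-fine grid search of the kind the abstract describes can be sketched as follows, assuming four validation strata (for example, the combinations of a binary error-prone outcome and exposure) and a fixed total of n validated records. The search evaluates a coarse grid of allocations, then repeatedly re-grids a window around the current best with a finer step. This is a generic illustration, not the authors' algorithm: the real objective is the variance of the maximum likelihood odds ratio estimator under misclassification, which is swapped here for a hypothetical stratified-sampling variance, and the step schedule `(25, 5, 1)`, the window rule, and all function names are assumptions.

```python
import itertools

def stand_in_variance(n, N, S):
    # Hypothetical stand-in objective: stratified-sampling variance
    # sum_k N_k^2 S_k^2 / n_k (no finite-population correction).
    # It is NOT the variance of the MLE odds ratio used in the paper.
    return sum(Nk ** 2 * Sk ** 2 / nk for nk, Nk, Sk in zip(n, N, S))

def grid(n_total, lower, upper, step):
    # Allocations whose first K-1 entries lie on a grid of spacing `step`
    # within [lower_k, upper_k]; the last stratum takes the remainder and
    # must also respect its bounds.
    ranges = [range(lo, hi + 1, step) for lo, hi in zip(lower[:-1], upper[:-1])]
    for head in itertools.product(*ranges):
        last = n_total - sum(head)
        if lower[-1] <= last <= upper[-1]:
            yield head + (last,)

def adaptive_grid_search(objective, n_total, N, min_size=1, steps=(25, 5, 1)):
    # Coarse-to-fine search: minimize over a coarse grid, then search a window
    # of +/- the previous step around the current best with a finer step.
    # Assumes each step divides the previous one and the problem is feasible
    # (otherwise min() raises ValueError on an empty grid).
    lower = [min_size] * len(N)
    upper = [min(Nk, n_total) for Nk in N]
    best = None
    for i, step in enumerate(steps):
        best = min(grid(n_total, lower, upper, step), key=objective)
        if i + 1 < len(steps):
            lower = [max(min_size, b - step) for b in best]
            upper = [min(Nk, n_total, b + step) for b, Nk in zip(best, N)]
    return best

# Usage with hypothetical phase-one stratum sizes and standard deviations.
N = (400, 100, 150, 50)
S = (1.0, 2.0, 1.0, 3.0)
best = adaptive_grid_search(lambda n: stand_in_variance(n, N, S), 100, N)
print(best)  # the continuous Neyman allocation here is about (44.4, 22.2, 16.7, 16.7)
```

The coarse pass costs a few hundred objective evaluations instead of the roughly 150,000 needed to enumerate every allocation of 100 records across four strata, which matters when each evaluation requires computing an expensive variance.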
Taxonomy
Topics: Statistical Methods and Bayesian Inference · Statistical Methods and Inference · Advanced Causal Inference Techniques
