Designed Sampling from Large Databases for Controlled Trials
Liwen Ouyang, Daniel W. Apley, Sanjay Mehrotra

TL;DR
This paper introduces a method that uses optimal design of experiments to select patient samples from large databases for controlled trials, yielding unbiased and more accurate estimates of treatment effects than random sampling.
Contribution
It develops a new approach leveraging DOE principles for optimal sample selection from existing data, enhancing trial efficiency and accuracy.
Findings
Outperforms random sampling in simulation tests
Provides unbiased estimates of treatment effects
Offers simple guidelines and an optimization algorithm
Abstract
The increasing prevalence of rich sources of data and the availability of electronic medical record databases and electronic registries open tremendous opportunities for enhancing medical research. For example, controlled trials are ubiquitously used to investigate the effect of a medical treatment, perhaps dependent on a set of patient covariates, and traditional approaches have relied primarily on randomized patient sampling and allocation to treatment and control groups. However, when covariate data for a large cohort of patients have already been collected and are available in a database, one can potentially design a treatment/control sample and allocation that provides far better estimates of the covariate-dependent effects of the treatment. In this paper, we develop a new approach that uses optimal design of experiments (DOE) concepts to accomplish this objective. The…
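To make the idea of designing a sample from an existing covariate database concrete, here is a minimal sketch. It is not the authors' algorithm: it assumes a linear model with treatment-by-covariate interactions, codes treatment as -1/+1, and uses a generic greedy D-optimality heuristic (maximizing the log-determinant of the information matrix). The function names, the `ridge` parameter, and the simulated covariates are all hypothetical choices made for illustration.

```python
import numpy as np

def design_rows(X, t):
    """Rows [1, x, t, t*x] of an assumed linear model with treatment code t."""
    ones = np.ones((X.shape[0], 1))
    return np.hstack([ones, X, t * ones, t * X])

def log_det_information(X, idx, treat):
    """log det(F'F) for the chosen patients and their treatment codes."""
    F = design_rows(X[idx], np.asarray(treat, dtype=float)[:, None])
    sign, logdet = np.linalg.slogdet(F.T @ F)
    return logdet if sign > 0 else -np.inf

def greedy_d_optimal(X, n, ridge=1e-6):
    """Greedily pick n distinct patients and a -1/+1 allocation for each.

    At every step, add the (patient, treatment) pair with the largest
    f' M^{-1} f, which gives the largest increase in det(M).
    A small ridge term keeps M invertible during the first steps.
    """
    N, p = X.shape
    M = ridge * np.eye(2 * (p + 1))
    candidates = {1.0: design_rows(X, 1.0), -1.0: design_rows(X, -1.0)}
    available = np.ones(N, dtype=bool)
    idx, treat = [], []
    for _ in range(n):
        M_inv = np.linalg.inv(M)
        best_gain, best_i, best_t = -np.inf, -1, 0.0
        for t, F in candidates.items():
            gain = np.einsum("ij,jk,ik->i", F, M_inv, F)
            gain[~available] = -np.inf  # each patient is used at most once
            i = int(np.argmax(gain))
            if gain[i] > best_gain:
                best_gain, best_i, best_t = gain[i], i, t
        f = candidates[best_t][best_i]
        M += np.outer(f, f)
        available[best_i] = False
        idx.append(best_i)
        treat.append(best_t)
    return np.array(idx), np.array(treat)

# Simulated "database" of 2000 patients with 2 covariates (illustrative only).
rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 2))

idx, treat = greedy_d_optimal(X, n=40)
rand_idx = rng.choice(2000, size=40, replace=False)
rand_treat = rng.choice([-1.0, 1.0], size=40)

print("designed log det:", log_det_information(X, idx, treat))
print("random   log det:", log_det_information(X, rand_idx, rand_treat))
```

A larger log-determinant corresponds to a smaller generalized variance of the coefficient estimates under the assumed model, which is the sense in which a designed sample can outperform a random one here. The selection uses only covariates, never outcomes.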
Taxonomy
Topics: Statistical Methods in Clinical Trials · Advanced Causal Inference Techniques · Health Systems, Economic Evaluations, Quality of Life
