Demographic Distribution Matching between real world and virtual phantom population
Dhrubajyoti Ghosh, Fakrul Islam Tushar, Lavsen Dahal, Liesbeth Vancoillie, Kyle J. Lafata, Ehsan Samei, Joseph Y. Lo, Sheng Luo

TL;DR
This paper presents DISTINCT, a statistical framework for selecting demographically aligned subsamples from large clinical datasets to improve the comparability of virtual imaging trials with real-world populations, enhancing their translational impact.
Contribution
The study introduces DISTINCT, a novel method for demographic matching in virtual trials, demonstrated on lung screening data to optimize sample size and subgroup analysis.
Findings
Identified a maximal demographic-aligned NLST subsample of 9,974 participants.
Demonstrated stabilized ROC AUC estimates beyond 6,000 participants.
Showed significant performance variation across demographic subgroups.
Abstract
Virtual imaging trials (VITs) offer scalable and cost-effective tools for evaluating imaging systems and protocols. However, their translational impact depends on rigorous comparability between virtual and real-world populations. This study introduces DISTINCT (Distributional Subsampling for Covariate-Targeted Alignment), a statistical framework for selecting demographically aligned subsamples from large clinical datasets to support robust comparisons with virtual cohorts. We applied DISTINCT to the National Lung Screening Trial (NLST) and a companion virtual trial dataset (VLST). The algorithm jointly aligned continuous (age, BMI) and categorical (sex, race, ethnicity) variables by constructing multidimensional bins based on discretized covariates. For a given target size, DISTINCT samples individuals to match the joint demographic distribution of the reference population. We…
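The binning and sampling steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The bin edges, the record field names, and the helper names (bin_key, quotas, safe_max_size, matched_subsample) are all assumptions made for illustration, and largest-remainder rounding is one possible way to turn reference proportions into per-bin counts.

```python
import random
from bisect import bisect_right
from collections import Counter, defaultdict

# Hypothetical bin edges; the source does not state the discretization used.
AGE_EDGES = [60, 65, 70]
BMI_EDGES = [25.0, 30.0]


def bin_key(person):
    """Map one record to a joint bin over discretized and categorical covariates."""
    return (
        bisect_right(AGE_EDGES, person["age"]),
        bisect_right(BMI_EDGES, person["bmi"]),
        person["sex"],
        person["race"],
        person["ethnicity"],
    )


def quotas(reference, target_size):
    """Per-bin counts proportional to the reference (largest-remainder rounding)."""
    counts = Counter(bin_key(p) for p in reference)
    total = sum(counts.values())
    # Integer arithmetic keeps the floor and remainder exact.
    quota = {b: target_size * c // total for b, c in counts.items()}
    remainder = {b: target_size * c % total for b, c in counts.items()}
    leftover = target_size - sum(quota.values())
    # Hand the remaining slots to the bins with the largest remainders.
    for b in sorted(remainder, key=remainder.get, reverse=True)[:leftover]:
        quota[b] += 1
    return quota


def safe_max_size(source, reference):
    """A target size guaranteed to be feasible: no bin quota exceeds its supply.

    This is a conservative bound; rounding can occasionally permit a slightly
    larger target.
    """
    ref = Counter(bin_key(p) for p in reference)
    avail = Counter(bin_key(p) for p in source)
    total = sum(ref.values())
    return min(avail[b] * total // c for b, c in ref.items())


def matched_subsample(source, reference, target_size, seed=0):
    """Draw target_size source records whose joint bin counts follow the reference."""
    rng = random.Random(seed)
    pools = defaultdict(list)
    for p in source:
        pools[bin_key(p)].append(p)
    picked = []
    for b, k in quotas(reference, target_size).items():
        if k > len(pools[b]):
            raise ValueError(
                f"bin {b} needs {k} records but only {len(pools[b])} are available"
            )
        picked.extend(rng.sample(pools[b], k))  # without replacement
    return picked
```

In this sketch the source plays the role of the large clinical dataset and the reference plays the role of the population whose joint distribution is targeted. Calling safe_max_size(source, reference) first and then matched_subsample(source, reference, n) for any n up to that value will not run out of records in any bin.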
Taxonomy
Topics: Digital Radiography and Breast Imaging · AI in cancer detection
