Targeted learning via probabilistic subpopulation matching
Xiaokang Liu, Jie Hu, Naimin Jing, Yang Ning, Cheng Yong Tang, Runze Li, Yong Chen

TL;DR
This paper introduces a novel subpopulation matching framework for targeted learning in biomedical research, allowing effective information transfer across heterogeneous studies without sample loss.
Contribution
It proposes a two-step method using mixture models for probabilistic subpopulation identification and information transfer, improving prediction accuracy in heterogeneous data settings.
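The two-step idea can be illustrated with a minimal sketch. This is not the authors' estimator. It assumes a single covariate, two Gaussian subpopulations, and a constant outcome mean per subpopulation. The simulated studies and all function names (`fit_mixture`, `subpop_outcome_means`, `predict`) are hypothetical. Step 1 fits a mixture model to the pooled covariates to obtain membership probabilities. Step 2 estimates subpopulation-level outcome means from every sample, weighted by those probabilities, so no source study is discarded.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def responsibilities(x, weights, means, sigmas):
    """Posterior probability that x belongs to each subpopulation."""
    dens = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sigmas)]
    total = sum(dens)
    if total == 0.0:  # numerically far from every component: use mixing weights
        return list(weights)
    return [d / total for d in dens]

def fit_mixture(xs, k=2, n_iter=50):
    """Step 1 (assumed form): EM for a 1-D Gaussian mixture on pooled covariates."""
    n = len(xs)
    ordered = sorted(xs)
    # Deterministic start: place the means at evenly spaced quantiles.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    centre = sum(xs) / n
    spread = math.sqrt(sum((x - centre) ** 2 for x in xs) / n)
    sigmas = [spread] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        resp = [responsibilities(x, weights, means, sigmas) for x in xs]  # E-step
        for j in range(k):  # M-step
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            sigmas[j] = max(math.sqrt(var), 1e-6)
    return weights, means, sigmas

def subpop_outcome_means(xs, ys, params):
    """Step 2 (assumed form): outcome mean per subpopulation, using all samples."""
    k = len(params[0])
    num, den = [0.0] * k, [0.0] * k
    for x, y in zip(xs, ys):
        r = responsibilities(x, *params)
        for j in range(k):
            num[j] += r[j] * y
            den[j] += r[j]
    return [a / b for a, b in zip(num, den)]

def predict(x, params, mu_y):
    """Blend subpopulation outcome means by the membership probabilities of x."""
    r = responsibilities(x, *params)
    return sum(rj * m for rj, m in zip(r, mu_y))

def simulate_study(rng, n, share_a):
    """Hypothetical study: subpopulation A has x near -2 and y near 1; B has x near 2 and y near 5."""
    xs, ys = [], []
    for _ in range(n):
        in_a = rng.random() < share_a
        xs.append(rng.gauss(-2.0 if in_a else 2.0, 0.7))
        ys.append(rng.gauss(1.0 if in_a else 5.0, 0.3))
    return xs, ys

rng = random.Random(0)
target = simulate_study(rng, 20, 0.9)        # small target study, mostly A
sources = [simulate_study(rng, 200, 0.8),    # source resembling the target
           simulate_study(rng, 200, 0.1)]    # dissimilar source, still used

all_x = target[0] + [x for s in sources for x in s[0]]
all_y = target[1] + [y for s in sources for y in s[1]]

params = fit_mixture(all_x)
mu_y = subpop_outcome_means(all_x, all_y, params)
target_predictions = [predict(x, params, mu_y) for x in target[0]]
print("subpopulation outcome means:", [round(m, 2) for m in mu_y])
```

In this toy setup the second source study is mostly subpopulation B and would likely be dropped under study-level matching, yet its subpopulation-A samples still contribute to the estimate used for the target.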
Findings
Method improves prediction accuracy in simulations
Effectively handles heterogeneity across studies
Utilizes all source data without sample exclusion
Abstract
In biomedical research, leveraging information from multiple similar source studies has proved useful for obtaining more accurate predictions in a target study. However, in many biomedical applications based on real-world data, the populations under consideration in different studies, e.g., clinical sites, can be heterogeneous, which makes it challenging to borrow information properly for the target study. State-of-the-art methods are typically based on study-level matching to identify source studies that are similar to the target study, while all samples from source studies that differ significantly from the target study are dropped at the study level, which can lead to substantial loss of information. We consider a general situation where all studies are sampled from a super-population composed of distinct subpopulations, and propose a novel framework of targeted…
Taxonomy
Topics: Bayesian Methods and Mixture Models · Statistical Methods and Inference · Advanced Causal Inference Techniques
