Targeted learning via probabilistic subpopulation matching

Xiaokang Liu; Jie Hu; Naimin Jing; Yang Ning; Cheng Yong Tang; Runze Li; Yong Chen

arXiv:2512.21840 · stat.ME · December 29, 2025

TL;DR

This paper introduces a novel subpopulation matching framework for targeted learning in biomedical research, allowing effective information transfer across heterogeneous studies without sample loss.

Contribution

It proposes a two-step method using mixture models for probabilistic subpopulation identification and information transfer, improving prediction accuracy in heterogeneous data settings.
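The two-step idea described above can be illustrated with a minimal sketch. This is not the authors' estimator: it assumes a diagonal-covariance Gaussian mixture fitted by EM on pooled covariates for step one, and a weighted least-squares fit for step two. The helper names (`responsibilities`, `fit_gmm`, `matched_weights`, `weighted_least_squares`) and the simulated data are hypothetical and chosen only for illustration.

```python
import numpy as np

def responsibilities(X, weights, means, variances):
    """Posterior probability that each row of X belongs to each mixture component."""
    diff = X[:, None, :] - means[None, :, :]                # shape (n, k, d)
    log_pdf = -0.5 * np.sum(diff ** 2 / variances + np.log(2 * np.pi * variances), axis=2)
    log_post = np.log(weights) + log_pdf
    log_post -= log_post.max(axis=1, keepdims=True)         # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

def fit_gmm(X, k, n_iter=100, seed=0):
    """Plain EM for a diagonal-covariance Gaussian mixture (assumed model, not the paper's)."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    means = X[rng.choice(n, size=k, replace=False)]
    variances = np.tile(X.var(axis=0) + 1e-6, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        resp = responsibilities(X, weights, means, variances)   # E-step
        nk = resp.sum(axis=0) + 1e-12                            # M-step
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        variances = np.maximum((resp.T @ X ** 2) / nk[:, None] - means ** 2, 1e-6)
    return weights, means, variances

def matched_weights(resp_source, resp_target):
    """Weight each source sample by its probability of sharing the target's subpopulations."""
    tau = resp_target.mean(axis=0)      # estimated subpopulation proportions in the target
    return resp_source @ tau

def weighted_least_squares(X, y, w):
    """Return [intercept, slopes...] minimizing the weighted squared error."""
    Xd = np.column_stack([np.ones(len(X)), X])
    return np.linalg.solve(Xd.T @ (Xd * w[:, None]), Xd.T @ (w * y))

# Simulated data: two subpopulations with different covariate centers and slopes.
rng = np.random.default_rng(1)

def draw(n, group):
    center, slope = (-3.0, 2.0) if group == 0 else (3.0, -1.0)
    X = rng.normal(loc=center, scale=1.0, size=(n, 1))
    y = slope * X[:, 0] + rng.normal(scale=0.1, size=n)
    return X, y

X_t, y_t = draw(20, 0)                  # small target study, subpopulation 0 only
X_s0, y_s0 = draw(200, 0)               # source samples that match the target
X_s1, y_s1 = draw(200, 1)               # source samples that do not
X_s, y_s = np.vstack([X_s0, X_s1]), np.concatenate([y_s0, y_s1])

# Step 1: probabilistic subpopulation identification on the pooled covariates.
params = fit_gmm(np.vstack([X_t, X_s]), k=2)
w_source = matched_weights(responsibilities(X_s, *params), responsibilities(X_t, *params))

# Step 2: information transfer through a weighted fit; no source sample is discarded.
X_all, y_all = np.vstack([X_t, X_s]), np.concatenate([y_t, y_s])
w_all = np.concatenate([np.ones(len(X_t)), w_source])
beta = weighted_least_squares(X_all, y_all, w_all)
print("intercept and slope:", beta)
```

In this toy setup every source sample stays in the fit, but samples unlikely to share the target's subpopulation receive weights near zero, in contrast to dropping entire source studies.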

Findings

01. Method improves prediction accuracy in simulations

02. Effectively handles heterogeneity across studies

03. Utilizes all source data without sample exclusion

Abstract

In biomedical research, leveraging information from multiple similar source studies has proved useful for obtaining more accurate predictions in a target study. However, in many biomedical applications based on real-world data, the populations under consideration in different studies, e.g., clinical sites, can be heterogeneous, which makes it challenging to borrow information properly for the target study. State-of-the-art methods are typically based on study-level matching, which identifies source studies that are similar to the target study; all samples from source studies that differ significantly from the target study are dropped at the study level, which can lead to a substantial loss of information. We consider a general situation where all studies are sampled from a super-population composed of distinct subpopulations, and propose a novel framework of targeted…

Taxonomy

Topics: Bayesian Methods and Mixture Models · Statistical Methods and Inference · Advanced Causal Inference Techniques