Beyond Pooling: Matching for Robust Generalization under Data Heterogeneity

Ayush Roy; Rudrasis Chakraborty; Lav Varshney; Vishnu Suresh Lokhande

arXiv:2602.07154·cs.LG·February 10, 2026

Open Access

TL;DR

This paper introduces a matching framework that makes representation learning more robust across heterogeneous datasets, particularly for zero-shot medical anomaly detection, by filtering out confounding domains and iteratively refining the representation distribution.

Contribution

The paper proposes a matching method that combines adaptive centroid refinement with propensity score matching and outperforms naive pooling and uniform subsampling under data heterogeneity.

Findings

01 Matching outperforms naive pooling in asymmetric settings

02 The method improves zero-shot medical anomaly detection

03 Theoretical analysis confirms robustness under diverse distributions

Abstract

Pooling heterogeneous datasets across domains is a common strategy in representation learning, but naive pooling can amplify distributional asymmetries and yield biased estimators, especially in settings where zero-shot generalization is required. We propose a matching framework that selects samples relative to an adaptive centroid and iteratively refines the representation distribution. Double robustness and propensity score matching for the inclusion of data domains make matching more robust than naive pooling and uniform subsampling by filtering out the confounding domains (the main cause of heterogeneity). Theoretical and empirical analyses show that, unlike naive pooling or uniform subsampling, matching achieves better results under asymmetric meta-distributions, and these results also extend to non-Gaussian and multimodal real-world settings. Most importantly, we show that these…
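
As a rough illustration of selecting samples relative to an adaptive centroid, the sketch below repeatedly keeps the k pool points nearest the current centroid and then recomputes the centroid from the kept points. It is not the authors' algorithm: the Euclidean k-nearest selection rule, the mean update, the function names match_to_centroid and mean_vector, and the toy two-domain data are all assumptions made for this example, and the propensity score and double robustness components are omitted.

```python
import math
import random


def mean_vector(points):
    """Coordinate-wise mean of a non-empty list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def match_to_centroid(pool, k, init_centroid=None, max_iters=20, tol=1e-9):
    """Iteratively select the k pool points closest to an adaptive centroid.

    Hypothetical helper for illustration only: the selection rule (k nearest
    by Euclidean distance) and the update rule (mean of the selected points)
    are assumptions, not the paper's specification.
    """
    centroid = init_centroid if init_centroid is not None else mean_vector(pool)
    selected = []
    for _ in range(max_iters):
        # Rank pool indices by distance to the current centroid; keep the k nearest.
        order = sorted(range(len(pool)), key=lambda i: math.dist(pool[i], centroid))
        selected = sorted(order[:k])
        # Refine the centroid using only the matched points.
        new_centroid = mean_vector([pool[i] for i in selected])
        shift = math.dist(new_centroid, centroid)
        centroid = new_centroid
        if shift < tol:
            break
    return selected, centroid


# Toy data (assumed): a large domain near the origin plus a small shifted domain.
rng = random.Random(0)
main_domain = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(200)]
shifted_domain = [[rng.gauss(8.0, 1.0), rng.gauss(8.0, 1.0)] for _ in range(40)]
pool = main_domain + shifted_domain

selected, centroid = match_to_centroid(pool, k=100)
pooled_mean = mean_vector(pool)
```

On this toy pool the pooled mean is pulled toward the shifted domain, while the matched centroid should stay close to the main domain because the shifted points never rank among the k nearest. How k and the starting centroid ought to be chosen is left open here.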

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Adversarial Robustness in Machine Learning · Imbalanced Data Classification Techniques