Robust Data Fusion via Subsampling

Jing Wang; HaiYing Wang; Kun Chen

arXiv:2508.12048 · stat.ML · August 19, 2025

Open Access

TL;DR

This paper introduces robust transfer learning methods that utilize subsampling strategies to effectively handle outliers in external data, improving model accuracy when target data is limited.

Contribution

The paper develops novel subsampling techniques for transfer learning under data contamination, provides theoretical error bounds for the resulting estimators, and demonstrates superior empirical performance.

Findings

1. Subsampling strategies reduce bias and variance in transfer learning.
2. Theoretical error bounds clarify the factors that affect estimator performance.
3. Robust methods improve estimation of rare events, exemplified by airplane risk analysis.

Abstract

Data fusion and transfer learning are rapidly growing fields that enhance model performance for a target population by leveraging other related data sources or tasks. The challenges lie in the various potential heterogeneities between the target and external data, as well as practical concerns that prevent naïve data integration. We consider a realistic scenario where the target data is limited in size while the external data is large but contaminated with outliers; such data contamination, along with other computational and operational constraints, necessitates proper selection or subsampling of the external data for transfer learning. To our knowledge, transfer learning and subsampling under data contamination have not been thoroughly investigated. We address this gap by studying various transfer learning methods with subsamples of the external data, accounting for outliers…
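The abstract describes subsampling contaminated external data before fusing it with a small target sample. As a hedged illustration only, the sketch below uses a hypothetical residual-based selection rule, not the paper's estimator: fit a pilot model on the target data alone, keep the r external points with the smallest absolute residuals under that fit, then refit on the pooled sample. The simple linear model, the simulation settings (20 target points, 900 clean external points, 100 upward-shifted outliers), and the functions `ols_fit`, `residual_subsample`, `fused_fit`, and `simulate` are all assumptions made for illustration.

```python
import random


def ols_fit(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def residual_subsample(target, external, r):
    """HYPOTHETICAL selection rule (not the paper's): keep the r external
    points with the smallest absolute residuals under a target-only pilot fit."""
    a, b = ols_fit(*zip(*target))
    ranked = sorted(external, key=lambda p: abs(p[1] - (a + b * p[0])))
    return ranked[:r]


def fused_fit(target, external, r):
    """Pool the target data with the selected external subsample and refit."""
    pooled = list(target) + residual_subsample(target, external, r)
    return ols_fit(*zip(*pooled))


def simulate(seed=0, a=1.0, b=2.0, sd=0.5):
    """Assumed setup: 20 clean target points, 900 clean external points,
    plus 100 external outliers shifted upward by 10 to 20 units."""
    rng = random.Random(seed)

    def draw(n):
        pts = []
        for _ in range(n):
            x = rng.uniform(-2, 2)
            pts.append((x, a + b * x + rng.gauss(0, sd)))
        return pts

    target, external = draw(20), draw(900)
    for _ in range(100):
        x = rng.uniform(-2, 2)
        external.append((x, a + b * x + rng.uniform(10, 20)))
    return target, external


if __name__ == "__main__":
    target, external = simulate()
    naive = ols_fit(*zip(*(target + external)))  # uses every external point
    fused = fused_fit(target, external, r=500)   # uses a screened subsample
    print("naive (intercept, slope):", naive)
    print("fused (intercept, slope):", fused)
```

In this toy setup the naive pooled fit absorbs the outliers' upward shift into its intercept, while the screened subsample excludes them. Ranking by residuals under a target-only pilot fit also pulls the fused estimate toward that pilot fit, so a trimming rule like this trades some bias for robustness; the subsampling strategies and error bounds developed in the paper are the principled treatment of that trade-off.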

Taxonomy

Topics: Advanced Statistical Methods and Models · Advanced Statistical Process Monitoring · Statistical Methods and Inference