Robust Data Fusion via Subsampling
Jing Wang, HaiYing Wang, Kun Chen

TL;DR
This paper introduces robust transfer learning methods that use subsampling to handle outliers in external data, improving model accuracy when target data is limited.
Contribution
It develops novel subsampling techniques for transfer learning under data contamination, providing theoretical error bounds and demonstrating superior empirical performance.
Findings
Subsampling strategies reduce bias and variance in transfer learning.
Theoretical error bounds clarify factors affecting estimator performance.
Robust methods improve estimation of rare events, exemplified by airplane risk analysis.
Abstract
Data fusion and transfer learning are rapidly growing fields that enhance model performance for a target population by leveraging other related data sources or tasks. The challenges lie in the various potential heterogeneities between the target and external data, as well as various practical concerns that prevent a naïve data integration. We consider a realistic scenario where the target data is limited in size while the external data is large but contaminated with outliers; such data contamination, along with other computational and operational constraints, necessitates proper selection or subsampling of the external data for transfer learning. To our knowledge, transfer learning and subsampling under data contamination have not been thoroughly investigated. We address this gap by studying various transfer learning methods with subsamples of the external data, accounting for outliers…
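To make the scenario in the abstract concrete, the following is a minimal sketch and not the authors' method: it assumes a linear model, simulated data, and a simple residual-screening rule chosen purely for illustration. The helper names (`ols`, `screened_subsample_fit`, `demo`), the cutoff `k`, and all sample sizes are hypothetical. A pilot fit on the small target sample flags external points with large residuals, a uniform subsample of the remaining points is drawn, and the model is refit on the pooled data.

```python
import numpy as np


def ols(X, y):
    """Ordinary least squares coefficients (no intercept)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def screened_subsample_fit(X_t, y_t, X_e, y_e, m, rng, k=3.0):
    """Hypothetical helper: screen external data with a target-only pilot fit,
    subsample the survivors uniformly, and refit on the pooled data."""
    pilot = ols(X_t, y_t)                       # pilot fit on the small target set
    resid = np.abs(y_e - X_e @ pilot)           # external residuals under the pilot
    scale = np.median(resid) / 0.6745           # MAD-style robust scale estimate
    kept = np.flatnonzero(resid <= k * scale)   # drop points that look like outliers
    idx = rng.choice(kept, size=min(m, kept.size), replace=False)
    X_pool = np.vstack([X_t, X_e[idx]])
    y_pool = np.concatenate([y_t, y_e[idx]])
    return ols(X_pool, y_pool)


def demo(seed=0, n_t=50, n_e=5000, n_out=500, m=1000):
    """Simulate a small clean target set and a large contaminated external set,
    then return the coefficient error of three estimators."""
    rng = np.random.default_rng(seed)
    beta = np.array([1.0, -2.0, 0.5])           # assumed true coefficients

    X_t = rng.normal(size=(n_t, 3))
    y_t = X_t @ beta + rng.normal(size=n_t)

    X_e = rng.normal(size=(n_e, 3))
    y_e = X_e @ beta + rng.normal(size=n_e)
    # Contaminate a fraction of the external data with a shifted coefficient vector.
    out = rng.choice(n_e, size=n_out, replace=False)
    y_e[out] = X_e[out] @ (beta + 10.0) + rng.normal(size=n_out)

    fits = {
        "target_only": ols(X_t, y_t),
        "naive": ols(np.vstack([X_t, X_e]), np.concatenate([y_t, y_e])),
        "robust": screened_subsample_fit(X_t, y_t, X_e, y_e, m, rng),
    }
    return {name: float(np.linalg.norm(b - beta)) for name, b in fits.items()}


if __name__ == "__main__":
    for name, err in demo().items():
        print(f"{name}: {err:.3f}")
```

In this simulated setting, pooling all external data naively pulls the estimate toward the contaminated coefficients, while the screened subsample avoids most of that bias. The screening rule here is a stand-in; the paper's own subsampling strategies and error bounds may differ substantially.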
Taxonomy
Topics: Advanced Statistical Methods and Models · Advanced Statistical Process Monitoring · Statistical Methods and Inference
