Source data selection for out-of-domain generalization

Xinran Miao; Kris Sankaran

arXiv:2202.02155 · cs.LG · February 7, 2022

TL;DR

This paper investigates methods for selecting source data to improve out-of-domain generalization, proposing two approaches based on multi-bandit theory and random search, with empirical validation on simulated and real datasets.

Contribution

It introduces two novel source data selection methods tailored for out-of-domain transfer learning, addressing negative transfer issues.

Findings

1. The proposed methods outperform random source selection in experiments.
2. Source selection diagnostics can identify reweighted source subsamples that perform better.
3. Empirical results on simulated and real datasets support the methods' effectiveness.
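The diagnostic reading of these findings can be illustrated with a toy example. The sketch below is not the authors' algorithm. It assumes that a reweighted source subsample is drawn by first picking a source with probability given by a weight vector and then a point from that source, that a small labeled target validation set is available, and that random search samples weight vectors from a flat Dirichlet distribution. The toy 1-D regression domains and every helper name (make_domain, fit_slope, target_mse, subsample, random_search) are hypothetical.

```python
import random
from statistics import fmean

# Hypothetical toy setup: each domain is 1-D regression y = slope * x + noise.
def make_domain(slope, n, rng, noise=0.1):
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, slope * x + rng.gauss(0.0, noise)) for x in xs]

def fit_slope(data):
    # Least-squares slope of a line through the origin: sum(x*y) / sum(x^2).
    return sum(x * y for x, y in data) / sum(x * x for x, _ in data)

def target_mse(slope, target):
    # Mean squared error of the fitted line on labeled target data.
    return fmean((y - slope * x) ** 2 for x, y in target)

def subsample(sources, weights, n, rng):
    # Hypothetical reweighting scheme: pick a source with probability
    # weights[k], then a point uniformly at random from that source.
    picks = rng.choices(range(len(sources)), weights=weights, k=n)
    return [rng.choice(sources[k]) for k in picks]

def random_search(sources, target_val, n, trials, rng, reps=5):
    # Sample source weights from a flat Dirichlet (normalized Gamma(1, 1)
    # draws), score each by average target validation loss, keep the best.
    best_w, best_loss = None, float("inf")
    for _ in range(trials):
        g = [rng.gammavariate(1.0, 1.0) for _ in sources]
        total = sum(g)
        w = [gi / total for gi in g]
        loss = fmean(target_mse(fit_slope(subsample(sources, w, n, rng)), target_val)
                     for _ in range(reps))
        if loss < best_loss:
            best_w, best_loss = w, loss
    return best_w, best_loss

rng = random.Random(0)
sources = [make_domain(s, 200, rng) for s in (-2.0, 0.0, 1.0, 3.0)]
target_val = make_domain(1.0, 50, rng)    # small labeled target set for selection
target_test = make_domain(1.0, 500, rng)  # held-out target set for reporting
pool = [p for src in sources for p in src]

w, _ = random_search(sources, target_val, n=100, trials=200, rng=rng)
# Compare on held-out target data, averaging over 50 fresh subsample draws each.
baseline = fmean(target_mse(fit_slope(rng.sample(pool, 100)), target_test)
                 for _ in range(50))
tuned = fmean(target_mse(fit_slope(subsample(sources, w, 100, rng)), target_test)
              for _ in range(50))
print(f"uniform random selection: {baseline:.3f}  searched weights: {tuned:.3f}")
```

If the searched weights give a clearly lower held-out target loss than uniform random selection from the pooled sources, that is evidence that a better reweighted subsample exists; if not, random selection is hard to beat for this target. Scoring each candidate by an average over several subsample draws (the reps parameter) keeps the search from rewarding weights that got one lucky draw.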

Abstract

Models that perform out-of-domain generalization borrow knowledge from heterogeneous source data and apply it to a related but distinct target task. Transfer learning has proven effective for accomplishing this generalization in many applications. However, poor selection of a source dataset can lead to poor performance on the target, a phenomenon called negative transfer. To take full advantage of available source data, this work studies source data selection with respect to a target task. We propose two source selection methods, based on multi-bandit theory and random search, respectively. We conduct a thorough empirical evaluation on both simulated and real data. Our proposals can also be viewed as diagnostics for the existence of a reweighted source subsample that performs better than random selection of the available samples.
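The abstract does not detail the bandit-based method. As one hedged illustration of the general idea, the sketch below treats each source dataset as an arm and runs UCB1, a standard multi-armed bandit policy chosen here for illustration and not necessarily the formulation used in the paper. Each pull fits a model on a small batch from one source and earns a reward of 1 / (1 + target validation MSE); normalized pull counts then serve as source weights. The reward definition, the toy 1-D regression domains, and all helper names (make_domain, fit_slope, target_mse, ucb1_source_weights) are assumptions.

```python
import math
import random
from statistics import fmean

# Hypothetical toy setup: each domain is 1-D regression y = slope * x + noise.
def make_domain(slope, n, rng, noise=0.1):
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, slope * x + rng.gauss(0.0, noise)) for x in xs]

def fit_slope(data):
    # Least-squares slope of a line through the origin.
    return sum(x * y for x, y in data) / sum(x * x for x, _ in data)

def target_mse(slope, target):
    return fmean((y - slope * x) ** 2 for x, y in target)

def ucb1_source_weights(sources, target_val, batch, rounds, rng):
    """Treat each source as a bandit arm; return normalized pull counts."""
    k = len(sources)
    counts, totals = [0] * k, [0.0] * k

    def reward(arm):
        # Assumed reward: fit on a fresh batch from one source; the value lies
        # in (0, 1] and is higher when the target validation loss is lower.
        slope = fit_slope(rng.sample(sources[arm], batch))
        return 1.0 / (1.0 + target_mse(slope, target_val))

    for t in range(1, rounds + 1):
        if t <= k:
            arm = t - 1  # play every arm once before using the UCB index
        else:
            # UCB1 index: empirical mean reward plus an exploration bonus.
            arm = max(range(k), key=lambda a: totals[a] / counts[a]
                      + math.sqrt(2.0 * math.log(t) / counts[a]))
        counts[arm] += 1
        totals[arm] += reward(arm)
    return [c / rounds for c in counts]

rng = random.Random(1)
sources = [make_domain(s, 200, rng) for s in (-2.0, 0.0, 1.0, 3.0)]
target_val = make_domain(1.0, 50, rng)  # target task shares slope 1.0 with source 2
weights = ucb1_source_weights(sources, target_val, batch=20, rounds=300, rng=rng)
print([round(w, 2) for w in weights])  # most pulls should go to the slope-1.0 source
```

Because UCB1 keeps exploring arms in proportion to their uncertainty, poorly matched sources still receive a few pulls, so the resulting weights are soft rather than a hard choice of a single source.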

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Machine Learning and Algorithms