TL;DR
This paper introduces a residual feature integration method that provably prevents negative transfer in transfer learning, supported by theoretical guarantees and extensive empirical validation across diverse tasks.
Contribution
It provides the first theoretical framework ensuring protection against negative transfer using residual feature integration, applicable across multiple data modalities.
Findings
The method guarantees no worse convergence than training from scratch.
When the source representations are informative, its convergence rate improves from nonparametric to near-parametric.
Empirically, it safeguards performance across various benchmarks and distribution shifts.
Abstract
Transfer learning has become a central paradigm in modern machine learning, yet it suffers from the long-standing problem of negative transfer, where leveraging source representations can harm rather than help performance on the target task. Although empirical remedies have been proposed, there remains little theoretical understanding of how to reliably avoid negative transfer. In this paper, we investigate a simple yet remarkably effective strategy: augmenting frozen, pretrained source-side features with a trainable target-side encoder that adapts target features to capture residual signals overlooked by models pretrained on the source data. We show this residual feature integration strategy is sufficient to provably prevent negative transfer, by establishing theoretical guarantees that it has no worse convergence rate than training from scratch under the informative class of target…
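The strategy described in the abstract, frozen source-side features combined with a trainable target-side encoder that picks up the residual signal, can be illustrated with a toy sketch. This is not the paper's implementation. The linear frozen extractor `source_features`, the linear residual encoder, the synthetic target `y = 2*x0 + 3*x1`, and the plain gradient-descent loop are all assumptions chosen to keep the example small and dependency-free.

```python
import random

# Hypothetical frozen "source" extractor: it only sees the first input
# coordinate, standing in for a pretrained model that misses part of the signal.
SRC_WEIGHTS = (1.0, 0.0)  # never updated during target training


def source_features(x):
    """Frozen source-side feature (assumed linear for illustration)."""
    return SRC_WEIGHTS[0] * x[0] + SRC_WEIGHTS[1] * x[1]


def predict(params, x):
    """Head weight `a` on the frozen feature plus a trainable residual encoder."""
    a, w0, w1, c = params
    frozen = source_features(x)              # frozen source-side feature
    residual = w0 * x[0] + w1 * x[1] + c     # trainable target-side encoder
    return a * frozen + residual


def mse(params, data):
    """Mean squared error over (x, y) pairs."""
    return sum((predict(params, x) - y) ** 2 for x, y in data) / len(data)


def train(data, steps=500, lr=0.1):
    """Full-batch gradient descent on the trainable parameters only."""
    params = [0.0, 0.0, 0.0, 0.0]
    n = len(data)
    for _ in range(steps):
        grads = [0.0, 0.0, 0.0, 0.0]
        for x, y in data:
            err = predict(params, x) - y
            # Inputs multiplying each trainable parameter (a, w0, w1, c).
            phi = (source_features(x), x[0], x[1], 1.0)
            for j in range(4):
                grads[j] += 2.0 * err * phi[j] / n
        params = [p - lr * g for p, g in zip(params, grads)]
    return params


# Synthetic target task (an assumption): the label depends on x[1],
# which the frozen source feature ignores entirely.
rng = random.Random(0)
data = []
for _ in range(200):
    x = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
    data.append((x, 2.0 * x[0] + 3.0 * x[1]))

initial_loss = mse([0.0, 0.0, 0.0, 0.0], data)
trained = train(data)
final_loss = mse(trained, data)
```

In this sketch the frozen feature alone cannot represent the `x[1]` term, so the residual encoder has to supply it. Because the trainable part can fit the target without any help from the frozen feature, the combined model is not limited by an uninformative source representation, which is the intuition behind the no-negative-transfer claim.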