Source-Optimal Training is Transfer-Suboptimal

C. Evans Hedges

arXiv:2511.08401 · stat.ML · January 7, 2026

Open Access

TL;DR

This paper demonstrates that training a source model optimally for its own task often leads to suboptimal transfer performance, with the optimal regularization depending on task alignment and transfer conditions.

Contribution

We analytically characterize the mismatch between source-optimal and transfer-optimal regularization in ridge regression and validate the findings with experiments on synthetic and real datasets.

Findings

01

Transfer benefits depend on task alignment and source regularization strength.

02

Source-optimal training is generally suboptimal for transfer learning.

03

The transfer-optimal regularization can be predicted by task alignment measures.

Abstract

We prove that training a source model optimally for its own task is generically suboptimal when the objective is downstream transfer. We study the source-side optimization problem in L2-SP ridge regression and show a fundamental mismatch between the source-optimal and transfer-optimal source regularization: outside of a measure-zero set, $τ_{0}^{*} \neq τ_{S}^{*}$. We characterize the transfer-optimal source penalty $τ_{0}^{*}$ as a function of task alignment and identify an alignment-dependent reversal: with imperfect alignment ($0 < ρ < 1$), transfer benefits from stronger source regularization, while in super-aligned regimes ($ρ > 1$), transfer benefits from weaker regularization. Additionally, in isotropic settings, the decision of whether transfer helps is independent of the target sample size and noise, depending only on task alignment and source characteristics. We verify the linear…
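The setup described in the abstract can be explored numerically. The sketch below is illustrative only and is not the paper's code: it fits a source ridge model with penalty `tau`, then fits a target ridge model shrunk toward the source estimate (the L2-SP penalty), and sweeps `tau` to compare the value that minimizes source risk with the value that minimizes target risk. Everything not stated in the abstract is an assumption made for the example: the function names `ridge` and `sweep_source_penalty`, the target penalty `lam_tgt`, the isotropic Gaussian design, the unnormalized squared-error objective, and in particular the alignment model `w_tgt = rho * w_src`, which may differ from the paper's definition of ρ.

```python
import numpy as np


def ridge(X, y, lam, prior=None):
    """Minimize ||y - X w||^2 + lam * ||w - prior||^2 (prior defaults to zero).

    With a nonzero prior this is the L2-SP form: shrink toward a starting point.
    """
    d = X.shape[1]
    if prior is None:
        prior = np.zeros(d)
    A = X.T @ X + lam * np.eye(d)
    # Solve for the offset from the prior, then add the prior back.
    return prior + np.linalg.solve(A, X.T @ (y - X @ prior))


def sweep_source_penalty(taus, rho=0.5, d=20, n_src=40, n_tgt=15,
                         lam_tgt=5.0, noise=1.0, trials=200, seed=0):
    """Average source and target parameter error for each source penalty tau.

    Assumption (not from the paper): the target weights are rho * source weights.
    Under an isotropic design, squared parameter error equals excess risk.
    """
    rng = np.random.default_rng(seed)
    src_risk = np.zeros(len(taus))
    tgt_risk = np.zeros(len(taus))
    for _ in range(trials):
        w_src = rng.normal(size=d) / np.sqrt(d)
        w_tgt = rho * w_src  # hypothetical alignment model
        Xs = rng.normal(size=(n_src, d))
        ys = Xs @ w_src + noise * rng.normal(size=n_src)
        Xt = rng.normal(size=(n_tgt, d))
        yt = Xt @ w_tgt + noise * rng.normal(size=n_tgt)
        for i, tau in enumerate(taus):
            w0 = ridge(Xs, ys, tau)                  # source model
            w1 = ridge(Xt, yt, lam_tgt, prior=w0)    # target model, L2-SP
            src_risk[i] += np.sum((w0 - w_src) ** 2)
            tgt_risk[i] += np.sum((w1 - w_tgt) ** 2)
    return src_risk / trials, tgt_risk / trials


if __name__ == "__main__":
    taus = np.logspace(-2, 3, 30)
    src, tgt = sweep_source_penalty(taus)
    print("tau minimizing source risk:", taus[np.argmin(src)])
    print("tau minimizing target risk:", taus[np.argmin(tgt)])
```

Rerunning the sweep with different values of `rho` (below and above 1) is one way to probe the alignment-dependent reversal described above, keeping in mind that the result depends on the simulation choices listed in the lead-in.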

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques · Speech and Audio Processing