Transfer Learning of Linear Regression with Multiple Pretrained Models: Benefiting from More Pretrained Models via Overparameterization Debiasing
Daniel Boharon, Yehuda Dar

TL;DR
This paper investigates how overparameterized pretrained models can be effectively used in transfer learning for linear regression, proposing a debiasing method to improve performance when leveraging multiple models.
Contribution
It introduces an analytical framework for transfer learning with multiple overparameterized pretrained models and proposes a simple debiasing technique to mitigate overparameterization bias.
Findings
Using more overparameterized pretrained models can improve transfer learning.
Overparameterization bias can hinder learning, but can be reduced with a multiplicative correction.
Debiasing enables leveraging more pretrained models for better target predictor performance.
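The findings above can be illustrated with a small simulation. This is a minimal sketch under assumptions that are not taken from the paper: isotropic Gaussian features, noiseless labels, all pretrained models sharing the same true parameter vector, plain averaging as the way of combining them, and a correction factor of d/n. The factor is chosen because, for isotropic Gaussian data, the minimum-norm solution equals the true parameters projected onto a random n-dimensional subspace, which shrinks them by n/d in expectation. The names `min_norm_fit` and `debias` are hypothetical.

```python
import numpy as np

def min_norm_fit(X, y):
    # Minimum l2-norm least-squares solution via the pseudoinverse.
    return np.linalg.pinv(X) @ y

def debias(theta, n, d):
    # Assumed multiplicative correction: undo the expected n/d shrinkage
    # of an overparameterized (d > n) minimum-norm solution.
    return theta * (d / n) if d > n else theta

rng = np.random.default_rng(0)
d, n, m = 200, 50, 40            # dimension, samples per model, number of models
beta = rng.standard_normal(d)    # shared true parameters (toy assumption)

models = []
for _ in range(m):
    X = rng.standard_normal((n, d))   # isotropic Gaussian features
    y = X @ beta                      # noiseless labels
    models.append(min_norm_fit(X, y))

# Combine the pretrained models by plain averaging, with and without debiasing.
avg_raw = np.mean(models, axis=0)
avg_debiased = np.mean([debias(t, n, d) for t in models], axis=0)

err_raw = np.sum((avg_raw - beta) ** 2) / d
err_debiased = np.sum((avg_debiased - beta) ** 2) / d
print(f"raw average error:      {err_raw:.3f}")
print(f"debiased average error: {err_debiased:.3f}")
```

In this toy setup the raw average stays shrunk toward zero no matter how many models are averaged, whereas the debiased average has no systematic shrinkage and its error falls as `m` grows.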
Abstract
We study transfer learning for a linear regression task using several least-squares pretrained models that can be overparameterized. We formulate the target learning task as optimization that minimizes squared errors on the target dataset with penalty on the distance of the learned model from the pretrained models. We analytically formulate the test error of the learned target model and provide the corresponding empirical evaluations. Our results elucidate when using more pretrained models can improve transfer learning. Specifically, if the pretrained models are overparameterized, using sufficiently many of them is important for beneficial transfer learning. However, the learning may be compromised by overparameterization bias of pretrained models, i.e., the minimum $\ell_2$-norm solution's restriction to a small subspace spanned by the training examples in the high-dimensional…
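The penalized target objective described in the abstract can be sketched in closed form. The exact formulation is not given in this summary, so the code below assumes the simplest version: minimize ||y - Xw||^2 + sum_j alpha_j ||w - theta_j||^2, with one nonnegative weight alpha_j per pretrained model theta_j. The function name `transfer_fit` and the `alphas` parameter are hypothetical. Setting the gradient to zero gives the linear system (X^T X + sum_j alpha_j I) w = X^T y + sum_j alpha_j theta_j.

```python
import numpy as np

def transfer_fit(X, y, pretrained, alphas):
    """Solve min_w ||y - Xw||^2 + sum_j alphas[j] * ||w - pretrained[j]||^2.

    Assumed form of the penalized objective, not necessarily the paper's.
    At least one alpha must be positive when X has more columns than rows,
    otherwise the system matrix is singular.
    """
    d = X.shape[1]
    A = X.T @ X + sum(alphas) * np.eye(d)
    b = X.T @ y + sum(a * t for a, t in zip(alphas, pretrained))
    return np.linalg.solve(A, b)

# Usage on synthetic data: few target samples, three pretrained models.
rng = np.random.default_rng(1)
X_target = rng.standard_normal((10, 30))
y_target = rng.standard_normal(10)
thetas = [rng.standard_normal(30) for _ in range(3)]
alphas = [0.5, 1.0, 2.0]
w_target = transfer_fit(X_target, y_target, thetas, alphas)
```

Larger values in `alphas` pull the solution toward the weighted mean of the pretrained models, while smaller values let the target data dominate.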
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques · Face and Expression Recognition
