Universality in Transfer Learning for Linear Models
Reza Ghane, Danil Akhtiamov, Babak Hassibi

TL;DR
This paper gives a rigorous asymptotic analysis of transfer learning in linear models, showing that the generalization error (regression) and the classification error (binary classification) depend only on the first- and second-order statistics of the target distribution, not on its other details.
Contribution
It introduces universal formulas for transfer learning errors in linear models that depend solely on the first- and second-order statistics of the target distribution, extending beyond Gaussian assumptions.
Findings
Fine-tuned models can outperform pretrained ones under certain conditions.
Universal error formulas depend only on first- and second-order statistics.
Results apply to both SGD-trained models and ridge regression classifiers.
Abstract
We study the problem of transfer learning and fine-tuning in linear models for both regression and binary classification. In particular, we consider the use of stochastic gradient descent (SGD) on a linear model initialized with pretrained weights and using a small training data set from the target distribution. In the asymptotic regime of large models, we provide an exact and rigorous analysis and relate the generalization errors (in regression) and classification errors (in binary classification) for the pretrained and fine-tuned models. In particular, we give conditions under which the fine-tuned model outperforms the pretrained one. An important aspect of our work is that all the results are "universal", in the sense that they depend only on the first and second order statistics of the target distribution. They thus extend well beyond the standard Gaussian assumptions commonly made…
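The setup described in the abstract can be illustrated with a small numerical sketch. This is not the paper's experiment: the dimensions, step size, noiseless labels, isotropic features, and the helper names `sgd_finetune`, `excess_risk`, and `run` are all assumptions chosen for illustration. The sketch fine-tunes a linear regressor with SGD starting from pretrained weights, using fewer target samples than parameters, and repeats the run with Gaussian and Rademacher features that share the same mean and covariance.

```python
import numpy as np


def sgd_finetune(X, y, w_init, lr=0.5, epochs=50, rng=None):
    """Plain SGD on the squared loss, one sample per step, starting from w_init."""
    rng = np.random.default_rng() if rng is None else rng
    w = w_init.copy()
    n = X.shape[0]
    for _ in range(epochs):
        for i in rng.permutation(n):
            residual = X[i] @ w - y[i]
            w -= lr * residual * X[i]
    return w


def excess_risk(w, w_target):
    """Excess test risk for features with covariance I/d (an assumption of this sketch)."""
    d = w.size
    return float(np.sum((w - w_target) ** 2) / d)


def run(feature_kind, seed=0, d=400, n=100):
    """Return (pretrained risk, fine-tuned risk) for one synthetic target task."""
    rng = np.random.default_rng(seed)
    w_target = rng.standard_normal(d)
    # Hypothetical pretrained weights: the target weights plus a perturbation.
    w_pre = w_target + 0.5 * rng.standard_normal(d)
    if feature_kind == "gaussian":
        X = rng.standard_normal((n, d)) / np.sqrt(d)
    else:
        # Rademacher entries: same mean (0) and covariance (I/d) as the Gaussian case.
        X = rng.choice([-1.0, 1.0], size=(n, d)) / np.sqrt(d)
    y = X @ w_target  # noiseless labels (assumption)
    w_ft = sgd_finetune(X, y, w_pre, rng=rng)
    return excess_risk(w_pre, w_target), excess_risk(w_ft, w_target)


for kind in ("gaussian", "rademacher"):
    pre, ft = run(kind)
    print(f"{kind:10s}  pretrained risk = {pre:.4f}  fine-tuned risk = {ft:.4f}")
```

In this noiseless toy setting each SGD step can only shrink the distance to the target weights, so fine-tuning does not hurt here. With label noise or a mismatched covariance that is no longer guaranteed, which is the kind of regime where conditions such as those derived in the paper become relevant.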
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning
Methods: Sparse Evolutionary Training
