A Theoretical Analysis of Fine-tuning with Linear Teachers
Gal Shachaf, Alon Brutzkus, Amir Globerson

TL;DR
This paper provides a theoretical analysis of fine-tuning in deep learning, focusing on how task similarity and data structure determine how much fine-tuning reduces sample complexity, with insights for linear and deep linear models.
Contribution
It introduces a measure of task similarity affecting sample complexity and analyzes its impact across linear, deep linear, and shallow ReLU models, supported by empirical validation.
Findings
Sample complexity reduction is possible when source and target tasks are similar.
Task similarity depends on the relation between source, target, and data covariance.
Deeper networks influence the similarity measure and sample efficiency.
Abstract
Fine-tuning is a common practice in deep learning, achieving excellent generalization results on downstream tasks using relatively little training data. Although widely used in practice, it lacks a strong theoretical understanding. We analyze the sample complexity of this scheme for regression with linear teachers in several architectures. Intuitively, the success of fine-tuning depends on the similarity between the source tasks and the target task; however, measuring this similarity is nontrivial. We show that a relevant measure considers the relation between the source task, the target task, and the covariance structure of the target data. In the setting of linear regression, we show that under realistic conditions a substantial sample complexity reduction is plausible when the above measure is low. For deep linear regression, we present a novel result regarding the inductive bias of…
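To make the role of the target covariance concrete, here is a minimal NumPy sketch of fine-tuning in the simplest setting the abstract mentions: linear regression with a noiseless linear teacher. It assumes Gaussian inputs with a hypothetical diagonal covariance, fewer target samples than dimensions, and "fine-tuning" taken to mean full-batch gradient descent on the squared loss initialized at the source weights. That iteration converges to the interpolating solution closest to the source weights, so whatever part of the source/target difference lies outside the span of the target samples is never corrected. The sketch compares two shifts of equal Euclidean size, one in high-variance and one in low-variance directions. All function names and the toy covariance are illustrative assumptions, not the paper's similarity measure or code.

```python
import numpy as np


def fine_tune_gd(X, y, w_source, steps=5000):
    """Fine-tune a linear model by full-batch gradient descent on the mean
    squared error, starting from the source-task weights (assumed setup)."""
    n = X.shape[0]
    # Step size 1 / lambda_max(X X^T / n) is safely below the 2 / L limit.
    lr = 1.0 / np.linalg.eigvalsh(X @ X.T / n).max()
    w = w_source.copy()
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y) / n
    return w


def fine_tune_limit(X, y, w_source):
    """Closed-form limit of the iteration above when X has full row rank
    (n < d): the interpolating solution closest to w_source."""
    return w_source + np.linalg.pinv(X) @ (y - X @ w_source)


def target_risk(w, w_target, cov_diag):
    """Population squared error (w - w_t)^T Sigma (w - w_t) for diagonal Sigma."""
    diff = w - w_target
    return float(diff @ (cov_diag * diff))


def demo(seed=0, n=10, d_high=30, d_low=20):
    rng = np.random.default_rng(seed)
    d = d_high + d_low
    # Hypothetical anisotropic covariance: d_high strong and d_low weak directions.
    cov_diag = np.concatenate([np.ones(d_high), 1e-3 * np.ones(d_low)])
    X = rng.standard_normal((n, d)) * np.sqrt(cov_diag)  # rows ~ N(0, Sigma)
    w_source = rng.standard_normal(d)

    risks = {}
    for name, block in [("low_variance_shift", slice(d_high, d)),
                        ("high_variance_shift", slice(0, d_high))]:
        # Same Euclidean distance ||w_t - w_s|| = 1, different directions.
        shift = np.zeros(d)
        shift[block] = rng.standard_normal(block.stop - block.start)
        shift /= np.linalg.norm(shift)
        w_target = w_source + shift
        y = X @ w_target  # noiseless labels from the linear teacher
        w_ft = fine_tune_gd(X, y, w_source)
        risks[name] = target_risk(w_ft, w_target, cov_diag)
    return risks


if __name__ == "__main__":
    print(demo())
```

With only 10 target samples, the shift along the 30 high-variance directions can be corrected only partially, so it should leave a much larger target error than an equally large shift along the low-variance directions. Euclidean distance between source and target weights alone therefore does not predict the benefit; the covariance of the target data matters.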
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques · Adversarial Robustness in Machine Learning
