A Theoretical Analysis of Fine-tuning with Linear Teachers

Gal Shachaf; Alon Brutzkus; Amir Globerson

arXiv:2107.01641 · cs.LG · November 9, 2021 · 5 cites

TL;DR

This paper provides a theoretical analysis of fine-tuning in deep learning, focusing on how task similarity and the covariance structure of the target data determine how much fine-tuning can reduce sample complexity, with results for linear and deep linear models.

Contribution

It introduces a measure of task similarity affecting sample complexity and analyzes its impact across linear, deep linear, and shallow ReLU models, supported by empirical validation.

Findings

01

Sample complexity reduction is possible when source and target tasks are similar.

02

Task similarity depends on the relation among the source task, the target task, and the covariance of the target data.

03

Network depth affects both the similarity measure and the achievable sample efficiency.

Abstract

Fine-tuning is a common practice in deep learning, achieving excellent generalization results on downstream tasks using relatively little training data. Although widely used in practice, it lacks a strong theoretical understanding. We analyze the sample complexity of this scheme for regression with linear teachers in several architectures. Intuitively, the success of fine-tuning depends on the similarity between the source tasks and the target task; however, measuring it is non-trivial. We show that a relevant measure considers the relation between the source task, the target task and the covariance structure of the target data. In linear regression, we show that under realistic settings a substantial sample complexity reduction is plausible when the above measure is low. For deep linear regression, we present a novel result regarding the inductive bias of…
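The abstract's claim that the relevant similarity involves the source task, the target task, and the target-data covariance can be illustrated with a small simulation. This is a minimal sketch under our own assumptions, not the authors' construction: fine-tuning is modeled as gradient descent on a noiseless least-squares target task started from the source weights w_s, which in the underdetermined regime (n < d) converges to the interpolator closest to w_s, namely w_s + pinv(X)(y - X w_s). The helper names (min_norm_from_init, target_risk, simulate), the dimensions, the covariance, and the teacher w_t are all hypothetical, and the quantity reported is the target population risk, not the paper's similarity measure.

```python
import numpy as np


def min_norm_from_init(X, y, w0):
    """Interpolating least-squares solution closest (in L2) to w0.

    Assumption: this stands in for fine-tuning. Gradient descent on
    0.5 * ||X w - y||^2 started at w0 (with a small enough step) only moves
    within w0 + rowspace(X) and converges to this point when X has full row rank.
    """
    return w0 + np.linalg.pinv(X) @ (y - X @ w0)


def target_risk(w, w_t, cov):
    """Population excess risk E[(x^T w - x^T w_t)^2] for x ~ N(0, cov)."""
    diff = w - w_t
    return float(diff @ cov @ diff)


def simulate(gap, w_t, cov, n, trials=200, seed=0):
    """Average target risk after fine-tuning from the source w_s = w_t - gap."""
    rng = np.random.default_rng(seed)
    d = len(w_t)
    chol = np.linalg.cholesky(cov)
    w_s = w_t - gap
    risks = []
    for _ in range(trials):
        X = rng.standard_normal((n, d)) @ chol.T  # rows are samples x ~ N(0, cov)
        y = X @ w_t                               # noiseless linear teacher labels
        w_ft = min_norm_from_init(X, y, w_s)
        risks.append(target_risk(w_ft, w_t, cov))
    return float(np.mean(risks))


# Hypothetical setup: 10 high-variance and 40 low-variance input directions.
d, n = 50, 5
cov = np.diag(np.r_[np.ones(10), 0.01 * np.ones(40)])
w_t = np.ones(d)          # hypothetical target teacher
e_high = np.eye(d)[0]     # unit source-target gap along a high-variance direction
e_low = np.eye(d)[10]     # unit source-target gap along a low-variance direction

risk_high = simulate(e_high, w_t, cov, n)
risk_low = simulate(e_low, w_t, cov, n)
risk_scratch = simulate(w_t, w_t, cov, n)  # gap = w_t means w_s = 0 (from scratch)
print(risk_high, risk_low, risk_scratch)
```

With this anisotropic covariance, a unit source-target gap along a low-variance direction should cost far less target risk than an equally large gap along a high-variance direction, and both should do better than training from scratch. The point of the sketch is that the Euclidean distance between source and target alone does not determine the benefit of fine-tuning; how that distance aligns with the target covariance matters too.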

Videos

A Theoretical Analysis of Fine-tuning with Linear Teachers · SlidesLive

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques · Adversarial Robustness in Machine Learning