Inductive biases of multi-task learning and finetuning: multiple regimes of feature reuse
Samuel Lippl, Jack W. Lindsey

TL;DR
This paper characterizes the implicit regularization induced by multi-task learning (MTL) and by pretraining followed by finetuning (PT+FT). It identifies distinct regimes of feature reuse, including a novel nested feature selection regime, and examines how they affect neural network performance.
Contribution
The paper characterizes the inductive biases of MTL and PT+FT, introduces the nested feature selection regime, and shows that weight rescaling can improve finetuning in deep networks.
Findings
MTL and PT+FT favor feature reuse and sparsity
Nested feature selection is a distinct regime in PT+FT
Weight rescaling improves finetuning performance
Abstract
Neural networks are often trained on multiple tasks, either simultaneously (multi-task learning, MTL) or sequentially (pretraining and subsequent finetuning, PT+FT). In particular, it is common practice to pretrain neural networks on a large auxiliary task before finetuning on a downstream task with fewer samples. Despite the prevalence of this approach, the inductive biases that arise from learning multiple tasks are poorly characterized. In this work, we address this gap. We describe novel implicit regularization penalties associated with MTL and PT+FT in diagonal linear networks and single-hidden-layer ReLU networks. These penalties indicate that MTL and PT+FT induce the network to reuse features in different ways. 1) Both MTL and PT+FT exhibit biases towards feature reuse between tasks, and towards sparsity in the set of learned features. We show a "conservation law" that implies a…
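To make the PT+FT setting on a diagonal linear network concrete, the following is a minimal sketch, not the authors' code. It assumes one common parameterization from the implicit-regularization literature, beta = u*u - v*v, trained by plain gradient descent on squared error. The function names, the synthetic data, the learning rate, the initialization scale, and the choice of a downstream task that uses a subset of the auxiliary task's features are all illustrative assumptions.

```python
import numpy as np

def effective_weights(u, v):
    # Effective linear predictor of the (assumed) diagonal linear network.
    return u * u - v * v

def mse(X, y, u, v):
    # Mean squared error of the effective linear predictor.
    return float(np.mean((X @ effective_weights(u, v) - y) ** 2))

def train(X, y, u, v, lr=0.01, steps=2000):
    # Gradient descent on 0.5 * MSE with respect to the factors u and v.
    n = X.shape[0]
    for _ in range(steps):
        beta = effective_weights(u, v)
        grad_beta = X.T @ (X @ beta - y) / n
        u = u - lr * 2.0 * u * grad_beta  # d(beta)/du = 2u
        v = v + lr * 2.0 * v * grad_beta  # d(beta)/dv = -2v
    return u, v

rng = np.random.default_rng(0)
d = 20

# Hypothetical tasks: the auxiliary task uses features 0..2,
# the downstream task uses features 0..1 (a subset).
beta_aux = np.zeros(d)
beta_aux[:3] = 1.0
beta_main = np.zeros(d)
beta_main[:2] = 1.0

X_aux = rng.normal(size=(200, d))   # large auxiliary dataset
y_aux = X_aux @ beta_aux
X_main = rng.normal(size=(10, d))   # small downstream dataset
y_main = X_main @ beta_main

# Small, balanced initialization so that beta starts at zero.
u0 = np.full(d, 0.1)
v0 = np.full(d, 0.1)

# Pretraining (PT) on the auxiliary task.
loss_before_pt = mse(X_aux, y_aux, u0, v0)
u_pt, v_pt = train(X_aux, y_aux, u0, v0)
loss_after_pt = mse(X_aux, y_aux, u_pt, v_pt)

# Finetuning (FT) on the downstream task, starting from the pretrained factors.
loss_before_ft = mse(X_main, y_main, u_pt, v_pt)
u_ft, v_ft = train(X_main, y_main, u_pt, v_pt)
loss_after_ft = mse(X_main, y_main, u_ft, v_ft)

beta_ft = effective_weights(u_ft, v_ft)
```

Inspecting `beta_ft` against `beta_main` is one way to see which pretrained features the finetuned model keeps. Because the downstream problem here is underdetermined (10 samples, 20 features), the solution found depends on the initialization inherited from pretraining, which is the kind of implicit bias the abstract refers to.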
Taxonomy
Topics: Neural Networks and Applications
Methods: Feature Selection
