A Theory of How Pretraining Shapes Inductive Bias in Fine-Tuning
Nicolas Anguita, Francesco Locatello, Andrew M. Saxe, Marco Mondelli, Flavia Mancini, Samuel Lippl, Clementine Domine

TL;DR
This paper develops an analytical theory of how initialization choices in pretraining determine whether features are reused and refined during fine-tuning, and how this in turn affects generalization in neural networks.
Contribution
It introduces a theoretical framework analyzing the impact of initialization scales on feature learning and generalization in fine-tuning, supported by empirical validation.
Findings
Smaller initialization scales enable feature reuse and refinement.
Four distinct fine-tuning regimes are identified based on initialization.
Initialization parameters significantly influence generalization in nonlinear networks.
Abstract
Pretraining and fine-tuning are central stages in modern machine learning systems. In practice, feature learning plays an important role across both stages: deep neural networks learn a broad range of useful features during pretraining and further refine those features during fine-tuning. However, an end-to-end theoretical understanding of how choices of initialization impact the ability to reuse and refine features during fine-tuning has remained elusive. Here we develop an analytical theory of the pretraining-fine-tuning pipeline in diagonal linear networks, deriving exact expressions for the generalization error as a function of initialization parameters and task statistics. We find that different initialization choices place the network into four distinct fine-tuning regimes that are distinguished by their ability to support feature learning and reuse, and therefore by the task…
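To make the setting concrete, the following is a minimal sketch of a pretrain-then-fine-tune pipeline in a diagonal linear network. It is not the paper's exact model. It assumes the common parameterization w = u*u - v*v with both u and v initialized at a scale alpha, plain full-batch gradient descent, and fine-tuning that simply continues from the pretrained parameters. The function names, teacher vectors, sample sizes, and learning rate are all hypothetical choices made for illustration.

```python
import numpy as np


def effective_weights(u, v):
    """Linear predictor implemented by the diagonal network: w = u*u - v*v."""
    return u * u - v * v


def mse(X, y, u, v):
    """Mean squared error of the network on (X, y)."""
    return float(np.mean((X @ effective_weights(u, v) - y) ** 2))


def train(X, y, u, v, lr=0.01, steps=2000):
    """Full-batch gradient descent on 0.5 * MSE with respect to (u, v)."""
    u, v = u.copy(), v.copy()
    n = X.shape[0]
    for _ in range(steps):
        residual = X @ effective_weights(u, v) - y
        grad_w = X.T @ residual / n   # gradient with respect to w
        grad_u = 2.0 * u * grad_w     # chain rule through w = u*u - v*v
        grad_v = -2.0 * v * grad_w
        u -= lr * grad_u
        v -= lr * grad_v
    return u, v


def run_demo(alpha, d=20, n_pre=50, n_ft=10, seed=0):
    """Pretrain from scale alpha, then fine-tune on a related task with few samples."""
    rng = np.random.default_rng(seed)

    # Hypothetical sparse teachers sharing the same three active coordinates.
    w_pre = np.zeros(d)
    w_pre[:3] = [1.0, -1.0, 0.5]
    w_ft = np.zeros(d)
    w_ft[:3] = [0.5, -1.0, 1.0]

    X_pre = rng.standard_normal((n_pre, d))
    X_ft = rng.standard_normal((n_ft, d))
    X_test = rng.standard_normal((1000, d))
    y_pre, y_ft, y_test = X_pre @ w_pre, X_ft @ w_ft, X_test @ w_ft

    # Initialization scale alpha is the knob being varied (assumed: u = v = alpha).
    u0 = np.full(d, alpha)
    v0 = np.full(d, alpha)

    u_pre, v_pre = train(X_pre, y_pre, u0, v0)
    u_ft, v_ft = train(X_ft, y_ft, u_pre, v_pre)

    return {
        "pretrain_loss_before": mse(X_pre, y_pre, u0, v0),
        "pretrain_loss_after": mse(X_pre, y_pre, u_pre, v_pre),
        "finetune_train_loss_before": mse(X_ft, y_ft, u_pre, v_pre),
        "finetune_train_loss_after": mse(X_ft, y_ft, u_ft, v_ft),
        "finetune_test_error": mse(X_test, y_test, u_ft, v_ft),
    }


if __name__ == "__main__":
    for alpha in (0.01, 1.0):
        print(alpha, run_demo(alpha))
```

Running the demo for several values of alpha and comparing the fine-tuning test error is one way to probe how the initialization scale interacts with feature reuse in this toy setting. The regimes the paper derives analytically depend on its own model and task statistics, which this sketch does not reproduce.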
Taxonomy
Topics: Neural Networks and Reservoir Computing · Stochastic Gradient Optimization Techniques · Explainable Artificial Intelligence (XAI)
