A Theory of How Pretraining Shapes Inductive Bias in Fine-Tuning

Nicolas Anguita; Francesco Locatello; Andrew M. Saxe; Marco Mondelli; Flavia Mancini; Samuel Lippl; Clementine Domine

arXiv:2602.20062 · cs.LG · February 24, 2026

Open Access

TL;DR

This paper develops an analytical theory of how initialization choices in the pretraining-fine-tuning pipeline shape feature reuse and refinement during fine-tuning, and how this in turn affects generalization in neural networks.

Contribution

It introduces a theoretical framework analyzing the impact of initialization scales on feature learning and generalization in fine-tuning, supported by empirical validation.

Findings

01. Smaller initialization scales enable feature reuse and refinement.

02. Four distinct fine-tuning regimes are identified based on initialization.

03. Initialization parameters significantly influence generalization in nonlinear networks.
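The first finding builds on a well-known single-task effect in diagonal linear networks: the initialization scale selects between a "rich" regime that favors sparse solutions and a "lazy" regime that behaves like minimum-norm (ℓ2) regression. The sketch below illustrates only that single-task effect; it is not the paper's code or its fine-tuning analysis. It assumes the common parameterization β = u⊙u − v⊙v with u = v = α at initialization, full-batch gradient descent on squared loss, and a hypothetical 2-sparse target; the helper `train_dln` and all sizes and step counts are illustrative choices.

```python
import numpy as np

def train_dln(X, y, u, v, lr=0.005, steps=20_000):
    """Full-batch gradient descent on a diagonal linear network
    f(x) = <u*u - v*v, x> with squared loss (1/2n)||X beta - y||^2.
    (Hypothetical helper for illustration.)"""
    n = X.shape[0]
    u, v = u.copy(), v.copy()
    for _ in range(steps):
        beta = u * u - v * v
        g = X.T @ (X @ beta - y) / n  # gradient of the loss w.r.t. beta
        # Chain rule through beta = u^2 - v^2: dL/du = 2u*g, dL/dv = -2v*g.
        u, v = u - lr * 2 * u * g, v + lr * 2 * v * g
    return u, v

rng = np.random.default_rng(0)
d, n, k = 50, 25, 2                 # more features than samples
beta_star = np.zeros(d)
beta_star[:k] = 1.0                 # hypothetical sparse target
X = rng.standard_normal((n, d))
y = X @ beta_star

errors = {}
for alpha in (1e-4, 2.0):           # small vs. large initialization scale
    u0 = v0 = np.full(d, alpha)     # u = v = alpha, so beta starts at 0
    u, v = train_dln(X, y, u0, v0)
    errors[alpha] = np.linalg.norm((u * u - v * v) - beta_star)
    print(f"alpha={alpha:g}  ||beta - beta*|| = {errors[alpha]:.3f}")
```

With these settings the small-α run should end close to the sparse target, while the large-α run should stay far from it, near the minimum-norm interpolant of the 25 equations; exact numbers depend on the random seed.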

Abstract

Pretraining and fine-tuning are central stages in modern machine learning systems. In practice, feature learning plays an important role across both stages: deep neural networks learn a broad range of useful features during pretraining and further refine those features during fine-tuning. However, an end-to-end theoretical understanding of how choices of initialization impact the ability to reuse and refine features during fine-tuning has remained elusive. Here we develop an analytical theory of the pretraining-fine-tuning pipeline in diagonal linear networks, deriving exact expressions for the generalization error as a function of initialization parameters and task statistics. We find that different initialization choices place the network into four distinct fine-tuning regimes that are distinguished by their ability to support feature learning and reuse, and therefore by the task…
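The pretraining-then-fine-tuning pipeline described in the abstract can be mimicked numerically. What follows is a minimal sketch, not the authors' setup or their exact generalization-error expressions. It assumes the same u⊙u − v⊙v diagonal linear network, pretrains at a small initialization scale on a hypothetical task, and then fine-tunes on a hypothetical related task that has only 8 samples and shares the same four features. It compares fine-tuning from the pretrained weights against training from scratch at the same scale. The names `train_dln` and `excess_risk`, the task vectors, and all hyperparameters are assumptions chosen for illustration.

```python
import numpy as np

def train_dln(X, y, u, v, lr=0.005, steps=20_000):
    """Full-batch GD on f(x) = <u*u - v*v, x> with loss (1/2n)||X beta - y||^2.
    (Hypothetical helper for illustration.)"""
    n = X.shape[0]
    u, v = u.copy(), v.copy()
    for _ in range(steps):
        g = X.T @ (X @ (u * u - v * v) - y) / n  # dL/dbeta
        u, v = u - 2 * lr * u * g, v + 2 * lr * v * g
    return u, v

def excess_risk(u, v, beta_true):
    # For isotropic Gaussian inputs, E[(x.beta - x.beta_true)^2] = ||beta - beta_true||^2.
    return float(np.sum((u * u - v * v - beta_true) ** 2))

rng = np.random.default_rng(1)
d, alpha = 100, 1e-5
support = np.arange(4)              # hypothetical: both tasks use these 4 features
beta_pre = np.zeros(d)
beta_pre[support] = 1.0
beta_ft = np.zeros(d)
beta_ft[support] = [1.2, 0.8, 1.0, 1.1]

# Pretraining: 60 samples, small initialization scale alpha.
X_pre = rng.standard_normal((60, d))
u_pre, v_pre = train_dln(X_pre, X_pre @ beta_pre,
                         np.full(d, alpha), np.full(d, alpha))

# Fine-tuning task: only 8 samples.
X_ft = rng.standard_normal((8, d))
y_ft = X_ft @ beta_ft

# (a) Fine-tune from the pretrained weights; (b) train from scratch at the same alpha.
u_a, v_a = train_dln(X_ft, y_ft, u_pre, v_pre)
u_b, v_b = train_dln(X_ft, y_ft, np.full(d, alpha), np.full(d, alpha))

err_finetune = excess_risk(u_a, v_a, beta_ft)
err_scratch = excess_risk(u_b, v_b, beta_ft)
print(f"fine-tuned: {err_finetune:.4f}   from scratch: {err_scratch:.4f}")
```

Because pretraining leaves the shared coordinates with large weights and the remaining ones near α, gradient descent during fine-tuning should mostly adjust the already-learned features (reuse and refinement), whereas the from-scratch run has too few samples to find the 4-sparse target.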

Taxonomy

Topics: Neural Networks and Reservoir Computing · Stochastic Gradient Optimization Techniques · Explainable Artificial Intelligence (XAI)