When pre-training hurts LoRA fine-tuning: a dynamical analysis via single-index models

Gibbs Nwemadji; Bruno Loureiro; Jean Barbier

arXiv:2602.02855·cs.LG·February 4, 2026

Open Access

TL;DR

This paper shows that excessive pre-training can slow LoRA fine-tuning, even when the pre-training and downstream tasks are well aligned, because strong pre-training prolongs the search phase of the optimization.

Contribution

The paper gives a mathematical analysis of how pre-training strength affects LoRA fine-tuning dynamics on single-index models trained with one-pass SGD, and shows that the intuition that more pre-training always helps can fail.

Findings

01 Strong pre-training can induce a prolonged search phase.

02 Alignment between the pre-training and downstream tasks does not guarantee faster fine-tuning.

03 The convergence rate is characterized as a function of the initial fine-tuning alignment and the degree of non-linearity of the target task.

Abstract

Pre-training on a source task is usually expected to facilitate fine-tuning on similar downstream problems. In this work, we mathematically show that this naive intuition is not always true: excessive pre-training can computationally slow down fine-tuning optimization. We study this phenomenon for low-rank adaptation (LoRA) fine-tuning on single-index models trained under one-pass SGD. Leveraging a summary statistics description of the fine-tuning dynamics, we precisely characterize how the convergence rate depends on the initial fine-tuning alignment and the degree of non-linearity of the target task. The key takeaway is that even when the pre-training and downstream tasks are well aligned, strong pre-training can induce a prolonged search phase and hinder convergence. Our theory thus provides a unified picture of how pre-training strength and task difficulty jointly shape the…
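
The setting described in the abstract can be illustrated with a toy simulation. The following is a minimal sketch, not the authors' code or exact model: it assumes a tanh link function, a squared loss, a frozen pre-trained direction `w0` plus a trainable additive correction `u` standing in for the low-rank update, and a 1/d learning-rate scaling. The function name `one_pass_sgd_overlap` and all of its parameters are hypothetical. It tracks a single summary statistic, the overlap between the normalized effective weight and the target direction, drawing a fresh Gaussian sample at every step (one-pass SGD).

```python
import numpy as np


def one_pass_sgd_overlap(d=200, n_steps=2000, lr=0.05, m0=0.3, seed=0):
    """Toy one-pass SGD on a single-index model; returns the overlap trajectory.

    Assumptions (not taken from the paper): tanh link, squared loss,
    effective weight w = w0 + u with w0 frozen and u trainable.
    """
    rng = np.random.default_rng(seed)

    # Unit-norm target direction of the downstream task.
    w_star = rng.standard_normal(d)
    w_star /= np.linalg.norm(w_star)

    # Frozen "pre-trained" unit vector whose overlap with w_star is m0.
    noise = rng.standard_normal(d)
    noise -= (noise @ w_star) * w_star      # remove the component along w_star
    noise /= np.linalg.norm(noise)
    w0 = m0 * w_star + np.sqrt(1.0 - m0**2) * noise

    u = np.zeros(d)                         # trainable correction, starts at zero
    overlaps = np.empty(n_steps + 1)

    for t in range(n_steps + 1):
        w = w0 + u
        # Summary statistic: cosine similarity between w and w_star.
        overlaps[t] = (w @ w_star) / np.linalg.norm(w)
        if t == n_steps:
            break

        x = rng.standard_normal(d)          # fresh sample, never reused
        y = np.tanh(x @ w_star)             # label from the single-index target
        pred = np.tanh(x @ w)
        # Gradient of 0.5 * (pred - y)^2 with respect to w (and hence u).
        grad = (pred - y) * (1.0 - pred**2) * x
        u -= (lr / d) * grad

    return overlaps
```

Calling `one_pass_sgd_overlap(m0=0.1)` and `one_pass_sgd_overlap(m0=0.9)` returns two overlap trajectories that can be plotted against the step index. The sketch makes no claim about reproducing the paper's quantitative results.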

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning · Neural Networks and Reservoir Computing