When pre-training hurts LoRA fine-tuning: a dynamical analysis via single-index models
Gibbs Nwemadji, Bruno Loureiro, Jean Barbier

TL;DR
This paper reveals that excessive pre-training can hinder LoRA fine-tuning by slowing convergence, even when pre-training and downstream tasks are aligned, due to prolonged search phases in the optimization process.
Contribution
The paper provides a mathematical analysis, based on single-index models trained with one-pass SGD, of how pre-training strength affects LoRA fine-tuning dynamics, showing that the intuition that pre-training always helps fine-tuning can fail.
Findings
Strong pre-training can induce prolonged search phases.
Pre-training alignment does not guarantee faster fine-tuning.
A theoretical characterization of how the convergence rate depends on pre-training and task complexity.
Abstract
Pre-training on a source task is usually expected to facilitate fine-tuning on similar downstream problems. In this work, we mathematically show that this naive intuition is not always true: excessive pre-training can computationally slow down fine-tuning optimization. We study this phenomenon for low-rank adaptation (LoRA) fine-tuning on single-index models trained under one-pass SGD. Leveraging a summary-statistics description of the fine-tuning dynamics, we precisely characterize how the convergence rate depends on the initial fine-tuning alignment and the degree of non-linearity of the target task. The key takeaway is that even when the pre-training and downstream tasks are well aligned, strong pre-training can induce a prolonged search phase and hinder convergence. Our theory thus provides a unified picture of how pre-training strength and task difficulty jointly shape the…
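The generic ingredients named in the abstract (a single-index target, one-pass SGD, and a low-dimensional summary statistic) can be illustrated with a minimal sketch. Every modeling choice here is an assumption of this sketch, not the paper's setup: Gaussian inputs, a teacher y = He_k(⟨w*, x⟩) built from a probabilists' Hermite polynomial of degree k (standing in for "degree of non-linearity"), a student of the same form whose weights are projected back onto the unit sphere after each step, and an initial overlap m0 standing in for the alignment inherited from pre-training. The tracked summary statistic is the overlap m_t = ⟨w_t, w*⟩. The sketch deliberately omits the LoRA parametrization and the notion of pre-training strength analyzed in the paper; the function names `hermite`, `normalize`, and `one_pass_sgd_overlap` and all default parameter values are hypothetical.

```python
import math
import random


def hermite(k, z):
    """Return (He_k(z), He_k'(z)) for probabilists' Hermite polynomials, k in {1, 2, 3}."""
    if k == 1:
        return z, 1.0
    if k == 2:
        return z * z - 1.0, 2.0 * z
    if k == 3:
        return z ** 3 - 3.0 * z, 3.0 * z * z - 3.0
    raise ValueError("only k = 1, 2, 3 are implemented in this sketch")


def normalize(v):
    """Project a vector back onto the unit sphere."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def one_pass_sgd_overlap(d=50, k=2, m0=0.3, lr=0.05, n_steps=2000, seed=0):
    """Projected one-pass SGD on y = He_k(<w*, x>); returns the overlaps m_t = <w_t, w*>.

    Hypothetical toy setup: teacher direction w* = e_1, and an initial student
    direction with overlap m0, used here as a stand-in for a pre-trained direction.
    """
    rng = random.Random(seed)
    w_star = [1.0] + [0.0] * (d - 1)
    # Unit vector orthogonal to w* (its first coordinate is zero).
    perp = normalize([0.0] + [rng.gauss(0.0, 1.0) for _ in range(d - 1)])
    # Unit-norm initialization whose overlap with w* is exactly m0.
    w = [m0 * ws + math.sqrt(1.0 - m0 * m0) * p for ws, p in zip(w_star, perp)]
    overlaps = [w[0]]  # since w* = e_1, the overlap <w, w*> is w[0]
    for _ in range(n_steps):
        # One-pass: every step draws a fresh Gaussian sample, never reused.
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        y, _ = hermite(k, x[0])  # teacher label, since <w*, x> = x[0]
        z = sum(wi * xi for wi, xi in zip(w, x))
        f, df = hermite(k, z)
        # Gradient of 0.5 * (f - y)^2 w.r.t. w is (f - y) * He_k'(z) * x; step size lr / d.
        coeff = (lr / d) * (f - y) * df
        w = normalize([wi - coeff * xi for wi, xi in zip(w, x)])
        overlaps.append(w[0])
    return overlaps
```

Comparing trajectories for different k and m0 (for example `one_pass_sgd_overlap(k=1, m0=0.3)` against `one_pass_sgd_overlap(k=2, m0=0.05)`) is one way to probe how long the overlap lingers near its initial value before growing. The dimension, step size, and horizon above are illustrative choices, not values from the paper.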
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning · Neural Networks and Reservoir Computing
