Linearization Explains Fine-Tuning in Large Language Models
Zahra Rahimi Afzal, Tara Esmaeilbeig, Mojtaba Soltanalian, Mesrob I. Ohannessian

TL;DR
This paper uses linearization to explain why parameter-efficient fine-tuning works in large language models, linking spectral properties of the neural tangent kernel (NTK) to adaptation performance and supporting the link with theory and experiments.
Contribution
The paper introduces a linearization perspective on PEFT, analyzes the role of the NTK spectrum in fine-tuning, and validates the findings with experiments on LoRA and LLMs.
Findings
Strong correlation between NTK eigenvalues and model performance.
Spectral perturbation bounds inform layer selection for fine-tuning.
Linearization approximates fine-tuning dynamics under regularization.
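The second finding can be illustrated with a generic eigenvalue perturbation check. The sketch below is not the paper's bound: it uses Weyl's inequality for symmetric matrices, and the per-layer Jacobian blocks (`layer_a`, `layer_b`) are random, hypothetical stand-ins rather than Jacobians of a real LLM. It assumes the full NTK Gram matrix is the sum of per-layer Gram matrices and asks how far the spectrum moves when only one layer is kept.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6  # number of samples (hypothetical)

# Hypothetical per-layer Jacobian blocks: rows are samples, columns are
# the parameters of that layer. Random stand-ins, not a real model.
J_layers = {
    "layer_a": rng.normal(size=(n, 8)),
    "layer_b": 0.1 * rng.normal(size=(n, 8)),
}

# Full NTK Gram matrix as the sum of per-layer Gram matrices J_l J_l^T.
K_full = sum(J @ J.T for J in J_layers.values())
eig_full = np.linalg.eigvalsh(K_full)  # eigenvalues in ascending order

report = {}
for name, J in J_layers.items():
    K_sub = J @ J.T                      # NTK if only this layer is tuned
    eig_sub = np.linalg.eigvalsh(K_sub)
    # Weyl: |lambda_i(K_full) - lambda_i(K_sub)| <= ||K_full - K_sub||_2
    weyl_bound = np.linalg.norm(K_full - K_sub, 2)
    max_shift = np.max(np.abs(eig_full - eig_sub))
    report[name] = {"max_shift": max_shift, "weyl_bound": weyl_bound}

for name, r in report.items():
    print(name, r)
```

In this toy setting, a layer whose Gram matrix leaves a small spectral-norm gap to the full kernel preserves the spectrum well, which is one plausible way such a bound could guide layer selection.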
Abstract
Parameter-Efficient Fine-Tuning (PEFT) is a popular class of techniques that strive to adapt large models in a scalable and resource-efficient manner. Yet, the mechanisms underlying their training performance and generalization remain underexplored. In this paper, we provide several insights into such fine-tuning through the lens of linearization. Fine-tuned models are often implicitly encouraged to remain close to the pretrained model. By making this explicit, using a Euclidean distance inductive bias in parameter space, we show that fine-tuning dynamics become equivalent to learning with the positive-definite neural tangent kernel (NTK). We specifically analyze how close the fully linear and the linearized fine-tuning optimizations are, based on the strength of the regularization. This allows us to be pragmatic about how good a model linearization is when fine-tuning large language…
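The link between a Euclidean penalty in parameter space and kernel learning can be checked numerically on a toy problem. The sketch below is an illustration under stated assumptions, not the paper's setup: it assumes a squared loss, a tiny one-hidden-layer tanh network (`model`, hypothetical), and a penalty `lam * ||theta - theta0||^2`. Under those assumptions, minimizing the linearized objective over the parameter shift gives the same answer as kernel ridge regression with the empirical NTK `K = J J^T`.

```python
import numpy as np

rng = np.random.default_rng(0)

def model(theta, X):
    """Toy network f(x) = v . tanh(W x); theta packs W (4x3) then v (4)."""
    W = theta[:12].reshape(4, 3)
    v = theta[12:]
    return np.tanh(X @ W.T) @ v

def jacobian(theta, X):
    """Analytic Jacobian of the outputs w.r.t. theta, shape (n, 16)."""
    W = theta[:12].reshape(4, 3)
    v = theta[12:]
    H = np.tanh(X @ W.T)                                # (n, 4)
    dW = ((1 - H**2) * v)[:, :, None] * X[:, None, :]   # (n, 4, 3)
    return np.concatenate([dW.reshape(len(X), -1), H], axis=1)

theta0 = rng.normal(size=16)   # stands in for pretrained weights
X = rng.normal(size=(5, 3))    # fine-tuning inputs (synthetic)
y = rng.normal(size=5)         # fine-tuning targets (synthetic)
lam = 0.1                      # regularization strength (assumed value)

f0 = model(theta0, X)
J = jacobian(theta0, X)
K = J @ J.T                    # empirical NTK Gram matrix at theta0

# Parameter space: argmin_d ||f0 + J d - y||^2 + lam * ||d||^2
delta_param = np.linalg.solve(J.T @ J + lam * np.eye(16), J.T @ (y - f0))

# Kernel space: ridge regression with K, mapped back to parameters
delta_kernel = J.T @ np.linalg.solve(K + lam * np.eye(5), y - f0)

print(np.max(np.abs(delta_param - delta_kernel)))
```

The two solutions agree because of the identity (J^T J + lam I)^{-1} J^T = J^T (J J^T + lam I)^{-1}. With lam > 0 the matrix K + lam I is positive definite even when K itself is rank-deficient.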
Taxonomy
Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Speech Recognition and Synthesis
