Not How Many, But Which: Parameter Placement in Low-Rank Adaptation
Arijit Sehanobish, Charles Lovering

TL;DR
This paper investigates how the choice of parameter placement in low-rank adaptation affects fine-tuning performance. It finds that gradient-informed selection matters little under supervised fine-tuning but is crucial under GRPO, and it proposes a fast scoring method to identify the key parameters.
Contribution
The paper introduces a gradient-informed strategy for choosing which LoRA adapter parameters to train, together with a fast scoring method that identifies the critical parameters efficiently.
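To make the placement setting concrete, here is a minimal NumPy sketch of training only a fixed budget of entries in the B matrix while A stays frozen. It is an illustration under stated assumptions, not the authors' implementation: the helper names `make_random_mask` and `masked_sgd_step`, the toy shapes, the plain SGD update, and the random stand-in gradient are all hypothetical.

```python
import numpy as np

def make_random_mask(shape, budget, rng):
    """Pick `budget` entries of B uniformly at random (random-placement baseline)."""
    mask = np.zeros(shape, dtype=bool)
    flat_idx = rng.choice(mask.size, size=budget, replace=False)
    mask.flat[flat_idx] = True
    return mask

def masked_sgd_step(B, grad_B, mask, lr):
    """Update only the entries of B selected by the mask; the rest stay fixed."""
    return B - lr * np.where(mask, grad_B, 0.0)

rng = np.random.default_rng(0)
d_out, d_in, r = 8, 6, 2                 # toy sizes, chosen for illustration
W = rng.normal(size=(d_out, d_in))       # frozen pretrained weight
A = rng.normal(size=(r, d_in))           # frozen LoRA down-projection
B = np.zeros((d_out, r))                 # LoRA up-projection, initialised to zero

mask = make_random_mask(B.shape, budget=4, rng=rng)
grad_B = rng.normal(size=B.shape)        # stand-in for a real loss gradient
B_new = masked_sgd_step(B, grad_B, mask, lr=0.1)

W_eff = W + B_new @ A                    # effective weight after the masked update
print(int(mask.sum()), "trainable entries out of", B.size)
```

Swapping `make_random_mask` for an informed selection rule changes which entries are trainable without changing the budget, which is the comparison the paper studies.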
Findings
Random parameter placement performs comparably to informed placement under supervised fine-tuning.
Under GRPO on base models, random placement fails to improve over the base model, while gradient-informed placement recovers standard LoRA accuracy.
Critical parameters concentrate on residual-stream-writing projections, a pattern that is stable across models.
Abstract
We study the parameter placement problem: given a fixed budget of trainable entries within the B matrix of a LoRA adapter (A frozen), does the choice of which entries to train matter? Under supervised fine-tuning, random and informed subsets achieve comparable performance. Under GRPO on base models, random placement fails to improve over the base model, while gradient-informed placement recovers standard LoRA accuracy. This regime dependence traces to gradient structure: SFT gradients are low-rank and directionally stable, so any subset accumulates coherent updates; GRPO gradients are high-rank and near-orthogonal across steps, so only elements with consistently signed gradients retain the learning signal. Our scoring procedure identifies these critical parameters in under 10 seconds at less than 0.5% of training cost. Selected parameters concentrate on residual-stream-writing…
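The abstract does not spell out the scoring procedure, so the following is only a sketch of one plausible way to find "elements with consistently signed gradients". It scores each entry of B by the absolute mean of its gradient sign over a few steps, then keeps the top entries within the budget. The scoring rule, the function names `sign_consistency_scores` and `top_k_mask`, and the synthetic gradients are assumptions made for illustration, not the paper's method.

```python
import numpy as np

def sign_consistency_scores(grads):
    """grads has shape (steps, d_out, r). Returns |mean of sign(g)| per entry:
    1.0 when the sign never flips, near 0.0 when signs cancel across steps."""
    return np.abs(np.sign(grads).mean(axis=0))

def top_k_mask(scores, budget):
    """Boolean mask selecting the `budget` highest-scoring entries."""
    flat = scores.ravel()
    idx = np.argpartition(flat, -budget)[-budget:]
    mask = np.zeros(flat.shape, dtype=bool)
    mask[idx] = True
    return mask.reshape(scores.shape)

rng = np.random.default_rng(0)
steps, d_out, r = 64, 4, 3

# Synthetic gradients: zero-mean noise everywhere, so signs mostly cancel.
grads = rng.normal(size=(steps, d_out, r))
# Force two entries to keep a fixed sign at every step.
grads[:, 0, 0] = np.abs(grads[:, 0, 0]) + 1.0     # always positive
grads[:, 2, 1] = -np.abs(grads[:, 2, 1]) - 1.0    # always negative

scores = sign_consistency_scores(grads)
mask = top_k_mask(scores, budget=2)
print(np.argwhere(mask))
```

In this toy setup the two sign-stable entries score exactly 1.0 while the noisy entries score much lower, so a budget of two recovers them. Real GRPO gradients would need to be collected from actual training steps.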
