PLoP: Precise LoRA Placement for Efficient Finetuning of Large Models
Soufiane Hayou, Nikhil Ghosh, Bin Yu

TL;DR
PLoP introduces a lightweight, theoretically grounded method for automatically identifying optimal adapter placement in large models, improving finetuning efficiency and effectiveness across tasks.
Contribution
The paper proposes PLoP, a novel method for automatic adapter placement in LoRA finetuning, addressing the lack of conclusive strategies for module selection.
Findings
PLoP outperforms common placement strategies in experiments.
PLoP is effective in supervised finetuning and reinforcement learning tasks.
Theoretical analysis supports PLoP's automatic placement approach.
Abstract
Low-Rank Adaptation (LoRA) is a widely used finetuning method for large models. Its small memory footprint allows practitioners to adapt large models to specific tasks at a fraction of the cost of full finetuning. Different modifications have been proposed to enhance its efficiency, for example by setting the learning rate, the rank, and the initialization. Another axis of improvement is the adapter placement strategy: when using LoRA, practitioners usually pick which module types to adapt, such as the Query and Key modules. Few works have studied the adapter placement problem, and their results are inconclusive: the original LoRA paper suggested placing adapters in attention modules, while other works suggested placing them in the MLP modules. Through an intuitive theoretical analysis, we introduce PLoP (Precise LoRA Placement), a lightweight method that allows automatic identification of module…
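To make concrete what "placing" a LoRA adapter means, the sketch below augments a frozen weight W with a trainable low-rank update (alpha / r) · B A, and applies it only to modules whose type is selected. It is a minimal pure-Python illustration under stated assumptions: the module names (`q_proj`, `up_proj`, and so on) follow a common Hugging Face-style naming scheme and are hypothetical here, `LoRALinear` and `select_modules` are illustrative helpers rather than any library's API, and the initialization (Gaussian A, zero B) follows the original LoRA recipe rather than anything specific to PLoP.

```python
import random


def matvec(M, x):
    """Multiply a matrix (a list of rows) by a vector (a list)."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update scaled by alpha / rank."""

    def __init__(self, W, rank, alpha, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W  # frozen pretrained weight, shape (d_out, d_in)
        self.scale = alpha / rank
        # A: (rank, d_in), small Gaussian init; B: (d_out, rank), zeros.
        # With B = 0 the adapted layer initially computes exactly W x.
        self.A = [[rng.gauss(0.0, d_in ** -0.5) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def __call__(self, x):
        base = matvec(self.W, x)                   # W x
        delta = matvec(self.B, matvec(self.A, x))  # B (A x)
        return [b + self.scale * d for b, d in zip(base, delta)]


def select_modules(module_names, target_types):
    """Keep the modules whose last name component is one of the targeted types."""
    return [n for n in module_names if n.rsplit(".", 1)[-1] in target_types]


# Hypothetical module names for one transformer block.
names = [
    "layers.0.self_attn.q_proj",
    "layers.0.self_attn.k_proj",
    "layers.0.mlp.up_proj",
    "layers.0.mlp.down_proj",
]
print(select_modules(names, {"q_proj", "k_proj"}))     # attention-only placement
print(select_modules(names, {"up_proj", "down_proj"}))  # MLP-only placement
```

The two calls correspond to the two placement camps the abstract mentions (attention modules versus MLP modules); PLoP's goal is to choose the target set automatically instead of fixing it by hand.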
Peer Reviews
Decision · ICLR 2026 Poster
The authors outline the importance of adapter placement and show a computationally tractable, architecture-agnostic way of performing it. The theoretical justification of NFN scores is compelling. Experiments shed light on the variance of NFN scores between layers, modules, and architectures (with and without reasoning), indicating that there is a need for a method that takes that variance into account. The proposed framework and contributions are novel to my knowledge. Parameter-efficient finetun…
In Text:
- Figure 1 needs more clarification on its inputs and outputs and on the baseline feature norm.
- In Section 3, first motivate and introduce the random baseline feature norm, then introduce the NFN formulation, to improve clarity.
- The second and third paragraphs of the Introduction can be tied together and streamlined. They also cover almost exactly the same content as the related work.
- Please elaborate more on the baseline presented in Section 3 and Figures 4 and 5.
- The authors need to improve formatting in Appendi…
1) The proposed method exhibits strong generalization capabilities. It presents a unified training framework that can be generalized to any large-model training approach: by simply providing the model and dataset, computing the NFN metrics, and then freezing modules based on these metrics, the method can be applied universally.
2) The theoretical analysis is comprehensive. The design motivation of NFN is supported by the theoretical basis of "Feature Norm Growth in Linear Networks", making it t…
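The recipe this review summarizes (score each module, aggregate the scores, adapt only the selected module types and freeze the rest) can be sketched as follows. This is a minimal illustration, not the authors' code: per-module scores are assumed to be precomputed, averaging within a module type is an assumed aggregation, and adapting the lowest-scoring types (on the reading that a low NFN signals weak alignment with the task) is exposed as a `lowest_first` flag in case the intended direction differs. `module_type`, `rank_module_types`, and `choose_types` are hypothetical helpers, and the scores below are made up.

```python
from collections import defaultdict
from statistics import mean


def module_type(name):
    """'layers.3.mlp.up_proj' -> 'up_proj' (hypothetical naming scheme)."""
    return name.rsplit(".", 1)[-1]


def rank_module_types(scores):
    """Average per-module scores within each module type; sort ascending by score."""
    by_type = defaultdict(list)
    for name, score in scores.items():
        by_type[module_type(name)].append(score)
    averaged = {t: mean(vals) for t, vals in by_type.items()}
    return sorted(averaged.items(), key=lambda item: item[1])


def choose_types(scores, k, lowest_first=True):
    """Return the k module types to adapt with LoRA; every other type stays frozen."""
    ranked = rank_module_types(scores)
    if not lowest_first:
        ranked.reverse()
    return [t for t, _ in ranked[:k]]


# Made-up per-module scores for a two-layer model (illustration only).
scores = {
    "layers.0.self_attn.q_proj": 1.2,
    "layers.1.self_attn.q_proj": 1.4,
    "layers.0.self_attn.k_proj": 2.0,
    "layers.1.self_attn.k_proj": 1.9,
    "layers.0.mlp.up_proj": 0.8,
    "layers.1.mlp.up_proj": 0.6,
}
print(choose_types(scores, k=1))  # ['up_proj']
print(choose_types(scores, k=2))  # ['up_proj', 'q_proj']
```

Ranking by module type rather than by individual module matches the type-level choice practitioners make in practice (for example "adapt all Query and Key modules"), at the cost of ignoring per-layer variation in the scores.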
1) The most serious issue with this work is that the experimental results are not significant. In the main experiments, there is no substantial difference between PLoP and fine-tuning other modules, which may indicate that researching which module is more worthy of fine-tuning has little value.
2) As a method for saving computational resources, what I expect from PLoP is efficiency when fine-tuning large-parameter models. However, the experimental part only uses small mo…
1. The paper reframes adapter placement as a task-aware alignment problem and proposes a simple, gradient-free proxy (normalized feature norm) to rank module types. This is a clean definition that removes scale effects via a randomized baseline, making it broadly applicable across architectures and tasks. The theory–metric–procedure chain is coherent and distinct from gradient/sensitivity-based selection, which is heavier and less LoRA-friendly.
2. Theoretical pieces motivate why feature norms…
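To make the "normalized feature norm with a randomized baseline" concrete, here is a minimal pure-Python sketch of one plausible reading: the average output norm ‖W x‖ of a module over task inputs, divided by the same quantity for a random matrix of the same shape. The paper's exact baseline is not stated in this summary, so the norm-matched Gaussian baseline (rescaled to W's Frobenius norm) is an assumption, as are the helper names `random_baseline` and `nfn`. The rescaling is what makes the score invariant to the overall scale of W, which is the property the reviewer highlights.

```python
import math
import random


def matvec(M, x):
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def l2(v):
    return math.sqrt(sum(a * a for a in v))


def frobenius(M):
    return math.sqrt(sum(a * a for row in M for a in row))


def random_baseline(W, rng):
    """Gaussian matrix with W's shape, rescaled to W's Frobenius norm (assumed baseline)."""
    G = [[rng.gauss(0.0, 1.0) for _ in row] for row in W]
    s = frobenius(W) / frobenius(G)
    return [[s * g for g in row] for row in G]


def nfn(W, inputs, seed=0):
    """Hypothetical NFN: mean ||W x|| over inputs / mean ||R x|| for a random R."""
    R = random_baseline(W, random.Random(seed))
    num = sum(l2(matvec(W, x)) for x in inputs) / len(inputs)
    den = sum(l2(matvec(R, x)) for x in inputs) / len(inputs)
    return num / den


# Toy 8x8 module whose only nonzero entry reads input coordinate 0.
d = 8
W = [[0.0] * d for _ in range(d)]
W[0][0] = 1.0
aligned = [[1.0 if j == 0 else 0.0 for j in range(d)]]     # inputs W responds to
orthogonal = [[1.0 if j == 1 else 0.0 for j in range(d)]]  # inputs W ignores
print(nfn(W, aligned))     # > 1: W concentrates its norm on these inputs
print(nfn(W, orthogonal))  # 0.0: W is blind to these inputs
```

A score above 1 means the module's weights respond to the task inputs more strongly than a norm-matched random matrix would; a score near or below 1 means they do not. Multiplying W by any constant leaves the score unchanged, because the baseline is rescaled by the same factor.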
1. The finetuning tasks are confined to ANLI and math (MetaMathQA to GSM8K) for SFT and GRPO; there are no non-math generative, instruction-following, multilingual, or long-context evaluations. While Figs. 4–5 report NFN patterns for code/history/logic, there are no finetuning experiments on those domains, so generality remains untested.
2. NFN is computed from a single forward pass using hooks, but the paper does not report estimator variance, sensitivity to batch/sequence length, or across-bat…
Taxonomy
Topics: Time Series Analysis and Forecasting · Model-Driven Software Engineering Techniques · Natural Language Processing Techniques
Methods: Adapter
