ID-LoRA: Efficient Low-Rank Adaptation Inspired by Matrix Interpolative Decomposition
Xindian Ma, Rundong Kong, Peng Zhang, Ruoxiang Huang, Yongyu Jiang

TL;DR
ID-LoRA is a parameter-efficient fine-tuning method that extracts and reuses clustered parameter groups from pretrained weights, cutting trainable parameters by up to 46% relative to standard LoRA while maintaining performance across diverse tasks.
Contribution
The paper proposes a PEFT framework in which clustered parameter groups from the pretrained weights form multiple low-rank components that share a single trainable matrix, reducing the trainable-parameter count without sacrificing task performance.
Findings
Outperforms full fine-tuning and existing PEFT methods on five benchmarks.
Uses up to 46% fewer trainable parameters than standard LoRA.
Surpasses LoRA and its variants in multi-task scenarios with only 54% of the trainable parameters.
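To make the ratios above concrete, here is a minimal arithmetic sketch. It assumes a single 4096 x 4096 projection adapted at rank 16; both numbers are hypothetical and not taken from the paper. Standard LoRA trains two factors, one of shape r x d_in and one of shape d_out x r, so its count is r(d_in + d_out).

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters of standard LoRA for one weight matrix."""
    # One factor is rank x d_in, the other is d_out x rank.
    return rank * (d_in + d_out)

# Hypothetical layer size and rank, chosen only for illustration.
baseline = lora_params(d_in=4096, d_out=4096, rank=16)

# "46% fewer" leaves 54% of the baseline budget (integer arithmetic).
reduced_budget = baseline * 54 // 100

print(baseline, reduced_budget)
```

The sketch only illustrates the size of the budget implied by the reported percentages; how ID-LoRA actually distributes those parameters is described in the paper itself.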
Abstract
LoRA has become a universal Parameter-Efficient Fine-Tuning (PEFT) technique that equips Large Language Models (LLMs) to adapt quickly to new tasks. However, when these models are scaled up, even the latest LoRA variants still introduce considerable overhead in trainable parameters. Conversely, aggressively lowering the rank to curb this overhead markedly degrades performance in complex multi-task settings. We propose ID-LoRA, a novel PEFT framework that breaks the trade-off. Its core innovation lies in extracting and reusing clustered parameter groups from the pretrained weight matrix. These groups are then used to form multiple low-rank components, all of which share only a single initialized trainable low-rank matrix. This approach cuts the number of trainable parameters while keeping the model's capacity intact. We evaluate ID-LoRA on five diverse benchmarks: Mathematical Reasoning,…
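For readers unfamiliar with the matrix interpolative decomposition named in the title, the following is a minimal NumPy sketch of a generic column ID, which approximates a matrix using a few of its own columns: W ≈ W[:, idx] @ P. It is not the authors' procedure. The function name `column_id`, the greedy pivoting rule, and the test matrix are all assumptions made for illustration.

```python
import numpy as np

def column_id(W: np.ndarray, k: int):
    """Generic rank-k column interpolative decomposition: W ≈ W[:, idx] @ P."""
    R = W.astype(float).copy()   # residual after removing the chosen directions
    idx = []
    for _ in range(k):
        # Pivot on the column with the largest remaining norm.
        j = int(np.argmax(np.linalg.norm(R, axis=0)))
        idx.append(j)
        q = R[:, j] / np.linalg.norm(R[:, j])
        R -= np.outer(q, q @ R)  # project that direction out of every column
    C = W[:, idx]                # skeleton: k actual columns of W
    # Interpolation coefficients expressing every column in terms of C.
    P, *_ = np.linalg.lstsq(C, W, rcond=None)
    return idx, C, P

# Hypothetical stand-in for a pretrained weight: an exactly rank-4 matrix.
rng = np.random.default_rng(0)
W = rng.standard_normal((32, 4)) @ rng.standard_normal((4, 48))
idx, C, P = column_id(W, k=4)
print(idx, np.allclose(C @ P, W))
```

The point of the decomposition is that the skeleton `C` consists of entries copied directly from `W`, which is what makes it a natural way to reuse pretrained parameters rather than learn new ones.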
Peer Reviews
Decision · Submitted to ICLR 2026
- The rank-vs-parameter trade-off is a well-known limitation of LoRA.
- The method demonstrates clear and consistent performance gains.
- The authors provide a theoretical analysis (Theorems 1 & 2) to back their method.

- The method is inherently more complex to implement than vanilla LoRA. The cost of the pre-processing step is not discussed.
- The paper does not compare ID-LoRA against LoRI in either single-task or multi-task settings, even though several of the result tables are structurally similar to those in LoRI. Since LoRI is more parameter-efficient than ID-LoRA, including this comparison would provide a clearer picture of the trade-offs.
- The method introduces a new and important hyperparameter, $k$

- Section 4.3 provides a theoretical foundation for the proposed method, which is interesting.
- With fewer trainable parameters, ID-LoRA achieves better results in both single-task and multi-task scenarios.

- The reasoning behind using MID for initializing matrices $A_i$ is unclear. Please compare/justify against alternatives like SVD or random initialization.
- The writing is messy and unclear.
- Numerous abbreviations (e.g., PMRC, RB, Pivot) are used without definition, hindering comprehension. Equation numbering is also inconsistent: some equations are numbered and others are not.
- The terms "good pivot" and "bad pivot" are undefined, making Assumption 3 resemble a definition rather than

1. **Novel Parameter Efficiency** The core idea of reusing clustered pretrained parameters as frozen bases is innovative. It reduces trainable parameters by up to 46% compared to standard LoRA while maintaining competitive performance.
2. **Solid Theoretical Grounding** The paper provides rigorous theoretical guarantees (Theorems 1-2) showing tighter error bounds for cluster-based decomposition compared to global low-rank adaptation.
3. **Comprehensive Multi-Task Evaluation** Ex

1. **Ambiguity in Mathematical Formulation** The description of the Rank Boosting (RB) mechanism is unclear, and the associated formula exhibits dimensional inconsistencies. Section 4.2, Equation (4) contains the term `fc([x1_i, x2_i]B)`, where `x1_i` and `x2_i` are partitions of a vector of size `r`, so their concatenation again has size `r`. However, matrix `B` is defined as having dimensions `R^((r/2) x (d/2))`, making standard matrix-vector multiplication impossible. This ambi
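The dimensional mismatch raised in this review can be reproduced with a small NumPy check. This is only a sketch of the shapes as the reviewer describes them: the sizes `r = 8` and `d = 16` and the helper `try_matmul` are hypothetical, and the code does not attempt to reproduce the paper's Equation (4) or the `fc` layer.

```python
import numpy as np

r, d = 8, 16                       # hypothetical sizes, for illustration only
x = np.arange(r, dtype=float)      # a vector of size r
x1, x2 = np.split(x, 2)            # two partitions, each of size r/2
B = np.ones((r // 2, d // 2))      # B with shape (r/2) x (d/2), as quoted

def try_matmul(v, M):
    """Return the result shape of v @ M, or None if the shapes do not align."""
    try:
        return (v @ M).shape
    except ValueError:
        return None

# The concatenation has length r, but B has only r/2 rows.
concat_shape = try_matmul(np.concatenate([x1, x2]), B)
# A single partition of length r/2 is shape-compatible with B.
half_shape = try_matmul(x1, B)

print(concat_shape, half_shape)
```

That a single partition multiplies cleanly with `B` suggests one shape-consistent reading, but whether that is what the authors intended cannot be determined from the excerpt.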
Taxonomy
Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
