Scalable Multi-Task Low-Rank Model Adaptation
Zichen Tian, Antoine Ledent, Qianru Sun

TL;DR
This paper introduces mtLoRA, a scalable multi-task low-rank adaptation method that addresses performance degradation at scale by selectively regularizing and adapting components, achieving superior accuracy with fewer parameters.
Contribution
The paper proposes mtLoRA, a novel scalable multi-task adaptation framework with spectral-aware regularization, block-level adaptation, and fine-grained routing, improving performance and efficiency at large scale.
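The summary does not say how mtLoRA implements fine-grained routing or block-level adaptation. For orientation only, the sketch below shows a generic token-level router that mixes several LoRA heads sharing one down-projection `A`. This follows the style of HydraLoRA, which the reviews indicate mtLoRA builds on. The class name `RoutedLoRALayer`, the softmax gate, the zero-initialized `B` heads, and the `alpha / rank` scaling are illustrative assumptions. They are not the paper's method.

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class RoutedLoRALayer:
    """Hypothetical routed multi-head LoRA layer: one shared A, several B heads, token-level gates."""

    def __init__(self, d_in, d_out, rank, n_heads, alpha=16.0, seed=0):
        rng = np.random.default_rng(seed)
        self.W0 = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)  # frozen pretrained weight
        self.A = rng.standard_normal((rank, d_in)) / np.sqrt(d_in)    # shared down-projection
        self.B = np.zeros((n_heads, d_out, rank))                     # per-head up-projections, zero-init
        self.W_router = rng.standard_normal((n_heads, d_in)) * 0.01   # router producing one logit per head
        self.scale = alpha / rank                                     # usual LoRA scaling factor

    def __call__(self, x):
        # x: (n_tokens, d_in)
        gates = softmax(x @ self.W_router.T)              # (n_tokens, n_heads), rows sum to 1
        z = x @ self.A.T                                  # (n_tokens, rank), shared low-rank code
        head_out = np.einsum("hor,tr->tho", self.B, z)    # (n_tokens, n_heads, d_out)
        delta = np.einsum("th,tho->to", gates, head_out)  # gate-weighted mix of head outputs
        return x @ self.W0.T + self.scale * delta, gates
```

Every `B` head starts at zero, so the layer initially reproduces the frozen projection `x @ W0.T`. The router weights affect the output only after the heads are trained.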
Findings
mtLoRA outperforms state-of-the-art methods on multiple benchmarks.
Achieves 2.3% higher accuracy on average.
Uses 47% fewer parameters and 24% less training time.
Abstract
Scaling multi-task low-rank adaptation (LoRA) to a large number of tasks induces catastrophic performance degradation, such as an accuracy drop from 88.2% to 2.0% on DOTA when scaling from 5 to 15 tasks. This failure stems from parameter and representation misalignment. We find that existing solutions, such as regularization and dynamic routing, fail at scale because they are constrained by a fundamental trade-off: strengthening regularization to reduce inter-task conflict inadvertently suppresses the feature discrimination that effective routing requires. In this work, we identify two root causes for this trade-off. First, uniform regularization disrupts inter-task knowledge sharing: shared underlying knowledge concentrates in high-singular-value (high-SV) components (89% alignment on Flanv2->BBH). Uniform regularization forces these high-SV components to update in orthogonal directions, directly disrupting…
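The first root cause in the abstract can be illustrated numerically. The sketch below is one plausible reading, not the paper's formulation. It compares two penalties on a set of task updates `ΔW = BA`:

* A uniform orthogonality penalty pushes every pair of task updates apart across their whole rank-r column space.
* A hypothetical spectral-aware variant exempts each update's top-k singular directions, where the abstract says shared knowledge concentrates. It penalizes overlap only among the remaining low-SV directions.

The function names, the squared-Frobenius overlap measure, and the cutoff `k` are assumptions. A training implementation would use an autodiff framework rather than NumPy.

```python
import numpy as np
from itertools import combinations


def column_basis(delta_w, rank):
    """Left singular vectors and singular values of a LoRA update, largest first, truncated to rank."""
    U, S, _ = np.linalg.svd(delta_w, full_matrices=False)
    return U[:, :rank], S[:rank]


def overlap(U1, U2):
    """Squared Frobenius norm of U1^T U2: 0 when the two subspaces are orthogonal."""
    return float(np.sum((U1.T @ U2) ** 2))


def uniform_penalty(delta_ws, rank):
    """Penalize overlap between the full rank-r column spaces of every pair of task updates."""
    bases = [column_basis(d, rank)[0] for d in delta_ws]
    return sum(overlap(Ui, Uj) for Ui, Uj in combinations(bases, 2))


def spectral_aware_penalty(delta_ws, rank, k):
    """Hypothetical variant: leave the top-k (high-SV) directions free, penalize only low-SV overlap."""
    low = [column_basis(d, rank)[0][:, k:] for d in delta_ws]
    return sum(overlap(Ui, Uj) for Ui, Uj in combinations(low, 2))


# Two rank-2 task updates that share their dominant (high-SV) output direction e[0].
e = np.eye(6)
dw1 = 10 * np.outer(e[0], e[0]) + np.outer(e[1], e[1])
dw2 = 10 * np.outer(e[0], e[2]) + np.outer(e[3], e[3])
print(uniform_penalty([dw1, dw2], rank=2))               # approximately 1.0
print(spectral_aware_penalty([dw1, dw2], rank=2, k=1))   # approximately 0.0
```

In the example, the two tasks share their dominant direction `e[0]`. The uniform penalty charges for that shared direction. The spectral-aware variant leaves it free and finds no conflict among the low-SV directions.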
Peer Reviews
Decision: ICLR 2026 Poster
1. The primary strength of the paper is its clear and empirically supported diagnosis of why multi-task LoRA fails. The analyses in Table 1 are persuasive and form a strong foundation for the method. They show the failure of uniform regularization (1A), the spectral heterogeneity of LoRA (1B), and the weakness of component-level attachment (1C).
2. The three proposed components of mtLoRA map directly and logically to the identified problems. Spectral-aware regularization is a clever fix for th
# Major Concerns
1. **Ambiguous Baseline Comparison:** The main results in Table 2 compare mtLoRA to "HydraLoRA" and "Baseline". The check-marked rows indicate that mtLoRA is built on top of HydraLoRA, so this table is effectively an ablation study, not a SOTA comparison. It is unclear how mtLoRA compares with other SOTA multi-task methods mentioned in the related work, such as MoLE, TIES-Merging, or DARE. The claim of "state-of-the-art results" is not fully substantiated against
1. The paper identifies and formalizes spectral heterogeneity in multi-task LoRA modules, an overlooked cause of task conflict. The introduction of spectral-aware regularization and fine-grained routing is conceptually fresh and well motivated by empirical analysis (e.g., the SVD studies in Table 1).
2. The methodology is well grounded and clearly explained. It combines theoretical motivation (singular value analysis, gradient interference) with solid empirical validation. The proposed masking and wei
1. While the empirical spectral analysis is convincing, the paper lacks a formal theoretical justification of why spectral-aware regularization specifically balances discrimination and conflict. A more rigorous connection between singular value magnitude and information content could strengthen the conceptual contribution.
2. Although the experiments span several domains, all are relatively moderate in scale. They may not fully test scalability to large LLM adaptation or multi-domain real-world tasks.
3. Th
- The mask formulation is very elegant. It reminded me a lot of Tikhonov regularization, which is a good idea.
- The experiments agree with related work. Orthogonalization, for example, has been shown to work well for fine-tuning in concurrent work (https://arxiv.org/pdf/2507.13260).
- Experimentally, the authors observe competitive or improved results.
- The variables appearing in the equations throughout the paper are often not properly defined. In equation (1), for example, the dimensions and domains are missing.
- The choice of datasets is not well explained. Why did the authors choose the DOTA dataset instead of, for example, VTAB-1k? I don't think this choice is explained convincingly.
- The paper presents improved numbers. However, it does not present evidence for the mechanism it offers as an explanation for the efficiency o
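The reviewer's Tikhonov analogy can be made concrete with the textbook SVD form of Tikhonov (ridge) regularization, where $A = \sum_i \sigma_i u_i v_i^\top$ and $\lambda > 0$. This block only recalls the standard result. Whether mtLoRA's mask takes this form is the reviewer's observation and is not established in this summary.

```latex
x_\lambda = \arg\min_x \|Ax - b\|_2^2 + \lambda \|x\|_2^2
          = (A^\top A + \lambda I)^{-1} A^\top b
          = \sum_{i:\,\sigma_i > 0} \underbrace{\frac{\sigma_i^2}{\sigma_i^2 + \lambda}}_{f_i}\,
            \frac{u_i^\top b}{\sigma_i}\, v_i ,
\qquad
f_i \approx
\begin{cases}
  1, & \sigma_i^2 \gg \lambda \\
  \sigma_i^2 / \lambda, & \sigma_i^2 \ll \lambda
\end{cases}
```

The filter factors $f_i$ leave large-singular-value components nearly untouched and damp small ones. This is likely the resemblance the reviewer has in mind when comparing a spectrum-dependent mask to Tikhonov regularization.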
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Sparse and Compressive Sensing Techniques
