TL;DR
ThanoRA is a task heterogeneity-aware multi-task low-rank adaptation framework. By tailoring subspaces to each task and preserving diversity, it improves multi-task learning efficiency and performance at no extra inference cost.
Contribution
It proposes a novel method for multi-task adaptation that allocates task-specific subspaces based on heterogeneity and enforces diversity, outperforming existing approaches without additional inference overhead.
Findings
Outperforms strong baselines across multimodal and text benchmarks.
Surpasses separate task-specific fine-tuning in performance.
No additional inference overhead or structural modifications required.
Abstract
Low-Rank Adaptation (LoRA) is widely adopted for downstream fine-tuning of foundation models due to its efficiency and zero additional inference cost. Many real-world applications require foundation models to specialize in several specific tasks simultaneously, motivating the need for efficient multi-task downstream adaptation. To address this need, existing studies have primarily explored two directions: Model Merging with LoRA, which shows advantages in training-free scenarios but still lags behind multi-task training in overall performance; and MoE-based LoRA approaches, which improve multi-task learning performance but introduce routers that hinder the mergeability of LoRA parameters and incur considerable inference overhead, thereby limiting real-world deployment practicality. To this end, we propose ThanoRA, a Task Heterogeneity-Aware Multi-Task Low-Rank Adaptation framework that…
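The mergeability the abstract refers to can be illustrated with a small sketch. A LoRA update is a product `B A` added to a frozen weight `W`, so it can be folded into `W` before deployment and the adapter path disappears at inference time. The sketch also shows the general identity that concatenating task-specific blocks gives the sum of their updates. All matrices and helper names below are hypothetical toy values chosen for illustration, and the blockwise layout is an assumption about how task-specific subspaces could share one adapter, not necessarily the paper's exact construction.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def matadd(X, Y):
    """Add two matrices of the same shape elementwise."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def apply(M, v):
    """Compute the matrix-vector product M v for a flat list v."""
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]


# Frozen base weight W (2 x 3); toy values, not taken from the paper.
W = [[1, 0, 2],
     [0, 1, 1]]

# Two hypothetical rank-1 task blocks: each B_t is 2 x 1, each A_t is 1 x 3.
B1, A1 = [[1], [0]], [[1, 1, 0]]
B2, A2 = [[0], [2]], [[0, 1, 1]]

# Blockwise composition: concatenate the B blocks by columns, stack the A blocks by rows.
B = [r1 + r2 for r1, r2 in zip(B1, B2)]  # 2 x 2
A = A1 + A2                              # 2 x 3

delta = matmul(B, A)                                 # update from the single combined adapter
delta_sum = matadd(matmul(B1, A1), matmul(B2, A2))   # sum of the per-task updates

# Merging: fold the low-rank update into the base weight once, before deployment.
W_merged = matadd(W, delta)

x = [1, 2, 3]
y_adapter = [a + b for a, b in zip(apply(W, x), apply(B, apply(A, x)))]  # W x + B (A x)
y_merged = apply(W_merged, x)                                            # (W + B A) x
```

Because `y_merged` equals `y_adapter`, the merged model needs only one matrix product per layer. A router that selects adapters per input, as in MoE-based LoRA, makes the update input-dependent, so it cannot be folded into a single `W` in this way.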
Peer Reviews
Decision: Submitted to ICLR 2026
1. The paper proposes an elegant and practically relevant way to handle task heterogeneity in multi-task LoRA adaptation, combining entropy-based rank allocation and decomposed orthogonality regularization. The synergy between these two modules is well-motivated and experimentally validated. Unlike MoE-based methods, ThanoRA maintains full mergeability and zero inference overhead.
2. The methodology is mathematically sound and well-justified. The blockwise composition proposition and orthogona
1. The introduced information-theoretic calibration term is central to the method, yet its intuition and derivation could be elaborated further. Specifically, the paper states that the term encourages alignment by minimizing mutual information between task identity and normalized representations, but does not empirically analyze how much task identity leakage remains after optimization. A deeper interpretability or visualization study would strengthen the claim.
2. While ThanoRA presents a well
The paper introduces a spectral-entropy-based task complexity modeling method that adaptively allocates LoRA ranks for each task and layer, enabling automatic adjustment of subspace capacity in multi-task settings. Through rank allocation visualization (Fig. 4) and layer-wise rank distribution across tasks (Fig. 5), the paper demonstrates the model’s ability to automatically align with semantic hierarchies, thereby enhancing the interpretability of the proposed approach.
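To make the idea of entropy-driven rank allocation concrete, here is a minimal sketch. It is not the paper's formula: the normalization of singular values by their sum, the proportional split of a rank budget, the `min_rank` floor, and the function names are all assumptions made for illustration. The intuition is that a task whose spectrum is spread over many directions has higher spectral entropy and is given more rank.

```python
import math


def spectral_entropy(singular_values):
    """Shannon entropy of singular values normalized to sum to 1 (assumed normalization)."""
    total = sum(singular_values)
    probs = [s / total for s in singular_values if s > 0]
    return -sum(p * math.log(p) for p in probs)


def allocate_ranks(entropies, total_rank, min_rank=1):
    """Split a rank budget across tasks in proportion to entropy (hypothetical rule).

    Every task first receives min_rank; the remaining budget is shared
    proportionally, with leftovers assigned by largest fractional remainder.
    """
    n = len(entropies)
    spare = total_rank - min_rank * n
    if spare < 0:
        raise ValueError("total_rank is too small for the requested min_rank")
    total = sum(entropies)
    if total == 0:
        shares = [spare / n] * n
    else:
        shares = [spare * e / total for e in entropies]
    ranks = [min_rank + int(s) for s in shares]
    leftover = total_rank - sum(ranks)
    by_remainder = sorted(range(n), key=lambda i: shares[i] - int(shares[i]), reverse=True)
    for i in by_remainder[:leftover]:
        ranks[i] += 1
    return ranks


# Toy spectra for three hypothetical tasks: flat, single-direction, and in between.
spectra = [[1, 1, 1, 1], [1, 0, 0, 0], [1, 1]]
entropies = [spectral_entropy(s) for s in spectra]
ranks = allocate_ranks(entropies, total_rank=12)
```

In this toy setup the flat spectrum receives the largest share of the budget and the single-direction spectrum keeps only the minimum rank. How the spectra are obtained per task and per layer is left out here, since the excerpt above does not specify it.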
The design of the SPR regularization term remains largely empirical. Although Proposition 3.2 provides sufficient conditions, the paper does not clarify which form of orthogonality (on A or B) is more effective in practice. Moreover, enforcing orthogonality introduces substantial additional training time. The ablation study is not sufficiently comprehensive, lacking analyses on the shared cooperative subspace as well as comparisons with alternative rank allocation strategies (e.g., uniform
- The motivation of this paper is clear and good: Model merging is efficient during inference, but may lead to performance degradation, while MoE can maintain performance, but additional modules introduce extra storage and computation cost.
- The experiments are extensive. Multiple ablation studies are conducted to analyze the properties of the proposal.
- The writing of this paper seems hasty and needs to be improved, and some notations are abused and unclear. Please see the questions.
- The proposed framework relies on some assumptions that I am unsure whether are valid in practice. Moreover, some strategies look too intuitive and unreasonable to me. Please see the questions.
- The scope of this paper is not clear to me and needs further explanation. Please see the questions.
