Dual LoRA: Enhancing LoRA with Magnitude and Direction Updates
Yixing Xu, Chao Li, Xuanwu Yin, Spandan Tiwari, Dong Li, Ashish Sirasao, Emad Barsoum

TL;DR
Dual LoRA enhances the original LoRA method by separately modeling magnitude and direction of parameter updates, leading to improved performance across various NLP tasks and models.
Contribution
This paper introduces Dual LoRA, a novel approach that incorporates magnitude and direction biases into LoRA for better fine-tuning of large language models.
Findings
Consistently outperforms LoRA and variants on NLP tasks
Effective across multiple models including RoBERTa, DeBERTa, LLaMA-1/2/3
Improves performance with the same number of trainable parameters
Abstract
Low-rank adaptation (LoRA) is one of the most popular parameter-efficient fine-tuning (PEFT) methods for adapting pre-trained large language models (LLMs) to specific downstream tasks. However, models trained with LoRA often perform unsatisfactorily because of its low-rank assumption. In this paper, we propose a novel method called Dual LoRA that improves performance by incorporating an inductive bias into the original LoRA. Specifically, we separate the low-rank matrices into two groups: the magnitude group, which controls whether and how far a parameter should be updated, and the direction group, which decides whether the parameter should move forward or backward. This better simulates the parameter updating process of full fine-tuning with gradient-based optimization algorithms. We show that this can be simply achieved by adding a ReLU function to the magnitude…
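The magnitude/direction split described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes the update takes the form ReLU(BA) ⊙ Sign(DC), which is inferred from the abstract and from the reviewers' mentions of "Sign(DC)", four low-rank matrices, and element-wise multiplication. The function name `dual_lora_delta`, the matrix shapes, and the ranks `r1` and `r2` are hypothetical choices for illustration.

```python
import numpy as np

def dual_lora_delta(B, A, D, C):
    """Forward-only sketch of an assumed Dual LoRA update.

    Assumption (not confirmed by the source): delta_W = ReLU(B @ A) * Sign(D @ C).
    """
    # Magnitude group: ReLU keeps the step size non-negative and can
    # switch an individual parameter's update off entirely (zero).
    magnitude = np.maximum(B @ A, 0.0)
    # Direction group: the sign decides forward (+1) or backward (-1).
    direction = np.sign(D @ C)
    # Element-wise product combines how far with which way.
    return magnitude * direction

# Hypothetical sizes: a 6x8 weight matrix, magnitude rank r1, direction rank r2.
rng = np.random.default_rng(0)
d_out, d_in, r1, r2 = 6, 8, 2, 2
B = rng.normal(size=(d_out, r1))
A = rng.normal(size=(r1, d_in))
D = rng.normal(size=(d_out, r2))
C = rng.normal(size=(r2, d_in))

delta_w = dual_lora_delta(B, A, D, C)
```

This sketch covers the forward computation only. The sign function has zero gradient almost everywhere, so training the direction group needs a surrogate gradient such as the straight-through estimator that the reviews mention; that part is omitted here.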
Peer Reviews
Decision·ICLR 2026 Conference Withdrawn Submission
1. The idea of splitting LoRA updates into magnitude and direction components is simple. It introduces an inductive bias aligned with how FFT updates parameters. 2. Dual LoRA only adds two extra low-rank matrices and simple nonlinearities (ReLU, Sign). 3. The paper includes extensive experiments on diverse tasks and model scales (7B–70B), showing consistent and sometimes notable gains. And the ablation study on STE variants (XNOR-Net, DoReFa-Net) and rank ratios (r₁/r₂) is detailed and insightfu
1. Although the paper claims Dual LoRA mimics FFT’s parameter update process, it does not quantify this resemblance. Adding empirical analysis comparing Dual LoRA’s update direction to FFT will enhance this paper. 2. Both Dual LoRA and DoRA employ magnitude–direction decomposition. While Section 2.2 discusses differences, the distinction remains conceptually blurry. It is necessary to include a structural comparison figure or ablation (e.g., removing ReLU/Sign) to explicitly show what Dual LoRA
The formulation appears to be novel and empirically useful. The presentation is clear and easy to follow.
The motivation behind this formulation is quite unclear. The paper mentions that the actual update can be represented as direction * magnitude, but we can represent $\Delta W$ in an arbitrary way, and I can't see why the proposed way is preferred. On the other hand, the whole Sign(DC) part only produces 1-bit information, which appears to be quite restrictive. My own feeling is that the proposed method appears to be a special case of a gated version of LoRA similar to GLU, with constraints of non-n
- The proposed Dual LoRA uses two groups of parameters to incorporate an inductive bias, which seems interesting. - They evaluate Dual LoRA from 7B up to 70B model sizes and consistently outperform LoRA and DoRA. - They introduce warmup tricks and straight-through estimator (STE) tricks.
- This idea seems to closely mirror DoRA (ICML 2024), which also decomposes weight updates into magnitude and direction. Although there are slight differences in how the magnitude and direction are modeled and combined (four low-rank matrices and element-wise multiplication in this paper, versus two matrix multiplications in the DoRA paper), the distinction does not fundamentally alter the underlying paradigm. - The evaluation is limited. - They only compare with DoRA/LoRA on generation tasks and
1. The paper is well-written and easy to read.
1. To address the disadvantage of LoRA and its variants, the authors separate the low-rank matrices into two groups: the magnitude and direction groups. However, the experiments do not validate the effectiveness of the magnitude group. 2. To address the problem caused by LoRA's smaller number of trainable parameters, the authors introduce two groups of low-rank matrices. The proposed method obviously increases the computational cost compared to LoRA, while the authors did not provide the co
1. The paper is well organized and well written. 2. The authors present a well-motivated approach with a simple, easy-to-follow framework. 3. The paper conducts numerous experiments, validates the results on models of various series and sizes, and covers a wide range of evaluation tasks.
1. **In Section 5, the analysis of rank is not rigorous.** The operations of ReLU and Sign can both potentially reduce the rank, so the inequality in Equation (16) only holds in certain cases. The authors should either limit its validity scope or clarify, perhaps with reference to Figure 3, that the inequality holds in most cases but not universally. Furthermore, using Equation (16) to claim an upper bound on the rank is not meaningful, since Equation (15) similarly satisfies $rank(BA)\le ran
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Topic Modeling
