BiDoRA: Bi-level Optimization-Based Weight-Decomposed Low-Rank Adaptation
Peijia Qin, Ruiyi Zhang, Pengtao Xie

TL;DR
BiDoRA introduces a bi-level optimization framework for weight-decomposed low-rank adaptation, improving parameter efficiency and performance in fine-tuning large language models by decoupling magnitude and direction updates.
Contribution
The paper proposes a novel bi-level optimization approach for PEFT that decouples weight decomposition components, reducing overfitting and enhancing adaptation performance.
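For orientation, the decomposition this contribution refers to can be written out as below. The first expression is the standard DoRA-style reparameterization. The second is a generic bi-level form, given as an assumption about how the two components are split across data rather than as the paper's exact objective; any regularizers the authors use are omitted. The symbols $W_0$, $B$, $A$, $m$, $V$, $\mathcal{L}_{\mathrm{tr}}$ and $\mathcal{L}_{\mathrm{val}}$ are notation introduced for this sketch.

```latex
% DoRA-style reparameterization: a learnable magnitude m scales a
% column-normalized direction built from the frozen weight W_0 and a low-rank update BA.
W' = m \cdot \frac{W_0 + BA}{\lVert W_0 + BA \rVert_c}

% Generic bi-level form (assumed split: direction fitted on training data,
% magnitude fitted on validation data).
\min_{m} \; \mathcal{L}_{\mathrm{val}}\bigl(V^{*}(m),\, m\bigr)
\quad \text{s.t.} \quad
V^{*}(m) = \arg\min_{V} \; \mathcal{L}_{\mathrm{tr}}(V,\, m),
\qquad V = W_0 + BA
```

Here $\lVert \cdot \rVert_c$ denotes the column-wise norm, so each column of the direction term has unit length before it is rescaled by $m$.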
Findings
BiDoRA achieves a magnitude-direction update correlation of -8.042, closer to full fine-tuning.
Outperforms DoRA and other PEFT methods across diverse NLP tasks.
Statistically significant improvements on the GLUE benchmark with p-value 2.4×10^{-4}.
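The magnitude and direction updates mentioned in the findings can be measured per weight matrix. The sketch below assumes the column-wise definitions used in DoRA-style analyses: the mean absolute change in column norms, and the mean of one minus the cosine similarity between corresponding columns. The function names and toy matrices are hypothetical, and how the reported correlation figure is derived from such measurements is not stated on this page.

```python
import math

def columns(W):
    # W is a list of rows; return its columns as lists.
    return [list(col) for col in zip(*W)]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))

def magnitude_direction_change(W0, W1):
    """Return (delta_m, delta_d) between a pretrained W0 and a tuned W1.

    delta_m: mean absolute change of column norms.
    delta_d: mean of (1 - cosine similarity) over corresponding columns.
    """
    cols0, cols1 = columns(W0), columns(W1)
    k = len(cols0)
    delta_m = sum(abs(norm(c1) - norm(c0)) for c0, c1 in zip(cols0, cols1)) / k
    delta_d = sum(1.0 - cosine(c0, c1) for c0, c1 in zip(cols0, cols1)) / k
    return delta_m, delta_d

# Toy matrices (hypothetical): one update only rescales a column,
# the other only rotates a column.
W0 = [[1.0, 0.0], [0.0, 1.0]]
scaled = [[2.0, 0.0], [0.0, 1.0]]
rotated = [[0.0, 0.0], [1.0, 1.0]]

scaled_change = magnitude_direction_change(W0, scaled)
rotated_change = magnitude_direction_change(W0, rotated)
```

A pure rescaling moves only the magnitude measure, while a pure rotation moves only the direction measure, which is what makes the pair useful for comparing update patterns across methods.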
Abstract
Parameter-efficient fine-tuning (PEFT) is a flexible and efficient method for adapting large language models (LLMs) to downstream tasks. Among these methods, weight-decomposed low-rank adaptation (DoRA) is a promising approach that decomposes weight matrices into magnitude and direction components to mimic full fine-tuning (FT) better. However, DoRA's simultaneous optimization of these components makes it over-expressive, increases the risk of overfitting, and creates a coupled updating pattern that limits its learning capacity. To address these issues, we propose Bi-level Optimization-Based Weight-Decomposed Low-Rank Adaptation (BiDoRA), a novel PEFT method based on a bi-level optimization framework. BiDoRA fundamentally differs from DoRA by optimizing the magnitude and direction in two separate, asynchronous loops using distinct training and validation data splits. This decoupled…
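The two-loop idea described in the abstract can be illustrated on a toy problem. The sketch below is not the authors' algorithm: it is a first-order alternating simplification on a single weight vector, and it assumes that the direction is updated on the training split in the inner loop while the magnitude is updated on the validation split in the outer loop. The model, data, learning rates and all function names are hypothetical.

```python
import math
import random

def unit(v):
    # Return the unit vector of v together with its norm.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v], n

def compose(m, v):
    # Weight-decomposed parameter: w = m * v / ||v||.
    u, _ = unit(v)
    return [m * x for x in u]

def loss_and_grad(w, data):
    # Mean squared error of a linear model and its gradient with respect to w.
    g = [0.0] * len(w)
    total = 0.0
    for x, y in data:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        total += err * err
        for i, xi in enumerate(x):
            g[i] += 2.0 * err * xi
    n = len(data)
    return total / n, [gi / n for gi in g]

def direction_step(m, v, train, lr):
    # Inner loop: gradient step on the direction, magnitude held fixed.
    u, n = unit(v)
    _, g = loss_and_grad(compose(m, v), train)
    gu = sum(gi * ui for gi, ui in zip(g, u))
    # Chain rule through the normalization: dw/dv = (m / ||v||) (I - u u^T).
    grad_v = [(m / n) * (gi - gu * ui) for gi, ui in zip(g, u)]
    return [vi - lr * gvi for vi, gvi in zip(v, grad_v)]

def magnitude_step(m, v, val, lr):
    # Outer loop: gradient step on the magnitude, direction held fixed.
    u, _ = unit(v)
    _, g = loss_and_grad(compose(m, v), val)
    return m - lr * sum(gi * ui for gi, ui in zip(g, u))

def make_data(w_true, n, rng):
    # Noise-free linear regression samples (toy data).
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in w_true]
        data.append((x, sum(wi * xi for wi, xi in zip(w_true, x))))
    return data

rng = random.Random(0)
w_true = [3.0, -4.0]
train = make_data(w_true, 64, rng)   # split used for the direction
val = make_data(w_true, 64, rng)     # split used for the magnitude

m, v = 1.0, [1.0, 0.0]
initial_val_loss, _ = loss_and_grad(compose(m, v), val)
for _ in range(200):
    for _ in range(5):
        v = direction_step(m, v, train, lr=0.02)
    m = magnitude_step(m, v, val, lr=0.05)
final_val_loss, _ = loss_and_grad(compose(m, v), val)
```

The point of the split is that the magnitude never sees the data the direction was fitted on. A full bi-level method would typically also account for how the inner solution depends on the outer variable, which this first-order sketch ignores.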
Peer Reviews
Decision: Submitted to ICLR 2025
- In experiments, the proposed method demonstrates consistently superior results compared to the others.
- The paper was written reasonably clearly, so I was able to follow the equations.

- The motivation for the bi-level optimization approach was a bit vague. I did read the explanation in the introduction, but it says "Furthermore, in DoRA, the magnitude and incremental direction components are optimized concurrently, leading to a highly constrained updating pattern that may overlook the diverse learning patterns required for different downstream tasks." It is not clear what this means concretely...
- Experiments were on models that do not represent the state of the art at this
* The paper is well-written and easy to understand.
* The authors' efforts to evaluate across multiple tasks are commendable. However, there are still many problems with these evaluations (see Weaknesses).

1. **The motivation of this paper appears to be questionable.** The authors claim that DoRA increases the risk of overfitting, basing this on two pieces of evidence:
   - DoRA introduces additional parameters compared to LoRA.
   - The gap between training and test accuracy curves for DoRA is larger than that of BiDoRA.

   However, these two points do not convincingly support the claim. First, while additional parameters can sometimes contribute to overfitting, they are not a sufficient condit
Originality: BiDoRA introduces a unique bi-level optimization approach to fine-tuning large language models (LLMs), addressing a common tradeoff in parameter-efficient fine-tuning (PEFT) between generalization and computational efficiency. By decomposing the model’s weights into magnitude and direction components and optimizing each on different data splits, BiDoRA creatively combines aspects of neural architecture search with PEFT, offering a compelling alternative to current methods like LoRA
Computational Cost and Efficiency: Although BiDoRA shows a significant performance improvement, the bi-level optimization approach introduces a high computational cost, as reported with nearly fourfold overhead compared to LoRA. This could limit BiDoRA’s practicality in scenarios where resources are constrained. An in-depth analysis of ways to reduce computational complexity without sacrificing performance—such as approximations, alternative regularization techniques, or a comparative exploratio
Taxonomy
Topics: Image Enhancement Techniques · Advanced Vision and Imaging · Image and Signal Denoising Methods
Methods: Softmax · Attention Is All You Need
