BiDoRA: Bi-level Optimization-Based Weight-Decomposed Low-Rank Adaptation

Peijia Qin; Ruiyi Zhang; Pengtao Xie

arXiv:2410.09758 · cs.LG · August 5, 2025

TL;DR

BiDoRA introduces a bi-level optimization framework for weight-decomposed low-rank adaptation, improving parameter efficiency and performance in fine-tuning large language models by decoupling magnitude and direction updates.

Contribution

The paper proposes a novel bi-level optimization approach for PEFT that decouples the optimization of the magnitude and direction components of the weight decomposition, reducing overfitting and improving adaptation performance.
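The decomposition and the bi-level split can be written out as formulas. The DoRA reparameterization follows the original DoRA paper. Which component sits at which level is an assumption: this page only says the two components are optimized "in two separate, asynchronous loops using distinct training and validation data splits". The form below puts the direction in the lower level (training split) and the magnitude in the upper level (validation split), and should be checked against the paper.

```latex
% DoRA: pretrained W_0 \in \mathbb{R}^{d\times k}, B \in \mathbb{R}^{d\times r}, A \in \mathbb{R}^{r\times k}, r \ll \min(d,k)
W' = m \,\frac{W_0 + BA}{\lVert W_0 + BA \rVert_c}, \qquad m \in \mathbb{R}^{1\times k}
% \lVert\cdot\rVert_c denotes the column-wise vector norm

% Bi-level form (assumed level assignment: direction lower, magnitude upper)
\min_{m}\; \mathcal{L}_{\mathrm{val}}\big(m, A^*(m), B^*(m)\big)
\quad \text{s.t.} \quad
\big(A^*(m), B^*(m)\big) = \arg\min_{A,B}\; \mathcal{L}_{\mathrm{tr}}(m, A, B)
```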

Findings

1. BiDoRA achieves a magnitude-direction update correlation of -8.042, closer to the pattern observed under full fine-tuning.
2. BiDoRA outperforms DoRA and other PEFT methods across diverse NLP tasks.
3. BiDoRA's improvements on the GLUE benchmark are statistically significant (p = 2.4 × 10^-4).
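This page does not say how the -8.042 figure is computed. A Pearson coefficient is bounded by ±1, so the number is presumably a regression slope or another unnormalized statistic. The sketch below computes the per-layer update metrics defined in the DoRA paper: ΔM, the mean absolute change in column magnitude, and ΔD, the mean of 1 minus the column cosine similarity to the pretrained weight. It then fits a least-squares slope of ΔM against ΔD. Treat this as one possible reading, not BiDoRA's evaluation code. The helper names are hypothetical.

```python
import numpy as np

def delta_m_delta_d(W0, Wt):
    """DoRA-style update metrics between pretrained W0 and fine-tuned Wt, both d x k."""
    m0 = np.linalg.norm(W0, axis=0)               # pretrained column magnitudes
    mt = np.linalg.norm(Wt, axis=0)               # fine-tuned column magnitudes
    delta_m = float(np.mean(np.abs(mt - m0)))     # mean absolute magnitude change
    cos = np.sum(Wt * W0, axis=0) / (mt * m0)     # per-column cosine similarity
    delta_d = float(np.mean(1.0 - cos))           # mean directional change
    return delta_m, delta_d

def update_slope(points):
    """Least-squares slope of delta_m (y) against delta_d (x) over (delta_m, delta_d) pairs."""
    dm = np.array([p[0] for p in points])
    dd = np.array([p[1] for p in points])
    slope, _intercept = np.polyfit(dd, dm, 1)
    return float(slope)

def update_pearson(points):
    """Pearson correlation of the same pairs; always within [-1, 1]."""
    dm = np.array([p[0] for p in points])
    dd = np.array([p[1] for p in points])
    return float(np.corrcoef(dd, dm)[0, 1])
```

Each (ΔM, ΔD) pair would come from one layer at one checkpoint. A negative slope means large directional changes tend to come with small magnitude changes, which is the pattern the DoRA paper reports for full fine-tuning.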

Abstract

Parameter-efficient fine-tuning (PEFT) offers a flexible and efficient way to adapt large language models (LLMs) to downstream tasks. Among PEFT methods, weight-decomposed low-rank adaptation (DoRA) is a promising approach that decomposes weight matrices into magnitude and direction components to better mimic full fine-tuning (FT). However, DoRA's simultaneous optimization of these components makes it over-expressive, increases the risk of overfitting, and creates a coupled updating pattern that limits its learning capacity. To address these issues, we propose Bi-level Optimization-Based Weight-Decomposed Low-Rank Adaptation (BiDoRA), a novel PEFT method based on a bi-level optimization framework. BiDoRA fundamentally differs from DoRA by optimizing the magnitude and direction in two separate, asynchronous loops using distinct training and validation data splits. This decoupled…
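The two-loop idea can be illustrated with a minimal NumPy sketch on a toy linear-regression layer. Several details here are assumptions and do not come from this page:

- the direction factors A and B are trained on the training split with the magnitude m held fixed (lower level);
- m is updated on the validation split with A and B held fixed (upper level);
- the upper-level step uses a first-order approximation that ignores how A and B depend on m;
- the squared-error loss, step counts and learning rates are illustrative;
- the function names are hypothetical.

```python
import numpy as np

def compose(W0, A, B, m):
    """DoRA-style weight: magnitude m times the column-normalized direction W0 + B @ A."""
    V = W0 + B @ A                          # (d, k) unnormalized direction
    norms = np.linalg.norm(V, axis=0)       # column-wise norms, shape (k,)
    return m * V / norms, V, norms

def mse(X, Y, W):
    """Mean (over samples) squared error of the linear map X @ W."""
    R = X @ W - Y
    return float(np.sum(R ** 2) / X.shape[0])

def grads(X, Y, W0, A, B, m):
    """Analytic gradients of mse w.r.t. m, A and B."""
    W, V, norms = compose(W0, A, B, m)
    G = 2.0 * X.T @ (X @ W - Y) / X.shape[0]          # dL/dW, shape (d, k)
    U = V / norms                                      # unit-norm columns
    g_m = np.sum(G * U, axis=0)                        # dL/dm_j = <G_j, u_j>
    # dL/dv_j = (m_j / ||v_j||) * (I - u_j u_j^T) G_j, applied column-wise
    g_V = (m / norms) * (G - U * np.sum(U * G, axis=0))
    return g_m, B.T @ g_V, g_V @ A.T                   # (g_m, g_A, g_B)

def bidora_sketch(X_tr, Y_tr, X_val, Y_val, W0, r=2, outer_steps=100,
                  inner_steps=5, lr_dir=0.05, lr_mag=0.05, seed=0):
    rng = np.random.default_rng(seed)
    d, k = W0.shape
    A = 0.1 * rng.standard_normal((r, k))   # LoRA-style init: A random, B zero
    B = np.zeros((d, r))
    m = np.linalg.norm(W0, axis=0)          # start from pretrained magnitudes
    for _ in range(outer_steps):
        # Lower level (assumed): direction (A, B) on the training split, m fixed.
        for _ in range(inner_steps):
            _, g_A, g_B = grads(X_tr, Y_tr, W0, A, B, m)
            A -= lr_dir * g_A
            B -= lr_dir * g_B
        # Upper level (assumed, first-order): magnitude m on the validation split.
        g_m, _, _ = grads(X_val, Y_val, W0, A, B, m)
        m -= lr_mag * g_m
    return A, B, m
```

In a real model the inner loop would wrap an optimizer over the adapter factors. The point of the sketch is only which parameters see which split, and that the two updates run in separate loops rather than in one joint step.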

Peer Reviews

Decision: Submitted to ICLR 2025

Reviewer 01 · Rating 3 · Confidence 3

Strengths

- In experiments, the proposed method demonstrates consistently superior results compared to the others.
- The paper was written reasonably clearly, so I was able to follow the equations.

Weaknesses

- The motivation for the bi-level optimization approach was a bit vague. I did read the explanation in the introduction, but it says "Furthermore, in DoRA, the magnitude and incremental direction components are optimized concurrently, leading to a highly constrained updating pattern that may overlook the diverse learning patterns required for different downstream tasks." It is not clear what this means concretely...
- Experiments were on models that do not represent the state of the art at this…

Reviewer 02 · Rating 3 · Confidence 5

Strengths

* The paper is well-written and easy to understand.
* The authors' efforts to evaluate across multiple tasks are commendable. However, there are still many problems with these evaluations (see Weaknesses).

Weaknesses

1. **The motivation of this paper appears to be questionable.** The authors claim that DoRA increases the risk of overfitting, basing this on two pieces of evidence:
   - DoRA introduces additional parameters compared to LoRA.
   - The gap between training and test accuracy curves for DoRA is larger than that of BiDoRA.

   However, these two points do not convincingly support the claim. First, while additional parameters can sometimes contribute to overfitting, they are not a sufficient condit…
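The first piece of evidence can be put in proportion with a per-layer parameter count. This assumes DoRA's standard parameterization: a rank-r LoRA pair (B of size d×r, A of size r×k) plus one learnable magnitude per output column. The sizes d = k = 4096 and r = 8 are illustrative, not the paper's settings, and the helper names are hypothetical.

```python
def lora_params(d: int, k: int, r: int) -> int:
    """Trainable parameters of a rank-r LoRA update B @ A, with B: d x r and A: r x k."""
    return r * (d + k)

def dora_params(d: int, k: int, r: int) -> int:
    """DoRA trains the same LoRA factors plus a magnitude vector m with one entry per column."""
    return lora_params(d, k, r) + k

if __name__ == "__main__":
    d = k = 4096
    r = 8
    n_lora, n_dora = lora_params(d, k, r), dora_params(d, k, r)
    print(n_lora, n_dora, n_dora / n_lora)  # 65536 69632 1.0625
```

At these sizes the magnitude vector adds about 6% to the per-layer trainable parameter count.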

Reviewer 03 · Rating 6 · Confidence 3

Strengths

Originality: BiDoRA introduces a unique bi-level optimization approach to fine-tuning large language models (LLMs), addressing a common tradeoff in parameter-efficient fine-tuning (PEFT) between generalization and computational efficiency. By decomposing the model’s weights into magnitude and direction components and optimizing each on different data splits, BiDoRA creatively combines aspects of neural architecture search with PEFT, offering a compelling alternative to current methods like LoRA…

Weaknesses

Computational Cost and Efficiency: Although BiDoRA shows a significant performance improvement, the bi-level optimization approach introduces a high computational cost, as reported with nearly fourfold overhead compared to LoRA. This could limit BiDoRA’s practicality in scenarios where resources are constrained. An in-depth analysis of ways to reduce computational complexity without sacrificing performance, such as approximations, alternative regularization techniques, or a comparative exploratio…
