BoRA: Towards More Expressive Low-Rank Adaptation with Block Diversity

Shiwei Li; Xiandi Luo; Haozhao Wang; Xing Tang; Ziqiang Cui; Dugang Liu; Yuhua Li; Xiuqiang He; Ruixuan Li

arXiv:2508.06953·cs.LG·August 12, 2025

Open Access · 3 Reviews

TL;DR

BoRA enhances low-rank adaptation in large language models by increasing weight rank through block-wise diversification, achieving better performance with minimal additional parameters.

Contribution

Introduces BoRA, a novel method that boosts LoRA weight rank via block-wise diagonal matrices, improving expressiveness with few extra parameters.

Findings

1. BoRA outperforms standard LoRA on multiple datasets.
2. BoRA's block diversification significantly increases model expressiveness.
3. Ablation studies confirm scalability and effectiveness.

Abstract

Low-rank adaptation (LoRA) is a parameter-efficient fine-tuning (PEFT) method widely used in large language models (LLMs). It approximates the update of a pretrained weight matrix $W \in \mathbb{R}^{m \times n}$ by the product of two low-rank matrices, $BA$, where $A \in \mathbb{R}^{r \times n}$ and $B \in \mathbb{R}^{m \times r}$ ($r \ll \min\{m, n\}$). Increasing the dimension $r$ can raise the rank of LoRA weights (i.e., $BA$), which typically improves fine-tuning performance but also significantly increases the number of trainable parameters. In this paper, we propose Block Diversified Low-Rank Adaptation (BoRA), which improves the rank of LoRA weights with a small number of additional parameters. Specifically, BoRA treats the product $BA$ as a block matrix multiplication, where $A$ and $B$ are partitioned into $b$ blocks along the columns and rows, respectively (i.e., $A = [A_{1}, \dots, A_{b}]$ and…
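The block view in the abstract can be illustrated with a small sketch. It assumes, following the diagonal matrices $\Sigma_{i,j}$ quoted in the reviews, that block $(i, j)$ of the adapted weight is $B_i \Sigma_{i,j} A_j$; the function names (`matmul`, `rank`, `bora_weight`), the toy sizes, and the values are hypothetical and chosen only so the ranks can be checked by hand. This is not the authors' implementation.

```python
from fractions import Fraction


def matmul(X, Y):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def rank(M):
    """Exact rank via Gaussian elimination over the rationals."""
    M = [[Fraction(v) for v in row] for row in M]
    rows, cols, rk = len(M), len(M[0]), 0
    for c in range(cols):
        pivot = next((i for i in range(rk, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[rk], M[pivot] = M[pivot], M[rk]
        for i in range(rk + 1, rows):
            f = M[i][c] / M[rk][c]
            M[i] = [a - f * p for a, p in zip(M[i], M[rk])]
        rk += 1
        if rk == rows:
            break
    return rk


def bora_weight(B, A, sigma, b):
    """Assumed BoRA form: block (i, j) equals B_i @ diag(sigma[i][j]) @ A_j.

    B is m x r, A is r x n, and sigma[i][j] lists the r diagonal entries
    of the (i, j) diagonal matrix. m and n must be divisible by b.
    """
    m, r, n = len(B), len(A), len(A[0])
    assert m % b == 0 and n % b == 0
    bm, bn = m // b, n // b
    return [
        [sum(B[p][k] * sigma[p // bm][q // bn][k] * A[k][q] for k in range(r))
         for q in range(n)]
        for p in range(m)
    ]


# Toy sizes (hypothetical): m = n = 4, r = 1, b = 2.
b = 2
B = [[1], [1], [1], [1]]  # m x r
A = [[1, 2, 1, 2]]        # r x n

lora = matmul(B, A)                                  # standard LoRA update BA
ones = [[[1] for _ in range(b)] for _ in range(b)]   # every diagonal = identity
sigma = [[[1], [2]], [[3], [4]]]                     # distinct diagonal per block
bora = bora_weight(B, A, sigma, b)

print(rank(lora))                          # 1
print(rank(bora))                          # 2
print(bora_weight(B, A, ones, b) == lora)  # True
```

In this toy case the LoRA update has rank $r = 1$, while distinct diagonals per block lift the rank to $b \cdot r = 2$, and setting every diagonal to the identity recovers $BA$ exactly. Under the same reading, the $b^2$ diagonal matrices of size $r \times r$ add $b^2 r = 4$ values on top of the $r(m + n) = 8$ LoRA parameters.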

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 4

Strengths

- The methodology is clearly and comprehensively described, and Figure 1 clearly explains BoRA and its differences from other methods.
- The experiments in the paper are comprehensive, validating the effectiveness of the method on different tasks and models.

Weaknesses

- The "Results of Different Tuning Granularity" experiment in Section 4.4 does not ensure that the number of parameters or the computational cost is consistent (or as consistent as possible) across settings.
- Although the authors analyzed the efficiency of BoRA, experiments comparing the efficiency of different methods are lacking.

Reviewer 02 · Rating 6 · Confidence 4

Strengths

- The paper analyzes the rank limitation of standard LoRA from the perspective of block matrix multiplication, showing that the correlation between block matrices constrains the expressiveness of LoRA and thus provides a strong theoretical foundation for the proposed solution.
- It introduces a LoRA variant called BoRA, which eliminates the correlation between block matrices by introducing diagonal matrices, achieving substantial performance improvement with only a small parameter overhead of $

Weaknesses

- Although the authors mention inference latency in the appendix, they lack efficiency comparisons during training (e.g., convergence time, memory usage, training latency), and adding such comparisons would make the work more convincing.
- The authors claim that BoRA raises the rank upper bound, and in Related Work they also mention HiRA and KronA as methods that increase the rank of LoRA weights, yet these are not included as baselines for comparison.
- In the original MELoRA paper, the number

Reviewer 03 · Rating 4 · Confidence 3

Strengths

1. The paper introduces a novel perspective on LoRA by analyzing it through block matrix multiplication, revealing how correlations between block products constrain rank.
2. The paper demonstrates that both standard LoRA and MELoRA are special cases of BoRA, creating a unified theoretical framework.
3. Empirical results are solid, and BoRA is compared with various base models and tasks while achieving strong performance over a range of baselines.
4. The paper provides thorough ablation studies t

Weaknesses

1. In Section 3.2, the authors state:
   > Assuming $A$ and $B$ are divided into $b$ blocks along columns and rows, respectively, BoRA will additionally learn a set of diagonal matrices $\{\Sigma_{i, j} \in \mathbb{R}^{r \times r} \mid i, j \in [r]\}$.

   This notation uses $i, j \in [r]$ for the indices of $\Sigma_{i,j}$, which is inconsistent with the block count $b$. Since $A$ and $B$ are divided into $b$ blocks, the indices should be $i, j \in [b]$, not $[r]$.
2. Proposition 1 claims that

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Tensor decomposition and applications · Sparse and Compressive Sensing Techniques