Bi-LoRA: Efficient Sharpness-Aware Minimization for Fine-Tuning Large-Scale Models

Yuhang Liu, Tao Li, Zhehao Huang, Zuopeng Yang, and Xiaolin Huang

arXiv:2508.19564 · cs.LG · April 21, 2026

TL;DR

Bi-LoRA pairs a primary LoRA module for task adaptation with an auxiliary LoRA module that models SAM's adversarial weight perturbations, aiming to improve generalization in large-scale fine-tuning without the extra memory and computation that SAM normally requires.

Contribution

The paper proposes a dual-module LoRA approach that decouples sharpness optimization from task adaptation, reducing memory overhead and making flat minima easier to reach.

Findings

01 Bi-LoRA outperforms standard LoRA in generalization across tasks.

02 It reduces training cost by performing optimization and perturbation simultaneously.

03 Experiments show improved flatness and robustness in large-scale model fine-tuning.

Abstract

Fine-tuning large-scale pre-trained models with limited data presents significant challenges for generalization. While Sharpness-Aware Minimization (SAM) has proven effective in improving generalization by seeking flat minima, its substantial extra memory and computation overhead make it impractical for large models. Integrating SAM with parameter-efficient fine-tuning methods like Low-Rank Adaptation (LoRA) is a promising direction. However, we find that directly applying SAM to LoRA parameters limits the sharpness optimization to a restricted subspace, hindering its effectiveness. To address this limitation, we propose Bi-directional Low-Rank Adaptation (Bi-LoRA), which introduces an auxiliary LoRA module to model SAM's adversarial weight perturbations. It decouples SAM's weight perturbations from LoRA optimization: the primary LoRA module adapts to specific tasks via standard…
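The restricted-subspace claim in the abstract can be illustrated with a small numerical sketch. It is not the authors' implementation: the function names, toy dimensions, and the random matrix standing in for the loss gradient are all assumptions made for illustration. With weights parameterized as W = W0 + B A, applying the first-order SAM perturbation to the LoRA factors B and A alone induces a weight-space perturbation of rank at most 2r, whereas a SAM perturbation on the full weight matrix is generally full-rank.

```python
import numpy as np


def sam_perturbation(grad, rho):
    """First-order SAM ascent step on full weights: rho * g / ||g||_F."""
    return rho * grad / (np.linalg.norm(grad) + 1e-12)


def lora_sam_weight_perturbation(B, A, grad_W, rho):
    """Weight-space change induced by perturbing only the LoRA factors.

    With W = W0 + B @ A, the chain rule gives
    dL/dB = grad_W @ A.T and dL/dA = B.T @ grad_W.
    """
    grad_B = grad_W @ A.T
    grad_A = B.T @ grad_W
    # Normalize by the joint norm of all perturbed parameters.
    joint_norm = np.sqrt(
        np.linalg.norm(grad_B) ** 2 + np.linalg.norm(grad_A) ** 2
    ) + 1e-12
    eps_B = rho * grad_B / joint_norm
    eps_A = rho * grad_A / joint_norm
    # Difference between perturbed and unperturbed low-rank updates.
    return (B + eps_B) @ (A + eps_A) - B @ A


# Toy setup (hypothetical sizes); grad_W is a random stand-in for dL/dW.
rng = np.random.default_rng(0)
d_out, d_in, r = 16, 12, 2
B = rng.normal(size=(d_out, r))
A = rng.normal(size=(r, d_in))
grad_W = rng.normal(size=(d_out, d_in))

full_eps = sam_perturbation(grad_W, rho=0.05)
lora_eps = lora_sam_weight_perturbation(B, A, grad_W, rho=0.05)

rank_full = np.linalg.matrix_rank(full_eps)
rank_lora = np.linalg.matrix_rank(lora_eps)
print(f"rank of full-weight SAM perturbation: {rank_full}")
print(f"rank of LoRA-induced perturbation:    {rank_lora} (bound: {2 * r})")
```

The rank bound follows from writing the induced perturbation as eps_B (A + eps_A) + B eps_A, a sum of two matrices of rank at most r each. This is the sense in which sharpness is only probed along a low-dimensional set of directions when SAM is applied directly to the LoRA parameters.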
