Come Together, But Not Right Now: A Progressive Strategy to Boost Low-Rank Adaptation

Zhan Zhuang; Xiequn Wang; Wei Li; Yulong Zhang; Qiushi Huang; Shuhao Chen; Xuehao Wang; Yanbin Wei; Yuhe Nie; Kede Ma; Yu Zhang; Ying Wei

arXiv:2506.05713·cs.LG·July 29, 2025

TL;DR

This paper introduces CoTo, a progressive training strategy for low-rank adaptation that gradually raises each adapter's activation probability during fine-tuning, leading to better model generalization, robustness, and efficiency across various tasks and LoRA variants.

Contribution

CoTo is a novel progressive training method that enhances low-rank adaptation by promoting balanced optimization and broader exploration of the loss landscape.

Findings

01 Boosts single-task performance

02 Improves multi-task merging accuracy

03 Enhances pruning robustness

Abstract

Low-rank adaptation (LoRA) has emerged as a leading parameter-efficient fine-tuning technique for adapting large foundation models, yet it often locks adapters into suboptimal minima near their initialization. This hampers model generalization and limits downstream operators such as adapter merging and pruning. Here, we propose CoTo, a progressive training strategy that gradually increases adapters' activation probability over the course of fine-tuning. By stochastically deactivating adapters, CoTo encourages more balanced optimization and broader exploration of the loss landscape. We provide a theoretical analysis showing that CoTo promotes layer-wise dropout stability and linear mode connectivity, and we adopt a cooperative-game approach to quantify each adapter's marginal contribution. Extensive experiments demonstrate that CoTo consistently boosts single-task performance, enhances…
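The progressive activation idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the linear ramp, the parameter names `p_start` and `ramp_fraction`, and the toy scalar "network" are all assumptions made for illustration, and the abstract does not state the actual schedule or whether active adapter outputs are rescaled.

```python
import random


def activation_probability(step, total_steps, p_start=0.0, ramp_fraction=0.75):
    """Hypothetical linear ramp (an assumption, not taken from the paper).

    The probability rises from p_start to 1.0 over the first ramp_fraction
    of training and stays at 1.0 afterwards, so all adapters are active
    by the end of fine-tuning.
    """
    ramp_steps = max(1, int(total_steps * ramp_fraction))
    progress = min(1.0, step / ramp_steps)
    return p_start + (1.0 - p_start) * progress


def sample_active_adapters(num_adapters, p, rng):
    """Independently keep each adapter with probability p."""
    return [rng.random() < p for _ in range(num_adapters)]


def forward(x, base_weights, adapter_deltas, active):
    """Toy scalar model: each layer multiplies by w, plus its adapter's
    delta only when that adapter is active for this step."""
    for w, delta, on in zip(base_weights, adapter_deltas, active):
        x = x * (w + (delta if on else 0.0))
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    total_steps = 8
    base_weights = [1.0, 1.0, 1.0]
    adapter_deltas = [0.1, 0.2, 0.3]
    for step in range(total_steps + 1):
        p = activation_probability(step, total_steps)
        active = sample_active_adapters(len(base_weights), p, rng)
        y = forward(1.0, base_weights, adapter_deltas, active)
        print(f"step={step} p={p:.2f} active={active} output={y:.3f}")
```

Early in the loop most adapters are skipped, so each one is trained in many different combinations with the others. Once the probability reaches 1.0, every step uses all adapters together, as in ordinary LoRA fine-tuning.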
