Efficiently Seeking Flat Minima for Better Generalization in Fine-Tuning Large Language Models and Beyond
Jiaxin Deng, Qingcheng Zhu, Junbiao Pang, Linlin Yang, Zhongqian Fu, Baochang Zhang

TL;DR
This paper introduces FMLoRA and EFMLoRA, methods to find flat minima in low-rank adaptation for large models, improving generalization and efficiency in fine-tuning across NLP and vision-language tasks.
Contribution
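The paper proposes a novel approach to seeking flat minima in LoRA. It is backed by a theoretical result that perturbations in the full parameter space can be transferred to the low-rank subspace, and it comes with an efficient variant (EFMLoRA), with the aim of improving generalization and fine-tuning performance.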
Findings
EFMLoRA matches LoRA's efficiency while outperforming it in accuracy.
EFMLoRA surpasses full fine-tuning on large language models.
Sharpness correlates with LoRA's generalization, verified empirically.
Abstract
Little research explores the correlation between the expressive ability and generalization ability of the low-rank adaptation (LoRA). Sharpness-Aware Minimization (SAM) improves model generalization for both Convolutional Neural Networks (CNNs) and Transformers by encouraging convergence to locally flat minima. However, the connection between sharpness and generalization has not been fully explored for LoRA due to the lack of tools to either empirically seek flat minima or develop theoretical methods. In this work, we propose Flat Minima LoRA (FMLoRA) and its efficient version, i.e., EFMLoRA, to seek flat minima for LoRA. Concretely, we theoretically demonstrate that perturbations in the full parameter space can be transferred to the low-rank subspace. This approach eliminates the potential interference introduced by perturbations across multiple matrices in the low-rank subspace. Our…
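The contrast the abstract draws, perturbing the full weight matrix rather than each low-rank factor separately, can be illustrated with a toy sketch. This is not the authors' FMLoRA or EFMLoRA algorithm, whose details are not given in this summary. It is a generic SAM-style step under stated assumptions: a single linear layer with merged weight W = W0 + B A, a squared-error loss, and plain gradient descent. The names `loss_and_grad_w`, `flat_lora_step`, `rho`, and `lr` are hypothetical.

```python
import numpy as np

def loss_and_grad_w(W, X, Y):
    """Squared-error loss of the linear map X @ W.T and its gradient w.r.t. W."""
    R = X @ W.T - Y                      # residuals, shape (n, d_out)
    loss = 0.5 * np.mean(np.sum(R ** 2, axis=1))
    grad = R.T @ X / X.shape[0]          # shape (d_out, d_in)
    return loss, grad

def flat_lora_step(W0, A, B, X, Y, rho=0.05, lr=0.05):
    """One SAM-style update of the LoRA factors (illustrative assumption only).

    W0 is frozen with shape (d_out, d_in); B is (d_out, r); A is (r, d_in).
    The perturbation is built in the full weight space, so a single epsilon
    is applied to the merged weight instead of one per factor.
    """
    W = W0 + B @ A                       # merged weight
    _, g = loss_and_grad_w(W, X, Y)      # gradient at the current point
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # ascent direction, norm rho
    _, g_pert = loss_and_grad_w(W + eps, X, Y)   # gradient at perturbed point
    # Chain rule through W = W0 + B @ A, treating eps as a constant.
    grad_B = g_pert @ A.T
    grad_A = B.T @ g_pert
    return A - lr * grad_A, B - lr * grad_B
```

Only `A` and `B` are updated; `W0` stays frozen, as in standard LoRA. The step still needs two gradient evaluations, like ordinary SAM, so it says nothing about how the efficient variant reaches LoRA-level cost.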
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Neural Network Applications
