SVFit: Parameter-Efficient Fine-Tuning of Large Pre-Trained Models Using Singular Values

Chengwei Sun; Jiwei Wei; Yujia Wu; Yiming Shi; Shiyuan He; Zeyu Ma; Ning Xie; Yang Yang

arXiv:2409.05926·cs.LG·September 12, 2024·2 cites

Open Access

TL;DR

SVFit is a parameter-efficient fine-tuning method that uses singular value decomposition (SVD) of the pre-trained weights to initialize low-rank matrices and trains the critical singular values, reducing the number of trainable parameters and improving adaptation of large pre-trained models.

Contribution

SVFit proposes a novel SVD-based initialization for low-rank matrices in PEFT, enhancing efficiency and performance over existing methods like LoRA.

Findings

01. Outperforms LoRA on various tasks

02. Requires 16 times fewer trainable parameters than LoRA

03. Achieves rapid domain adaptation

Abstract

Large pre-trained models (LPMs) have demonstrated exceptional performance in diverse natural language processing and computer vision tasks. However, fully fine-tuning these models poses substantial memory challenges, particularly in resource-constrained environments. Parameter-efficient fine-tuning (PEFT) methods, such as LoRA, mitigate this issue by adjusting only a small subset of parameters. Nevertheless, these methods typically employ random initialization for low-rank matrices, which can lead to inefficiencies in gradient descent and diminished generalizability due to suboptimal starting points. To address these limitations, we propose SVFit, a novel PEFT approach that leverages singular value decomposition (SVD) to initialize low-rank matrices using critical singular values as trainable parameters. Specifically, SVFit performs SVD on the pre-trained weight matrix to obtain the…
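The abstract is cut off before the method details, so the following is only a minimal NumPy sketch of the idea it describes, not the authors' implementation. It assumes that the top-`rank` singular values are the trainable parameters, that the matching singular vectors stay frozen, and that the rest of the weight is kept as a frozen residual. The names `svd_split`, `reconstruct`, and `rank` are hypothetical and chosen for illustration.

```python
import numpy as np

def svd_split(weight, rank):
    """Split a weight matrix into frozen singular vectors, the top-`rank`
    singular values (assumed trainable here), and a frozen residual."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    u_r, s_r, vt_r = u[:, :rank], s[:rank].copy(), vt[:rank, :]
    # Everything not captured by the top-`rank` components stays fixed.
    residual = weight - (u_r * s_r) @ vt_r
    return u_r, s_r, vt_r, residual

def reconstruct(u_r, s_r, vt_r, residual):
    """Rebuild the effective weight from the current singular values."""
    # u_r * s_r scales column i of u_r by s_r[i], i.e. u_r @ diag(s_r).
    return (u_r * s_r) @ vt_r + residual

# Toy stand-in for a pre-trained weight matrix (assumption: random data).
rng = np.random.default_rng(0)
weight = rng.standard_normal((64, 48))
rank = 4

u_r, s_r, vt_r, residual = svd_split(weight, rank)

# At initialization the model is unchanged: the split is exact.
assert np.allclose(reconstruct(u_r, s_r, vt_r, residual), weight)

# Simulate a training step that only touches the singular values.
updated = reconstruct(u_r, s_r * 1.1, vt_r, residual)
delta_rank = np.linalg.matrix_rank(updated - weight)

print("trainable parameters in this sketch:", s_r.size)
print("rank of the weight update:", delta_rank)
```

In this sketch the starting point reproduces the pre-trained weight exactly, and any change to the `rank` singular values yields a weight update of rank at most `rank`. The toy sizes are arbitrary and are not meant to reproduce the parameter savings reported in the paper.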

Taxonomy

Topics: Model Reduction and Neural Networks