Sparse Matrix in Large Language Model Fine-tuning

Haoze He; Juncheng Billy Li; Xuan Jiang; Heather Miller

arXiv:2405.15525 · cs.CL · May 20, 2025 · 2 cites

TL;DR

This paper introduces Sparse Matrix Tuning (SMT), a method that selects critical sub-matrices for fine-tuning large language models, reducing computational costs and memory usage while outperforming existing PEFT methods like LoRA.

Contribution

The paper presents SMT, a novel sparse matrix selection approach that minimizes the accuracy gap with full fine-tuning and improves efficiency in large language model fine-tuning.

Findings

01 SMT outperforms LoRA and DoRA on various tasks.

02 SMT reduces GPU memory footprint by 67% compared to full fine-tuning.

03 SMT maintains performance without the plateau issues of other PEFT methods.

Abstract

LoRA and its variants have become popular parameter-efficient fine-tuning (PEFT) methods due to their ability to avoid excessive computational costs. However, an accuracy gap often exists between PEFT methods and full fine-tuning (FT), and this gap has yet to be systematically studied. In this work, we introduce a method for selecting sparse sub-matrices that aims to minimize the performance gap between PEFT and FT while also reducing both the computational cost and the memory cost of fine-tuning. Our Sparse Matrix Tuning (SMT) method begins by identifying the most significant sub-matrices in the gradient update and then updates only these blocks during the fine-tuning process. In our experiments, we demonstrate that SMT consistently surpasses other PEFT baselines (e.g., LoRA and DoRA) in fine-tuning popular large language models such as LLaMA across a broad spectrum of tasks, while…
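
The block-selection idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the scoring rule (mean absolute gradient per block), the block size, the number of selected blocks, and the function names `select_top_blocks`, `block_mask`, and `sparse_sgd_step` are all assumptions made for illustration. It only shows the general pattern of ranking sub-matrices by gradient magnitude and updating the selected blocks alone.

```python
import numpy as np


def select_top_blocks(grad, block_size, k):
    """Return (row, col) indices of the k blocks with the largest mean |grad|.

    Assumption: blocks are scored by mean absolute gradient; the paper's
    actual criterion may differ.
    """
    rows, cols = grad.shape
    if rows % block_size or cols % block_size:
        raise ValueError("matrix dimensions must be multiples of block_size")
    n_block_rows, n_block_cols = rows // block_size, cols // block_size
    # View the matrix as a grid of blocks and average |grad| inside each block.
    scores = (
        np.abs(grad)
        .reshape(n_block_rows, block_size, n_block_cols, block_size)
        .mean(axis=(1, 3))
    )
    # Sort block scores in descending order and keep the first k.
    top_flat = np.argsort(scores, axis=None)[::-1][:k]
    return [
        tuple(int(i) for i in np.unravel_index(f, scores.shape)) for f in top_flat
    ]


def block_mask(shape, block_size, blocks):
    """Boolean mask that is True only inside the selected blocks."""
    mask = np.zeros(shape, dtype=bool)
    for r, c in blocks:
        mask[
            r * block_size : (r + 1) * block_size,
            c * block_size : (c + 1) * block_size,
        ] = True
    return mask


def sparse_sgd_step(weight, grad, mask, lr):
    """Plain SGD step applied only where mask is True (hypothetical helper)."""
    return weight - lr * np.where(mask, grad, 0.0)


# Toy example: an 8x8 weight matrix split into four 4x4 blocks.
rng = np.random.default_rng(0)
weight = rng.normal(size=(8, 8))
grad = rng.normal(size=(8, 8)) * 0.01
grad[0:4, 4:8] += 1.0  # make the top-right block clearly dominant

blocks = select_top_blocks(grad, block_size=4, k=1)
mask = block_mask(weight.shape, 4, blocks)
new_weight = sparse_sgd_step(weight, grad, mask, lr=0.1)
```

In this toy setup only the 16 entries of the selected block change, while the other 48 stay frozen. A real training loop would also avoid computing and storing gradients and optimizer state for the frozen blocks; this sketch does not model that.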

Code & Models

Repositories

HectorHHZ/Sparse_Matrix_Tuning (PyTorch)

Taxonomy

Topics: Topic Modeling · Speech Recognition and Synthesis · Neural Networks and Applications

Methods: LLaMA