Sparsity-Aware Low-Rank Representation for Efficient Fine-Tuning of Large Language Models
Longteng Zhang, Sen Wu, Shuai Hou, Zhengyu Qing, Zhuo Zheng, Danning Ke, Qihong Lin, Qiang Wang, Shaohuai Shi, Xiaowen Chu

TL;DR
SALR is a sparsity-aware low-rank fine-tuning method for large language models that combines pruning with low-rank adaptation to reduce model size and speed up inference while maintaining performance.
Contribution
The paper proposes SALR, a novel framework that unifies sparse pruning with low-rank adaptation, providing theoretical guarantees and practical efficiency improvements for fine-tuning large language models.
Findings
Achieves 50% sparsity while matching LoRA performance on benchmarks.
Reduces model size by 2× and speeds up inference by 1.7×.
Provides theoretical analysis on pruning error bounds and residual information recovery.
Abstract
Adapting large pre-trained language models to downstream tasks often entails fine-tuning millions of parameters or deploying costly dense weight updates, which hinders their use in resource-constrained environments. Low-rank Adaptation (LoRA) reduces trainable parameters by factorizing weight updates, yet the underlying dense weights still impose high storage and computation costs. Magnitude-based pruning can yield sparse models but typically degrades LoRA's performance when applied naively. In this paper, we introduce SALR (Sparsity-Aware Low-Rank Representation), a novel fine-tuning paradigm that unifies low-rank adaptation with sparse pruning under a rigorous mean-squared-error framework. We prove that statically pruning only the frozen base weights minimizes the pruning error bound, and we recover the discarded residual information via a truncated-SVD low-rank adapter, which…
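The core idea in the abstract, pruning the frozen base weights by magnitude and then recovering the discarded residual with a truncated-SVD low-rank adapter, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the helper names `magnitude_prune` and `truncated_svd_adapter`, the per-matrix pruning threshold, the random test matrix, and the rank of 8 are all assumptions chosen for illustration, and the sketch omits any subsequent fine-tuning of the adapter.

```python
import numpy as np

def magnitude_prune(W, sparsity=0.5):
    """Zero the smallest-magnitude entries of W (hypothetical helper).

    Returns the pruned matrix and the boolean keep-mask.
    """
    k = int(round(sparsity * W.size))  # number of entries to drop
    mask = np.ones(W.size, dtype=bool)
    if k > 0:
        # Indices of the k smallest absolute values (order within them is irrelevant).
        drop = np.argpartition(np.abs(W).ravel(), k - 1)[:k]
        mask[drop] = False
    mask = mask.reshape(W.shape)
    return W * mask, mask

def truncated_svd_adapter(R, rank):
    """Best rank-`rank` approximation of R as B @ A (hypothetical helper)."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    B = U[:, :rank] * s[:rank]  # shape (out_features, rank)
    A = Vt[:rank, :]            # shape (rank, in_features)
    return B, A

# Assumed toy setup: a random "frozen" weight matrix standing in for a real layer.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 48))

W_sparse, mask = magnitude_prune(W, sparsity=0.5)
residual = W - W_sparse                       # information discarded by pruning
B, A = truncated_svd_adapter(residual, rank=8)

mse_pruned = np.mean((W - W_sparse) ** 2)
mse_recovered = np.mean((W - (W_sparse + B @ A)) ** 2)
print(f"MSE after pruning only:      {mse_pruned:.4f}")
print(f"MSE with low-rank recovery:  {mse_recovered:.4f}")
```

By the Eckart-Young theorem, the truncated SVD is the best rank-limited approximation of the residual in Frobenius norm, so adding `B @ A` back cannot increase the reconstruction error relative to pruning alone. How much of the residual a given rank recovers on real model weights is an empirical question this toy example does not answer.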
Taxonomy
Topics: Speech Recognition and Synthesis · Domain Adaptation and Few-Shot Learning · Topic Modeling
