Sparsity-Aware Low-Rank Representation for Efficient Fine-Tuning of Large Language Models

Longteng Zhang; Sen Wu; Shuai Hou; Zhengyu Qing; Zhuo Zheng; Danning Ke; Qihong Lin; Qiang Wang; Shaohuai Shi; Xiaowen Chu

arXiv:2601.16991 · cs.LG · January 29, 2026

TL;DR

SALR introduces a sparsity-aware low-rank fine-tuning method for large language models, combining pruning and low-rank adaptation to reduce model size and improve inference speed while maintaining performance.

Contribution

The paper proposes SALR, a novel framework that unifies sparse pruning with low-rank adaptation, providing theoretical guarantees and practical efficiency improvements for fine-tuning large language models.

Findings

01. Achieves 50% sparsity while matching LoRA performance on benchmarks.

02. Reduces model size by 2× and speeds up inference by 1.7×.

03. Provides theoretical analysis of pruning error bounds and residual information recovery.

Abstract

Adapting large pre-trained language models to downstream tasks often entails fine-tuning millions of parameters or deploying costly dense weight updates, which hinders their use in resource-constrained environments. Low-rank Adaptation (LoRA) reduces trainable parameters by factorizing weight updates, yet the underlying dense weights still impose high storage and computation costs. Magnitude-based pruning can yield sparse models but typically degrades LoRA's performance when applied naively. In this paper, we introduce SALR (Sparsity-Aware Low-Rank Representation), a novel fine-tuning paradigm that unifies low-rank adaptation with sparse pruning under a rigorous mean-squared-error framework. We prove that statically pruning only the frozen base weights minimizes the pruning error bound, and we recover the discarded residual information via a truncated-SVD low-rank adapter, which…
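The two steps named in the abstract, magnitude pruning of the frozen base weights and recovery of the discarded residual with a truncated SVD, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `magnitude_prune` and `truncated_svd_residual`, the unstructured per-matrix pruning, the matrix shape, and the rank of 8 are all assumptions chosen for illustration, and the random matrix stands in for a real pre-trained weight.

```python
import numpy as np


def magnitude_prune(W, sparsity=0.5):
    """Zero out the smallest-magnitude entries of W (hypothetical helper).

    Assumes unstructured, per-matrix pruning; the paper may use a
    different granularity.
    """
    k = int(round(sparsity * W.size))  # number of entries to drop
    mask = np.ones(W.size, dtype=bool)
    if k > 0:
        # Indices of the k smallest absolute values
        drop = np.argpartition(np.abs(W).ravel(), k - 1)[:k]
        mask[drop] = False
    mask = mask.reshape(W.shape)
    return W * mask, mask


def truncated_svd_residual(W, W_sparse, rank):
    """Approximate the pruned-away residual W - W_sparse with rank-`rank` factors."""
    residual = W - W_sparse
    U, s, Vt = np.linalg.svd(residual, full_matrices=False)
    B = U[:, :rank] * s[:rank]  # shape (out_features, rank)
    A = Vt[:rank, :]            # shape (rank, in_features)
    return B, A


# Toy stand-in for a frozen base weight matrix (assumption: random Gaussian).
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 48))

W_sparse, mask = magnitude_prune(W, sparsity=0.5)
B, A = truncated_svd_residual(W, W_sparse, rank=8)

# Mean-squared error of pruning alone vs. pruning plus the low-rank correction.
err_pruned = np.mean((W - W_sparse) ** 2)
err_recovered = np.mean((W - (W_sparse + B @ A)) ** 2)
print(f"MSE after pruning:       {err_pruned:.4f}")
print(f"MSE with rank-8 adapter: {err_recovered:.4f}")
```

In this sketch the low-rank product `B @ A` is the best rank-8 approximation of the residual in the Frobenius norm, so adding it back can only lower the reconstruction error relative to pruning alone. How SALR initializes and trains its adapters beyond this step is not stated in the excerpt above.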

Taxonomy

Topics: Speech Recognition and Synthesis · Domain Adaptation and Few-Shot Learning · Topic Modeling