SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity

Samir Khaki; Xiuyu Li; Junxian Guo; Ligeng Zhu; Chenfeng Xu; Konstantinos N. Plataniotis; Amir Yazdanbakhsh; Kurt Keutzer; Song Han; Zhijian Liu

arXiv:2506.16500 · cs.LG · June 23, 2025

TL;DR

SparseLoRA accelerates large language model fine-tuning by dynamically selecting a sparse subset of weights for each input, reducing computational cost while maintaining accuracy across diverse tasks.

Contribution

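The paper presents SparseLoRA, which uses a lightweight, training-free SVD sparsity estimator to dynamically select the weights used for loss and gradient computation. This addresses a limitation of existing parameter-efficient fine-tuning methods: they reduce trainable parameters and memory usage but not computational cost.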
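The estimator is described here only at a high level. The following is a minimal sketch of the general idea of an SVD-based contextual sparsity estimator, assuming a LLaMA-style gated (SwiGLU) MLP and NumPy. A truncated SVD of each frozen projection is computed once offline; at run time the cheap low-rank factors approximate the gate and up activations, the intermediate channels with the largest estimated contribution are kept, and the full-size weights are then used only for those channels. The function names (`low_rank_factors`, `select_channels`, `sparse_mlp`, `dense_mlp`), the channel-scoring rule, and the per-batch channel selection are hypothetical illustrations, not the authors' implementation, and the sketch covers only the frozen weight path (no LoRA adapters).

```python
import numpy as np


def silu(x: np.ndarray) -> np.ndarray:
    """SiLU activation, x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))


def low_rank_factors(w: np.ndarray, rank: int) -> tuple[np.ndarray, np.ndarray]:
    """Truncated SVD of a frozen weight w (out x in), computed once offline.

    Returns (a, b) with a @ b approximating w; a is (out x rank), b is (rank x in).
    """
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    return u[:, :rank] * s[:rank], vt[:rank, :]


def select_channels(x, gate_lr, up_lr, keep_ratio):
    """Cheaply score the MLP's intermediate channels for this batch (hypothetical rule).

    Scores are aggregated over all tokens, so one channel subset is shared
    by the whole batch.
    """
    (ga, gb), (ua, ub) = gate_lr, up_lr
    g_hat = (x @ gb.T) @ ga.T  # (tokens x hidden), approximates x @ w_gate.T
    u_hat = (x @ ub.T) @ ua.T  # (tokens x hidden), approximates x @ w_up.T
    scores = np.linalg.norm(silu(g_hat) * u_hat, axis=0)  # one score per channel
    k = max(1, int(round(keep_ratio * scores.shape[0])))
    return np.sort(np.argsort(scores)[-k:])  # indices of the top-k channels


def sparse_mlp(x, w_gate, w_up, w_down, idx):
    """Gated MLP computed only on the selected intermediate channels idx."""
    h = silu(x @ w_gate[idx].T) * (x @ w_up[idx].T)  # (tokens x k)
    return h @ w_down[:, idx].T  # (tokens x d_model)


def dense_mlp(x, w_gate, w_up, w_down):
    """Reference dense gated MLP."""
    h = silu(x @ w_gate.T) * (x @ w_up.T)
    return h @ w_down.T


# Toy usage with random weights (shapes are illustrative only).
rng = np.random.default_rng(0)
d_model, hidden, tokens = 64, 256, 8
w_gate = rng.standard_normal((hidden, d_model)) / np.sqrt(d_model)
w_up = rng.standard_normal((hidden, d_model)) / np.sqrt(d_model)
w_down = rng.standard_normal((d_model, hidden)) / np.sqrt(hidden)
x = rng.standard_normal((tokens, d_model))

gate_lr = low_rank_factors(w_gate, rank=16)  # offline, once per layer
up_lr = low_rank_factors(w_up, rank=16)
idx = select_channels(x, gate_lr, up_lr, keep_ratio=0.5)
y_sparse = sparse_mlp(x, w_gate, w_up, w_down, idx)
y_dense = dense_mlp(x, w_gate, w_up, w_down)
rel_err = np.linalg.norm(y_sparse - y_dense) / np.linalg.norm(y_dense)
print(f"kept {len(idx)}/{hidden} channels, relative error {rel_err:.3f}")
```

The estimator's cost scales with the SVD rank rather than the full weight size, which is what makes it worth running on every batch; with `keep_ratio=1.0` the sparse path reduces exactly to the dense MLP.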

Findings

01 Reduces computational cost by up to 2.2x
02 Achieves up to 1.6x speedup in fine-tuning
03 Maintains accuracy across multiple downstream tasks
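A back-of-the-envelope FLOP count shows why the estimator must be cheap for sparsity to pay off. This sketch assumes a gated MLP with LLaMA-2-7B's shape (d_model = 4096, hidden = 11008), a hypothetical keep ratio of 0.5, and a hypothetical rank-8 low-rank estimator for the gate and up projections. It counts only forward FLOPs of the frozen MLP, ignoring attention, the LoRA branch, and the backward pass, so it does not reproduce the figures reported above; the function name `gated_mlp_flops` is illustrative.

```python
def gated_mlp_flops(tokens, d_model, hidden, keep_ratio=1.0, est_rank=0):
    """Approximate forward FLOPs (2 per multiply-add) of a gated MLP.

    The gate, up, and down projections each touch d_model x kept channels.
    If est_rank > 0, add a rank-est_rank estimator for the gate and up
    projections: x @ B.T then @ A.T, i.e. 2*tokens*r*(d_model + hidden) each.
    """
    kept = keep_ratio * hidden
    main = 3 * 2 * tokens * d_model * kept
    estimator = 2 * (2 * tokens * est_rank * (d_model + hidden)) if est_rank else 0
    return main + estimator


dense = gated_mlp_flops(tokens=1, d_model=4096, hidden=11008)
sparse = gated_mlp_flops(tokens=1, d_model=4096, hidden=11008,
                         keep_ratio=0.5, est_rank=8)
print(f"{dense / sparse:.2f}x fewer MLP forward FLOPs")  # prints "1.99x fewer MLP forward FLOPs"
```

Because the estimator's cost grows with its rank times (d_model + hidden), a small rank keeps the overhead well under 1% here; a large rank would eat into the savings. FLOP reductions also do not translate one-to-one into wall-clock speedup.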

Abstract

Fine-tuning LLMs is both computationally and memory-intensive. While parameter-efficient fine-tuning methods such as QLoRA and DoRA reduce the number of trainable parameters and lower memory usage, they do not decrease computational cost, and in some cases they may even slow down fine-tuning. In this paper, we introduce SparseLoRA, a method that accelerates LLM fine-tuning through contextual sparsity. We propose a lightweight, training-free SVD sparsity estimator that dynamically selects a sparse subset of weights for loss and gradient computation. We also systematically analyze and address sensitivity across layers, tokens, and training steps. Our experimental results show that SparseLoRA reduces computational cost by up to 2.2 times and achieves a measured speedup of up to 1.6 times while maintaining accuracy across various downstream tasks, including commonsense and arithmetic reasoning, code…
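The abstract mentions sensitivity across layers, tokens, and training steps without giving the specific rules. One hypothetical way to express a policy that falls back to dense computation along those three axes is sketched below. The class name `SparsityPolicy`, its thresholds, and the particular choices (run dense for an initial fraction of steps, keep the earliest layers dense, always compute loss-bearing tokens densely) are assumptions for illustration only; they are not taken from the paper.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class SparsityPolicy:
    """Hypothetical knobs; names and defaults are illustrative, not from the paper."""

    dense_warmup_frac: float = 0.1  # assumed: fully dense for the first 10% of steps
    dense_first_layers: int = 2  # assumed: the earliest layers stay dense

    def sparse_token_mask(self, layer, step, total_steps, loss_mask):
        """Return a boolean mask over tokens: True means use the sparse path.

        loss_mask marks tokens that contribute to the loss (for example,
        response tokens); this sketch always computes those densely.
        """
        loss_mask = np.asarray(loss_mask, dtype=bool)
        if step < self.dense_warmup_frac * total_steps:
            return np.zeros_like(loss_mask)  # step axis: early training is dense
        if layer < self.dense_first_layers:
            return np.zeros_like(loss_mask)  # layer axis: sensitive layers are dense
        return ~loss_mask  # token axis: only non-loss tokens go sparse


policy = SparsityPolicy()
loss_mask = [False] * 6 + [True] * 2  # 6 prompt tokens, 2 response tokens
mask = policy.sparse_token_mask(layer=5, step=500, total_steps=1000, loss_mask=loss_mask)
print(mask)
```

Returning a per-token mask keeps the policy separate from the kernels: a training loop could route masked tokens through a sparse projection and the rest through the dense one.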

Videos

SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity (SlidesLive)

Taxonomy

Topics: Topic Modeling · Numerical Methods and Algorithms · Model Reduction and Neural Networks