Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLMs
Yuxin Zhang, Lirui Zhao, Mingbao Lin, Yunyun Sun, Yiwu Yao, Xingjia Han, Jared Tanner, Shiwei Liu, Rongrong Ji

TL;DR
This paper introduces DSnoT, a training-free method for fine-tuning sparse large language models by iterative pruning and growing, significantly improving performance without backpropagation.
Contribution
The paper proposes a novel training-free fine-tuning approach for sparse LLMs that reduces computational costs and enhances performance through dynamic pruning and growing techniques.
Findings
Outperforms the state-of-the-art Wanda by 26.79 perplexity at 70% sparsity with LLaMA-7B.
Effective across various models like LLaMA, Vicuna, and OPT.
Operates efficiently in linear time without backpropagation.
Abstract
The ever-increasing large language models (LLMs), though opening a potential path for the upcoming artificial general intelligence, sadly drop a daunting obstacle on the way towards their on-device deployment. As one of the most well-established pre-LLM approaches to reducing model complexity, network pruning appears to lag behind in the era of LLMs, due mostly to its costly fine-tuning (or re-training) necessity under the massive volumes of model parameters and training data. To close this industry-academia gap, we introduce Dynamic Sparse No Training (DSnoT), a training-free fine-tuning approach that slightly updates sparse LLMs without expensive backpropagation or any weight updates. Inspired by Dynamic Sparse Training, DSnoT minimizes the reconstruction error between the dense and sparse LLMs, in the fashion of performing iterative weight pruning-and-growing on top of…
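The pruning-and-growing idea in the abstract can be illustrated on a single linear layer. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`magnitude_mask`, `reconstruction_error`, `prune_and_grow`), the magnitude-based starting mask, and the brute-force swap search are all hypothetical choices made for illustration. In particular, the exhaustive search here is quadratic in the row width and is not the linear-time criterion the paper refers to. What it does share with the description above is that only the sparsity mask changes: weight values are never updated and no gradients are computed.

```python
import numpy as np

def magnitude_mask(W, sparsity):
    """Hypothetical starting mask: keep the largest-magnitude weights in each row."""
    k = int(round(W.shape[1] * (1 - sparsity)))  # weights kept per row
    order = np.argsort(-np.abs(W), axis=1)
    mask = np.zeros(W.shape, dtype=bool)
    np.put_along_axis(mask, order[:, :k], True, axis=1)
    return mask

def reconstruction_error(W, mask, X):
    """Squared error between dense and sparse layer outputs on calibration inputs X."""
    return float(np.sum((W @ X - (W * mask) @ X) ** 2))

def prune_and_grow(W, mask, X, max_iters=50):
    """Greedy mask swaps per output row (illustrative, brute force).

    Each swap revives one pruned weight and prunes one kept weight, so the
    per-row sparsity stays fixed. W itself is never modified.
    """
    mask = mask.copy()
    for row in range(W.shape[0]):
        w, m = W[row], mask[row]  # m is a view into mask
        for _ in range(max_iters):
            # Dense output minus sparse output for this row, one value per sample.
            residual = (w * ~m) @ X
            best_err, best_grow, best_prune = residual @ residual, None, None
            for g in np.flatnonzero(~m):      # candidates to grow
                for p in np.flatnonzero(m):   # candidates to prune
                    new_res = residual - w[g] * X[g] + w[p] * X[p]
                    err = new_res @ new_res
                    if err < best_err - 1e-12:
                        best_err, best_grow, best_prune = err, g, p
            if best_grow is None:  # no swap lowers the error any further
                break
            m[best_grow], m[best_prune] = True, False
    return mask

# Toy usage on random data (assumed shapes: 4 outputs, 8 inputs, 32 samples).
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))
X = rng.normal(size=(8, 32))
mask_before = magnitude_mask(W, sparsity=0.5)
mask_after = prune_and_grow(W, mask_before, X)
print(reconstruction_error(W, mask_before, X), reconstruction_error(W, mask_after, X))
```

Because a swap is accepted only when it strictly lowers the row's error, the reconstruction error after the loop is never higher than before it, and the number of nonzero weights per row is unchanged.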
Taxonomy
Topics: Topic Modeling · Speech Recognition and Synthesis · Domain Adaptation and Few-Shot Learning
Methods: OPT · Pruning
