Scaling Sparse Fine-Tuning to Large Language Models
Alan Ansell, Ivan Vulić, Hannah Sterz, Anna Korhonen, and Edoardo M. Ponti

TL;DR
This paper introduces SpIEL, a sparse fine-tuning method that scales to large language models by maintaining and updating an array of parameter indices and their deltas, and that often outperforms existing methods like LoRA.
Contribution
The paper presents SpIEL, a new sparse fine-tuning technique that scales to LLMs by dynamically managing a set of parameter indices and their deltas, with efficient criteria for regrowing pruned indices.
Findings
SpIEL often outperforms LoRA.
SpIEL is compatible with quantization and efficient optimizers.
SpIEL scales effectively to models like LLaMA 2 7B and 13B.
Abstract
Large Language Models (LLMs) are difficult to fully fine-tune (e.g., with instructions or human feedback) due to their sheer number of parameters. A family of parameter-efficient sparse fine-tuning methods has proven promising in terms of performance, but their memory requirements increase proportionally to the size of the LLMs. In this work, we scale sparse fine-tuning to state-of-the-art LLMs like LLaMA 2 7B and 13B. We propose SpIEL, a novel sparse fine-tuning method which, for a desired density level, maintains an array of parameter indices and the deltas of these parameters relative to their pretrained values. It iterates over: (a) updating the active deltas, (b) pruning indices (based on the change of magnitude of their deltas) and (c) regrowth of indices. For regrowth, we explore two criteria based on either the accumulated gradients of a few candidate parameters or their…
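The update, prune, and regrow loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function name `sparse_finetune_sketch`, the quadratic toy loss, the plain gradient-descent update, and all hyperparameters are assumptions made for illustration. Pruning is interpreted here as dropping the indices whose deltas changed least since the previous regrowth phase, and only the accumulated-gradient regrowth criterion is shown.

```python
import random


def sparse_finetune_sketch(pretrained, target, density=0.25, steps=60,
                           regrow_every=20, prune_frac=0.5, lr=0.2, seed=0):
    """Toy update/prune/regrow loop over a flat parameter list (hypothetical)."""
    rng = random.Random(seed)
    n = len(pretrained)
    k = max(1, int(density * n))  # number of active (trainable) indices

    # Active set: parameter index -> delta relative to the pretrained value.
    deltas = {i: 0.0 for i in rng.sample(range(n), k)}
    snapshot = dict(deltas)  # deltas at the start of the current phase
    acc_grad = {}            # accumulated gradients of inactive candidates

    def grad(i):
        # Gradient of the assumed toy loss 0.5 * sum((theta - target)^2),
        # where theta = pretrained + delta (delta is 0 for inactive indices).
        return pretrained[i] + deltas.get(i, 0.0) - target[i]

    for step in range(1, steps + 1):
        # (a) Update the active deltas with one gradient-descent step.
        for i in deltas:
            deltas[i] -= lr * grad(i)

        # Accumulate gradients for a few randomly sampled inactive candidates.
        inactive = [i for i in range(n) if i not in deltas]
        for i in rng.sample(inactive, min(len(inactive), 2 * k)):
            acc_grad[i] = acc_grad.get(i, 0.0) + grad(i)

        if step % regrow_every == 0 and step < steps:
            n_swap = min(int(prune_frac * k), len(acc_grad))
            # (b) Prune the indices whose deltas changed least this phase.
            by_change = sorted(deltas, key=lambda i: abs(deltas[i] - snapshot[i]))
            for i in by_change[:n_swap]:
                del deltas[i]
            # (c) Regrow the candidates with the largest accumulated gradient.
            by_grad = sorted(acc_grad, key=lambda i: abs(acc_grad[i]), reverse=True)
            for i in by_grad[:n_swap]:
                deltas[i] = 0.0
            snapshot = dict(deltas)
            acc_grad = {}

    return deltas
```

The sketch stores only the active indices and their deltas, so its trainable state grows with the chosen density rather than with the full parameter count. A pruned index simply falls back to its pretrained value, and the number of active indices stays constant because each phase prunes and regrows the same number of entries.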
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: Pruning · Shrink and Fine-Tune · SM3
