Scaling Sparse Fine-Tuning to Large Language Models

Alan Ansell, Ivan Vulić, Hannah Sterz, Anna Korhonen, and Edoardo M. Ponti

arXiv:2401.16405 · cs.CL · February 5, 2024 · 1 citation

TL;DR

This paper introduces SpIEL, a novel sparse fine-tuning method that scales to large language models by maintaining and updating an array of parameter indices and their deltas, and that often outperforms parameter-efficient methods such as LoRA.

Contribution

The paper presents SpIEL, a sparse fine-tuning technique that scales to LLMs by dynamically managing an array of parameter indices and their deltas, combined with efficient criteria for regrowing indices.

Findings

1. SpIEL often outperforms LoRA.
2. SpIEL is compatible with quantization and efficient optimizers.
3. SpIEL scales effectively to models such as LLaMA 2 7B and 13B.

Abstract

Large Language Models (LLMs) are difficult to fully fine-tune (e.g., with instructions or human feedback) due to their sheer number of parameters. A family of parameter-efficient sparse fine-tuning methods has proven promising in terms of performance, but their memory requirements increase proportionally to the size of the LLMs. In this work, we scale sparse fine-tuning to state-of-the-art LLMs like LLaMA 2 7B and 13B. We propose SpIEL, a novel sparse fine-tuning method which, for a desired density level, maintains an array of parameter indices and the deltas of these parameters relative to their pretrained values. It iterates over: (a) updating the active deltas, (b) pruning indices (based on the change of magnitude of their deltas), and (c) regrowing indices. For regrowth, we explore two criteria based on either the accumulated gradients of a few candidate parameters or their…
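The update, prune, and regrow loop described in the abstract can be illustrated with a small pure-Python toy. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`spiel_sketch`, `materialize`, `toy_loss`), the quadratic toy loss standing in for an LLM loss, the simulated minibatch noise, the random initial active set, and the linearly decaying number of swapped indices are all hypothetical choices made for illustration. Pruning scores each active index by |delta| (how far the parameter has moved from its pretrained value), which is one simplified reading of the criterion "based on the change of magnitude of their deltas". Regrowth follows the accumulated-gradient criterion by summing simulated gradients over a few random inactive candidates and activating those with the largest accumulated magnitude.

```python
import random


def toy_loss(theta, target):
    """Hypothetical stand-in for an LLM loss: 0.5 * sum_i (theta_i - target_i)^2."""
    return 0.5 * sum((t - y) ** 2 for t, y in zip(theta, target))


def materialize(theta0, indices, deltas):
    """Fine-tuned parameters: pretrained values plus the deltas scattered at `indices`."""
    theta = list(theta0)  # copy, so the pretrained values are never modified
    for i, d in zip(indices, deltas):
        theta[i] += d
    return theta


def spiel_sketch(theta0, target, density, n_rounds=10, steps_per_round=20, lr=0.1,
                 swap_fraction=0.25, n_candidates=None, accum_steps=5, noise=0.01, seed=0):
    """Hypothetical toy version of an update / prune / regrow sparse fine-tuning loop."""
    rng = random.Random(seed)
    n = len(theta0)
    k = max(1, int(density * n))       # number of trainable parameters, fixed by the density
    indices = rng.sample(range(n), k)  # assumption: random initial active set
    deltas = [0.0] * k                 # deltas relative to the pretrained values

    def noisy_grad(i, delta):
        # Gradient of the toy loss w.r.t. theta_i, plus simulated minibatch noise.
        return theta0[i] + delta - target[i] + rng.gauss(0.0, noise)

    for r in range(n_rounds):
        # (a) Update only the active deltas with plain SGD.
        for _ in range(steps_per_round):
            deltas = [d - lr * noisy_grad(i, d) for i, d in zip(indices, deltas)]

        # Assumption: the number of swapped indices decays linearly to zero.
        n_swap = int(swap_fraction * k * (1 - r / n_rounds))
        if n_swap == 0:
            continue

        # (b) Prune the n_swap active indices with the smallest |delta|
        #     (simplified reading of the "change of magnitude" criterion).
        order = sorted(range(k), key=lambda j: abs(deltas[j]), reverse=True)
        keep = order[: k - n_swap]
        indices = [indices[j] for j in keep]
        deltas = [deltas[j] for j in keep]  # pruned indices fall back to pretrained values

        # (c) Regrow: accumulate gradients for a few random inactive candidates
        #     and activate the n_swap with the largest accumulated magnitude.
        active = set(indices)
        inactive = [i for i in range(n) if i not in active]
        m = min(len(inactive), max(n_swap, n_candidates or 2 * n_swap))
        candidates = rng.sample(inactive, m)
        acc = {i: sum(noisy_grad(i, 0.0) for _ in range(accum_steps)) for i in candidates}
        grown = sorted(candidates, key=lambda i: abs(acc[i]), reverse=True)[:n_swap]
        indices += grown
        deltas += [0.0] * len(grown)  # regrown deltas start at zero

    return indices, deltas


if __name__ == "__main__":
    theta0 = [0.0] * 40                # "pretrained" parameters
    target = [1.0] * 10 + [0.0] * 30   # only the first 10 parameters need to change
    idx, dl = spiel_sketch(theta0, target, density=0.25, n_candidates=20)
    print(toy_loss(theta0, target))    # 5.0
    print(toy_loss(materialize(theta0, idx, dl), target))
```

Only `indices` and `deltas` (k entries each) are ever updated; `theta0` is read but never modified, and a pruned index simply reverts to its pretrained value. The decaying swap count is a choice of this sketch (it stops later rounds from discarding well-trained deltas), not something the abstract specifies.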

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Pruning · Shrink and Fine-Tune · SM3