Random Initialization of Gated Sparse Adapters

Vi Retault; Yohaï-Eliel Berreby

arXiv:2511.01794·cs.LG·November 4, 2025

TL;DR

This paper introduces RIGSA, a sparse adapter method for fine-tuning language models. It starts from randomly initialized full-rank adapters, gates them, and sparsifies them with iterative magnitude pruning, and it shows less forgetting than QLoRA on GSM8k.

Contribution

RIGSA combines randomly initialized full-rank adapters, ReZero-style gating, and iterative magnitude pruning to obtain sparse adapters, with the aim of reducing forgetting of previously learned tasks during fine-tuning.

Findings

01 RIGSA shows less forgetting than QLoRA on GSM8k.

02 RIGSA performs comparably to random masking.

03 RIGSA can learn a new task (Textual MNIST) on which the base model starts at chance level.

Abstract

When fine-tuning language models on new tasks, catastrophic forgetting -- performance degradation on previously-learned tasks -- is a ubiquitous problem. While Parameter-Efficient Fine-Tuning (PEFT) methods like LoRA address this through low-rank adapters, sparse adaptation offers an alternative that doesn't impose rank constraints. We introduce Random Initialization of Gated Sparse Adapters (RIGSA), which starts from randomly-initialized full-rank adapters, gates them with a ReZero analog, and sparsifies them with iterative magnitude pruning. We evaluate RIGSA on SmolLM2-1.7B-Instruct using a novel vision-in-text task (Textual MNIST) and measure forgetting on PIQA, HellaSwag, and GSM8k. SmolLM2-1.7B-Instruct initially performs around chance level on Textual MNIST, and is capable of learning the task through RIGSA, 4-bit QLoRA and random masking. In spite of having more trainable…
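The three ingredients named in the abstract (a randomly initialized full-rank adapter, a ReZero-style gate, and iterative magnitude pruning) can be illustrated with a small toy sketch. This is not the authors' implementation: the scalar gate initialized to zero, the form W_eff = W0 + gate * (M * A), the initialization scale, the 50% pruning fraction per round, and all function names are assumptions made for illustration, and the training step between pruning rounds is omitted.

```python
import random

def init_adapter(rows, cols, seed=0, scale=0.02):
    """Randomly initialised full-rank adapter A plus an all-ones mask M (assumed setup)."""
    rng = random.Random(seed)
    adapter = [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]
    mask = [[1] * cols for _ in range(rows)]
    return adapter, mask

def effective_weight(base, adapter, mask, gate):
    """Assumed form: W_eff = W0 + gate * (M * A), elementwise in M and A."""
    return [
        [w + gate * m * a for w, a, m in zip(w_row, a_row, m_row)]
        for w_row, a_row, m_row in zip(base, adapter, mask)
    ]

def magnitude_prune(adapter, mask, fraction):
    """Clear the mask for the smallest-magnitude `fraction` of surviving entries."""
    alive = sorted(
        (abs(adapter[i][j]), i, j)
        for i, row in enumerate(mask)
        for j, m in enumerate(row)
        if m
    )
    n_drop = int(len(alive) * fraction)
    new_mask = [row[:] for row in mask]
    for _, i, j in alive[:n_drop]:
        new_mask[i][j] = 0
    return new_mask

def density(mask):
    """Fraction of adapter entries that are still trainable."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total

# Toy frozen base weights; a real layer would come from the pretrained model.
base = [[1.0, 2.0, 3.0, 4.0] for _ in range(4)]
adapter, mask = init_adapter(4, 4)
gate = 0.0  # ReZero-style gate: starts at zero, would be trained in practice

# With the gate at zero, the adapted layer reproduces the base layer exactly.
assert effective_weight(base, adapter, mask, gate) == base

for _ in range(3):
    # Training of `adapter` and `gate` would happen here (omitted in this sketch).
    mask = magnitude_prune(adapter, mask, fraction=0.5)

print(f"adapter density after pruning: {density(mask):.3f}")
```

Starting the gate at zero means the random adapter does not perturb the pretrained model at the first step, which is the property ReZero-style gating is meant to provide. Pruning in several rounds rather than in one shot lets the surviving weights be retrained between rounds.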

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Topic Modeling · Speech Recognition and Synthesis