Exploring Sparse Adapters for Scalable Merging of Parameter Efficient Experts

Samin Yeasar Arnob; Zhan Su; Minseon Kim; Oleksiy Ostapenko; Riyasat Ohib; Esra'a Saleh; Doina Precup; Lucas Caccia; Alessandro Sordoni

arXiv:2507.07140 · cs.LG · July 15, 2025

TL;DR

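This paper introduces a simple method for training sparse adapters, which update only a subset of the base model's weights. In the authors' setting, these adapters outperform both LoRA and full fine-tuning when parameter-efficient experts are merged across multiple NLP tasks, making them a candidate building block for modular, scalable architectures.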

Contribution

The paper presents a novel, straightforward approach for training sparse adapters and demonstrates their superior merging capabilities over existing methods in NLP tasks.

Findings

01. Sparse adapters outperform LoRA and full fine-tuning after merging.

02. Merging sparse adapters maintains high in-distribution performance.

03. Strong out-of-distribution performance remains challenging.

Abstract

Merging parameter-efficient task experts has recently gained growing attention as a way to build modular architectures that can be rapidly adapted on the fly for specific downstream tasks, without requiring additional fine-tuning. Typically, LoRA serves as the foundational building block of such parameter-efficient modular architectures, leveraging low-rank weight structures to reduce the number of trainable parameters. In this paper, we study the properties of sparse adapters, which train only a subset of weights in the base neural network, as potential building blocks of modular architectures. First, we propose a simple method for training highly effective sparse adapters, which is conceptually simpler than existing methods in the literature and surprisingly outperforms both LoRA and full fine-tuning in our setting. Next, we investigate the merging properties of these sparse adapters…
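To make the idea of a sparse adapter and its merging concrete, the following is a minimal sketch in plain Python. It is not the authors' method: the random mask selection, the uniform averaging rule, and every function name (`make_mask`, `sparse_delta`, `merge_sparse_deltas`, `apply_delta`, `lora_param_count`) are assumptions introduced purely for illustration. It shows only the general shape described in the abstract, where each expert changes a small subset of the base weights and several experts are combined without further fine-tuning.

```python
import random


def make_mask(rows, cols, density, seed):
    """Pick a random subset of weight positions to train.

    Random selection is an assumption for illustration; the source does not
    specify the selection rule here.
    """
    rng = random.Random(seed)
    k = max(1, int(density * rows * cols))
    positions = [(i, j) for i in range(rows) for j in range(cols)]
    return set(rng.sample(positions, k))


def sparse_delta(mask, update):
    """Keep only the masked entries of a dense update; all others are dropped."""
    return {(i, j): update[i][j] for (i, j) in mask}


def merge_sparse_deltas(deltas):
    """Merge experts by uniformly averaging their sparse deltas.

    Uniform averaging is one simple merging rule, chosen here as an assumption.
    Positions an expert did not train count as zero for that expert.
    """
    merged = {}
    for delta in deltas:
        for pos, value in delta.items():
            merged[pos] = merged.get(pos, 0.0) + value / len(deltas)
    return merged


def apply_delta(base, delta):
    """Return base weights plus a sparse delta, leaving the base untouched."""
    out = [row[:] for row in base]
    for (i, j), value in delta.items():
        out[i][j] += value
    return out


def lora_param_count(rows, cols, rank):
    """Trainable parameters of a rank-r LoRA update B @ A on a rows x cols matrix."""
    return rank * (rows + cols)


if __name__ == "__main__":
    rows, cols = 8, 8
    base = [[0.0] * cols for _ in range(rows)]
    experts = []
    for seed in (1, 2, 3):
        mask = make_mask(rows, cols, density=0.125, seed=seed)
        # Stand-in for a trained update: a constant per expert (hypothetical).
        update = [[float(seed)] * cols for _ in range(rows)]
        experts.append(sparse_delta(mask, update))
    merged_weights = apply_delta(base, merge_sparse_deltas(experts))
    print("trainable entries per sparse expert:", len(experts[0]))
    print("trainable entries for rank-1 LoRA:", lora_param_count(rows, cols, 1))
    print("nonzero entries after merging:",
          sum(1 for row in merged_weights for v in row if v != 0.0))
```

Storing each expert as a dictionary of positions keeps the merge cost proportional to the number of trained entries rather than the size of the full weight matrix. Experts whose masks do not overlap never write to the same weight, which is one intuitive reason sparse updates might interfere less when merged; whether that holds in practice is what the paper investigates.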

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques

Methods: Balanced Selection