Neutral Residues: Revisiting Adapters for Model Extension
Franck Signe Talla, Edouard Grave, Hervé Jégou

TL;DR
This paper introduces neutral residues, a novel adapter modification technique that enhances language model adaptation to new domains, especially new languages, while preserving original performance better than existing methods.
Contribution
The paper proposes neutral residues, a new adapter design that improves domain adaptation by minimizing forgetting, considering data, architecture, and training jointly.
Findings
Neutral residues outperform finetuning, LoRA, and vanilla adapters.
Significant improvement in language adaptation with minimal forgetting.
Effective in extending models to new languages.
Abstract
We address the problem of extending a pretrained large language model to a new domain that was not seen during training. Standard techniques, such as finetuning or low-rank adaptation (LoRA), are successful at domain adaptation, but do not formally add capacity to the model. This often leads to a trade-off between performing well on the new domain and degrading performance on the original domain. Here, we revisit and improve adapters to extend LLMs from three angles: data, architecture, and training procedure, which are advantageously considered jointly. The resulting method, called neutral residues, modifies adapters in a way that leads each new residual block to output near-zeros on the original domain. This solution leads to strong results when adapting a state-of-the-art model originally trained on English to a new language. Neutral residues significantly outperform competing…
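Illustrative sketch
The idea that each new residual block should output near-zeros on the original domain can be pictured with a small, dependency-free Python sketch. Everything here is an assumption made for illustration and is not taken from the paper: the bottleneck-plus-sigmoid-gate layout, the L1 penalty on the residue, the weight alpha, and all function names are hypothetical. It only shows the general shape of a frozen block plus an added residue that is penalized on original-domain inputs.

```python
import math

def matvec(W, x):
    # Multiply a matrix (given as a list of rows) by a vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def adapter_residue(x, W_down, W_up, w_gate):
    # Hypothetical gated bottleneck adapter: down-project, ReLU, up-project,
    # then scale by a scalar sigmoid gate computed from the input.
    hidden = [max(0.0, v) for v in matvec(W_down, x)]
    out = matvec(W_up, hidden)
    gate = 1.0 / (1.0 + math.exp(-sum(w * xi for w, xi in zip(w_gate, x))))
    return [gate * v for v in out]

def extended_block(x, frozen_block, W_down, W_up, w_gate):
    # The pretrained block stays frozen; the adapter only adds a residue.
    residue = adapter_residue(x, W_down, W_up, w_gate)
    y = [b + r for b, r in zip(frozen_block(x), residue)]
    return y, residue

def l1_penalty(residue):
    # Mean absolute value of the residue; zero means the block is "neutral".
    return sum(abs(v) for v in residue) / len(residue)

def training_loss(task_loss, original_domain_residues, alpha=0.01):
    # Assumed objective: task loss plus an L1 term that pushes residues
    # computed on original-domain inputs toward zero.
    penalty = sum(l1_penalty(r) for r in original_domain_residues)
    penalty /= len(original_domain_residues)
    return task_loss + alpha * penalty
```

When the residue is exactly zero on an original-domain input, extended_block returns the frozen block's output unchanged, which is the behavior the penalty term is meant to encourage in this sketch.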
Taxonomy
Topics: Natural Language Processing Techniques · Model-Driven Software Engineering Techniques · Software Testing and Debugging Techniques
Methods: Residual Connection · Convolution · Batch Normalization · Residual Block
