Merge to Learn: Efficiently Adding Skills to Language Models with Model Merging

Jacob Morrison; Noah A. Smith; Hannaneh Hajishirzi; Pang Wei Koh; Jesse Dodge; Pradeep Dasigi

arXiv:2410.12937·cs.CL·October 18, 2024

TL;DR

This paper investigates adding new skills to language models by training on each new skill in isolation and then merging the result into the general model. The approach is significantly cheaper than retraining on updated data mixtures and is especially well suited to adding safety features.

Contribution

It evaluates a parallel-train-then-merge procedure for adding skills to language models as a cheaper alternative to continued finetuning and to retraining on updated data mixtures.

Findings

01

Merging after parallel training is often comparably effective to retraining on updated data mixtures.

02

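Parallel training is especially well suited to enabling safety features: relative to continued finetuning and retraining, it dramatically improves model compliance with safe prompts.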

03

The procedure is significantly cheaper than retraining models on updated data mixtures.

Abstract

Adapting general-purpose language models to new skills is currently an expensive process that must be repeated as new instruction datasets targeting new skills are created, or can cause the models to forget older skills. In this work, we investigate the effectiveness of adding new skills to preexisting models by training on the new skills in isolation and later merging with the general model (e.g. using task vectors). In experiments focusing on scientific literature understanding, safety, and coding, we find that the parallel-train-then-merge procedure, which is significantly cheaper than retraining the models on updated data mixtures, is often comparably effective. Our experiments also show that parallel training is especially well-suited for enabling safety features in LMs relative to continued finetuning and retraining, as it dramatically improves model compliance with safe prompts…
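The abstract mentions task vectors as one way to merge a skill-specific model into the general model. The sketch below illustrates the usual task-vector arithmetic on toy parameters; it is not the authors' code. It assumes the skill model and the general model were both finetuned from the same base checkpoint, and the names `task_vector`, `merge`, and the scaling `weight` are hypothetical choices made for illustration.

```python
from typing import Dict, List

# Toy stand-in for a model checkpoint: parameter name -> flat list of values.
Params = Dict[str, List[float]]


def task_vector(base: Params, finetuned: Params) -> Params:
    """Per-parameter difference between a finetuned model and its base."""
    return {
        name: [f - b for f, b in zip(finetuned[name], base[name])]
        for name in base
    }


def merge(general: Params, skill_vector: Params, weight: float = 1.0) -> Params:
    """Add a scaled task vector to the general model's parameters.

    `weight` is an assumed scaling knob, not a value taken from the paper.
    """
    return {
        name: [g + weight * d for g, d in zip(general[name], skill_vector[name])]
        for name in general
    }


# Hypothetical toy checkpoints (values chosen to be exact in floating point).
base = {"w": [0.0, 1.0], "b": [0.5]}       # shared pretrained starting point
general = {"w": [0.25, 1.0], "b": [0.5]}   # base finetuned on the general mixture
skill = {"w": [0.0, 1.5], "b": [1.0]}      # base finetuned on the new skill only

tau = task_vector(base, skill)             # what the skill training changed
merged = merge(general, tau, weight=0.5)   # general model plus scaled skill delta
print(merged)
```

With real checkpoints the same arithmetic would be applied tensor by tensor, and the weight would typically be tuned on held-out data for both the general and the new skill.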

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies