STAR: Spectral Truncation and Rescale for Model Merging

Yu-Ang Lee; Ching-Yun Ko; Tejaswini Pedapati; I-Hsin Chung; Mi-Yen Yeh; Pin-Yu Chen

arXiv:2502.10339 · cs.CL · February 17, 2025

TL;DR

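STAR merges models by truncating the small spectral components of each model and rescaling the remainder to preserve the nuclear norm. This reduces merging conflicts without any additional inference on the original training data, and STAR outperforms baselines by 4.2% when merging 12 Flan-T5 models.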

Contribution

The paper presents STAR, a novel spectral truncation and rescaling technique that enhances multi-model merging robustness and performance without additional training data.

Findings

01 STAR outperforms baselines by 4.2% on Flan-T5 with 12 models.

02 STAR is robust across different model sizes and NLP tasks.

03 The method requires no extra inference on training data.

Abstract

Model merging is an efficient way of obtaining a multi-task model from several pretrained models without further fine-tuning, and it has gained attention in various domains, including natural language processing (NLP). Despite this efficiency, a key challenge in model merging is the seemingly inevitable decrease in task performance as the number of models increases. In this paper, we propose Spectral Truncation And Rescale (STAR), which aims to mitigate "merging conflicts" by truncating small components in the respective spectral spaces, followed by an automatic parameter rescaling scheme that retains the nuclear norm of the original matrix. STAR requires no additional inference on original training data and is robust to hyperparameter choice. We demonstrate the effectiveness of STAR through extensive model merging cases on diverse NLP…
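
The two steps the abstract describes, truncation in spectral space followed by a rescale that restores the nuclear norm, can be sketched for a single weight matrix as below. This is a minimal NumPy illustration, not the authors' implementation: the function name `truncate_and_rescale`, the `keep_fraction` parameter, and the rule used to decide how many singular values to keep are all assumptions made for the example, since the abstract does not specify them.

```python
import numpy as np


def truncate_and_rescale(delta, keep_fraction=0.5):
    """Drop small singular components of `delta`, then rescale the rest.

    Hypothetical helper: `keep_fraction` (an assumed knob) is the share of
    the nuclear norm that the retained singular values must cover.
    Returns the low-rank, rescaled matrix and the rank that was kept.
    """
    # Thin SVD; singular values come back sorted in descending order.
    u, s, vt = np.linalg.svd(delta, full_matrices=False)

    # Nuclear norm = sum of singular values.
    nuclear_norm = s.sum()
    if nuclear_norm == 0.0:
        return delta.copy(), 0

    # Smallest rank whose leading singular values reach the target share.
    cumulative = np.cumsum(s)
    rank = int(np.searchsorted(cumulative, keep_fraction * nuclear_norm)) + 1
    rank = min(rank, s.size)

    # Rescale the kept singular values so their sum equals the original
    # nuclear norm, then rebuild the matrix from the kept components.
    kept = s[:rank]
    scale = nuclear_norm / kept.sum()
    rebuilt = (u[:, :rank] * (scale * kept)) @ vt[:rank]
    return rebuilt, rank


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    delta = rng.standard_normal((6, 4))
    merged_input, rank = truncate_and_rescale(delta, keep_fraction=0.5)
    print("kept rank:", rank)
    print("nuclear norm before:", np.linalg.svd(delta, compute_uv=False).sum())
    print("nuclear norm after: ", np.linalg.svd(merged_input, compute_uv=False).sum())
```

In a merging setting one would presumably apply such a step to each model's parameters before combining them, but the abstract excerpt above does not spell out that pipeline, so it is left out of the sketch.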

Code & Models

Repositories

ibm/star (PyTorch, official)

Videos

STAR: Spectral Truncation and Rescale for Model Merging · Underline

Taxonomy

Topics: Model-Driven Software Engineering Techniques · Scientific Computing and Data Management · Advanced Computational Techniques and Applications

Methods: Softmax · Attention Is All You Need · Flan-T5