STAR: Spectral Truncation and Rescale for Model Merging
Yu-Ang Lee, Ching-Yun Ko, Tejaswini Pedapati, I-Hsin Chung, Mi-Yen Yeh, Pin-Yu Chen

TL;DR
STAR introduces a spectral truncation and rescaling method for model merging that reduces merging conflicts and preserves task performance without extra data, improving over baselines on NLP tasks.
Contribution
The paper presents STAR, a novel spectral truncation and rescaling technique that enhances multi-model merging robustness and performance without additional training data.
Findings
STAR outperforms baselines by 4.2% when merging 12 models on Flan-T5.
STAR is robust across different model sizes and NLP tasks.
The method requires no additional inference on the original training data.
Abstract
Model merging is an efficient way of obtaining a multi-task model from several pretrained models without further fine-tuning, and it has gained attention in various domains, including natural language processing (NLP). Despite its efficiency, a key challenge in model merging is the seemingly inevitable decrease in task performance as the number of models increases. In this paper, we propose Spectral Truncation and Rescale (STAR), which aims to mitigate "merging conflicts" by truncating small components in the respective spectral spaces, followed by an automatic parameter rescaling scheme that retains the nuclear norm of the original matrix. STAR requires no additional inference on the original training data and is robust to the choice of hyperparameters. We demonstrate the effectiveness of STAR through extensive model merging cases on diverse NLP…
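The abstract names two steps per weight matrix: drop the small singular components of a task's parameter update, then rescale what remains so its nuclear norm (the sum of singular values) matches the original. The NumPy sketch below illustrates that idea under stated assumptions, not as the authors' implementation: the function names `star_truncate_rescale` and `merge_task_vectors` and the parameter `eta` are hypothetical; the rank is assumed to be the smallest r whose leading singular values cover a fraction `eta` of the nuclear norm; the kept singular values are assumed to be rescaled by one uniform factor; and the processed task vectors are assumed to be combined by plain averaging. The paper's exact rank rule and merge step may differ.

```python
import numpy as np
from typing import Sequence


def star_truncate_rescale(delta: np.ndarray, eta: float = 0.4) -> np.ndarray:
    """Truncate small singular components of one task-vector matrix, then
    rescale the kept ones so the nuclear norm is unchanged.

    Hypothetical helper: `eta` is the assumed fraction of the nuclear norm
    that the kept leading components must cover.
    """
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    nuclear = s.sum()  # nuclear norm = sum of singular values
    if nuclear == 0.0:
        return delta.copy()  # zero update: nothing to truncate

    # Smallest rank r whose leading singular values reach eta of the nuclear
    # norm (assumed rank-selection rule).
    coverage = np.cumsum(s) / nuclear
    r = min(int(np.searchsorted(coverage, eta)) + 1, len(s))

    # Uniformly rescale the kept singular values so they sum to the original norm.
    s_kept = s[:r] * (nuclear / s[:r].sum())
    return (u[:, :r] * s_kept) @ vt[:r, :]


def merge_task_vectors(base: np.ndarray, finetuned: Sequence[np.ndarray],
                       eta: float = 0.4) -> np.ndarray:
    """Hypothetical merge: apply the sketch above to each task vector
    (finetuned - base) and add their plain average back to the base weights."""
    deltas = [star_truncate_rescale(w - base, eta) for w in finetuned]
    return base + np.mean(deltas, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.normal(size=(8, 6))
    models = [base + 0.1 * rng.normal(size=(8, 6)) for _ in range(3)]
    merged = merge_task_vectors(base, models, eta=0.4)
    print(merged.shape)  # (8, 6)
```

With `eta=1.0` nothing is truncated and the rescale factor is 1, so each update comes back unchanged, which gives a quick sanity check on the sketch.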
Taxonomy
Topics: Model-Driven Software Engineering Techniques · Scientific Computing and Data Management · Advanced Computational Techniques and Applications
Methods: Softmax · Attention Is All You Need · Flan-T5
