NAN: A Training-Free Solution to Coefficient Estimation in Model Merging

Chongjie Si; Kangtao Lv; Jingjing Jiang; Yadao Wang; Yongwei Wang; Xiaokang Yang; Wenbo Su; Bo Zheng; Wei Shen

arXiv:2505.16148 · cs.LG · May 23, 2025

Open Access

TL;DR

NAN introduces a training-free, parameter norm-based method for estimating optimal merging coefficients in model merging, improving performance without additional training or data access.

Contribution

The paper proposes NAN, a simple coefficient estimation method for model merging. It is derived from a least-squares view of merging and estimates each model's merging coefficient via the inverse of its parameter norm.

Findings

01

NAN consistently improves baseline merging methods.

02

NAN is training-free and widely applicable.

03

The least-squares analysis indicates that optimal merging weights should scale with the amount of task-specific information encoded in each model.

Abstract

Model merging offers a training-free alternative to multi-task learning by combining independently fine-tuned models into a unified one without access to raw data. However, existing approaches often rely on heuristics to determine the merging coefficients, limiting their scalability and generality. In this work, we revisit model merging through the lens of least-squares optimization and show that the optimal merging weights should scale with the amount of task-specific information encoded in each model. Based on this insight, we propose NAN, a simple yet effective method that estimates model merging coefficients via the inverse of the parameter norm. NAN is training-free, plug-and-play, and applicable to a wide range of merging strategies. Extensive experiments show that NAN consistently improves the performance of baseline methods.
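To make the idea of norm-based coefficients concrete, the following is a minimal sketch and not the authors' implementation. The abstract states only that coefficients are estimated "via the inverse of parameter norm"; everything else here is an assumption made for illustration: the L2 norm is taken over each model's task vector (fine-tuned weights minus pretrained weights), the coefficients are normalized to sum to 1, and the merge is a weighted sum of task vectors. The helper names `inverse_norm_coefficients` and `merge_models` are hypothetical.

```python
import numpy as np


def inverse_norm_coefficients(task_vectors, eps=1e-12):
    """Hypothetical helper: one coefficient per model, proportional to 1 / ||tau_i||_2.

    Assumption: the norm is the L2 norm of the flattened task vector, and the
    coefficients are normalized to sum to 1. The paper may define this differently.
    """
    norms = np.array([np.linalg.norm(np.ravel(tv)) for tv in task_vectors])
    inverse = 1.0 / (norms + eps)  # eps guards against a zero-norm task vector
    return inverse / inverse.sum()


def merge_models(pretrained, finetuned_models):
    """Hypothetical helper: weighted task-vector merge using the coefficients above."""
    task_vectors = [model - pretrained for model in finetuned_models]
    coefficients = inverse_norm_coefficients(task_vectors)
    merged = pretrained + sum(c * tv for c, tv in zip(coefficients, task_vectors))
    return merged, coefficients


# Toy usage with flat parameter vectors standing in for real model weights.
pretrained = np.zeros(4)
finetuned = [np.array([3.0, 4.0, 0.0, 0.0]), np.array([0.0, 0.0, 6.0, 8.0])]
merged, coefficients = merge_models(pretrained, finetuned)
```

In this toy case the second model's task vector has twice the norm of the first, so under these assumptions it receives half the weight. No training data or gradient steps are involved, which is consistent with the training-free, plug-and-play framing in the abstract.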

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques