A Unified Generalization Framework for Model Merging: Trade-offs, Non-Linearity, and Scaling Laws
Qinglun Li, Anke Tang, Miao Zhang, Mengzhu Wang, Quanjun Yin, Li Shen

TL;DR
This paper develops a unified theoretical framework for model merging that explains its effectiveness and trade-offs, especially under heterogeneous fine-tuning conditions, and provides scaling laws to guide practical merging strategies.
Contribution
It introduces a comprehensive theory of model merging built on $L_2$-Stability that covers both linear and non-linear merging algorithms, and derives scaling laws for hyperparameter optimization.
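One common way to formalize $L_2$ stability is on-average model stability: how far, in $L_2$ norm, the learned parameters move when a single training example is swapped for a fresh one. The sketch below estimates that quantity for a toy learner. The learner (full-batch gradient descent on 1-D least squares), the synthetic data, and every function name are hypothetical illustrations, not the paper's setting or its exact definition.

```python
import math
import random
from typing import Callable, List, Tuple

Sample = Tuple[float, float]
Params = Tuple[float, float]


def make_data(n: int, rng: random.Random) -> List[Sample]:
    """Hypothetical toy task: y = 2x + 1 plus small Gaussian noise."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.1)))
    return data


def gd_least_squares(data: List[Sample], lr: float = 0.1, steps: int = 50) -> Params:
    """Deterministic full-batch gradient descent on the mean squared error of y ~ w*x + b."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2.0 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2.0 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return (w, b)


def l2_distance(u: Params, v: Params) -> float:
    """Euclidean distance between two parameter vectors."""
    return math.sqrt(sum((a - c) ** 2 for a, c in zip(u, v)))


def on_average_l2_stability(
    data: List[Sample],
    fresh: List[Sample],
    train: Callable[[List[Sample]], Params],
) -> float:
    """Single-draw estimate of sqrt((1/n) * sum_i ||A(S^(i)) - A(S)||_2^2),
    where S^(i) is S with its i-th sample replaced by fresh[i]."""
    base = train(data)
    total = 0.0
    for i in range(len(data)):
        perturbed = data[:i] + [fresh[i]] + data[i + 1:]
        total += l2_distance(train(perturbed), base) ** 2
    return math.sqrt(total / len(data))


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (10, 100):
        data, fresh = make_data(n, rng), make_data(n, rng)
        print(n, on_average_l2_stability(data, fresh, gd_least_squares))
```

Because replacing one example changes the averaged gradient by roughly $1/n$, the estimate should shrink as $n$ grows; this is the intuition that stability-based generalization bounds build on.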
Findings
Theoretical explanation of the optimization-generalization trade-off.
Unified framework for linear and non-linear merging algorithms.
Empirical validation of scaling laws across multiple architectures and tasks.
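To make the linear versus non-linear distinction concrete, the sketch below contrasts a linear merge (task arithmetic: the base weights plus a scaled sum of task vectors $\tau_k = \theta_k - \theta_0$) with a simplified rule in the spirit of TIES-Merging (trim small entries, elect a sign per coordinate, average only the agreeing entries). This summary does not say which non-linear algorithms the paper analyzes, so the TIES-style rule, the flat-list parameter representation, and all names (`linear_merge`, `ties_style_merge`, `keep_frac`, `lam`) are assumptions for illustration only.

```python
from typing import List

Vector = List[float]


def task_vectors(base: Vector, finetuned: List[Vector]) -> List[Vector]:
    """tau_k = theta_k - theta_0 for each fine-tuned model."""
    return [[w - b for w, b in zip(theta, base)] for theta in finetuned]


def linear_merge(base: Vector, finetuned: List[Vector], lam: float = 1.0) -> Vector:
    """Task arithmetic: theta_0 + lam * sum_k tau_k (affine in the fine-tuned weights)."""
    taus = task_vectors(base, finetuned)
    return [b + lam * sum(tau[j] for tau in taus) for j, b in enumerate(base)]


def trim(tau: Vector, keep_frac: float) -> Vector:
    """Keep the largest-magnitude fraction of entries and zero out the rest."""
    k = max(1, int(round(keep_frac * len(tau))))
    keep = set(sorted(range(len(tau)), key=lambda j: abs(tau[j]), reverse=True)[:k])
    return [v if j in keep else 0.0 for j, v in enumerate(tau)]


def ties_style_merge(base: Vector, finetuned: List[Vector],
                     keep_frac: float = 0.5, lam: float = 1.0) -> Vector:
    """Simplified TIES-style merge: trim, elect sign, then disjoint mean."""
    taus = [trim(tau, keep_frac) for tau in task_vectors(base, finetuned)]
    merged = []
    for j, b in enumerate(base):
        vals = [tau[j] for tau in taus]
        elected_positive = sum(vals) >= 0.0  # elected sign for this coordinate
        # Average only non-zero entries whose sign matches the elected sign.
        agree = [v for v in vals if v != 0.0 and (v > 0.0) == elected_positive]
        update = sum(agree) / len(agree) if agree else 0.0
        merged.append(b + lam * update)
    return merged


if __name__ == "__main__":
    base = [0.0, 0.0, 0.0, 0.0]
    finetuned = [[1.0, -2.0, 0.1, 0.0], [1.0, 3.0, -0.1, 0.2]]
    print("linear:", linear_merge(base, finetuned))
    print("ties-style:", ties_style_merge(base, finetuned))
```

In this toy case the two task vectors conflict on the second coordinate: the linear merge lets them partially cancel, while the sign election discards the minority entry. The trimming and sign election are what make the second rule non-linear in the fine-tuned weights.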
Abstract
Model merging efficiently aggregates capabilities from multiple fine-tuned models into a single one, operating purely in parameter space without original data or expensive re-computation. Despite its empirical success, a unified theory for its effectiveness under heterogeneous fine-tuning hyperparameters (e.g., varying learning rates and batch sizes) remains missing. Existing federated learning theories focus purely on optimization, an approach that fails to explain model merging and inherently leads to theoretical paradoxes. To address this challenge, we pioneer the integration of $L_2$-Stability theory into heterogeneous environments to rigorously decouple the excess risk of the merged model into optimization and generalization errors. This comprehensive analysis yields three main contributions: (i) We mathematically establish the fundamental Optimization-Generalization…
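For reference, the standard single-dataset form of the excess-risk decomposition that the abstract alludes to is written out below. The notation is assumed here rather than taken from the paper: $R(w)=\mathbb{E}_z[\ell(w;z)]$ is the population risk, $R_S$ the empirical risk on sample $S$, $A(S)$ the model produced from $S$ (here, the merged model), and $w^\star$ a population risk minimizer. The paper's heterogeneous version is not given in this summary and may differ.

```latex
\mathbb{E}\big[R(A(S))\big] - R(w^\star)
= \underbrace{\mathbb{E}\big[R(A(S)) - R_S(A(S))\big]}_{\text{generalization error}}
+ \underbrace{\mathbb{E}\big[R_S(A(S)) - R_S(w^\star)\big]}_{\text{optimization error}}
```

The identity holds because $w^\star$ does not depend on $S$, so $\mathbb{E}_S[R_S(w^\star)] = R(w^\star)$; stability arguments then bound the first term, and optimization analysis bounds the second.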
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications
