Bayesian Model Merging
Kaiyang Li, Shaobo Han, Qing Su, Shihao Ji

TL;DR
Bayesian Model Merging (BMM) is a framework that combines multiple task-specific models into a single model using Bayesian regression and Bayesian optimization. It outperforms existing plug-and-play baselines on vision and language benchmarks.
Contribution
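Introduces BMM, a bi-level optimization framework for model merging: the inner level uses the inductive bias of an anchor model as a prior in a Bayesian regression, and the outer level tunes hyperparameters with Bayesian optimization. A data-free variant is also included.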
Findings
BMM outperforms existing plug-and-play baselines in vision and language tasks.
On the ViT-L/14 benchmark, BMM scores 95.1, close to the performance of the eight separate expert models.
BMM effectively merges up to 20 vision tasks and 5 language tasks.
Abstract
Model merging aims to combine multiple task-specific expert models into a single model without joint retraining, offering a practical alternative to multi-task learning when data access or computational budget is limited. Existing methods, however, face two key limitations: (1) they overlook the valuable inductive bias of strong anchor models and estimate the merged weights from scratch, and (2) they rely on a shared hyperparameter setting across different modules of the network, lacking a global optimization strategy. This paper introduces Bayesian Model Merging (BMM), a plug-and-play bi-level optimization framework, where the inner level formulates the model merging as an activation-based Bayesian regression under a strong prior induced by an anchor model, yielding an efficient closed-form solution; and the outer level leverages a Bayesian optimization procedure to search…
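The inner level described above can be illustrated with a minimal sketch for a single linear layer. This is not the authors' implementation: it assumes an isotropic Gaussian prior centered on the anchor weights and a squared-error fit to each expert's layer outputs, which gives a ridge-style closed form. The function name `merge_layer_map` and the parameters `acts`, `expert_weights`, `anchor_weight`, and `prior_precision` are hypothetical names chosen for illustration.

```python
import numpy as np

def merge_layer_map(acts, expert_weights, anchor_weight, prior_precision):
    """Closed-form MAP estimate of a merged linear-layer weight (illustrative).

    Minimizes, over W,
        sum_t ||X_t W - X_t W_t||_F^2 + prior_precision * ||W - W_anchor||_F^2
    where X_t are input activations for task t and W_t is that task's expert
    weight. The Gaussian prior centered on the anchor is an assumption of
    this sketch, not a detail confirmed by the abstract.

    acts:            list of (n_t, d_in) activation matrices, one per task
    expert_weights:  list of (d_in, d_out) expert weights, one per task
    anchor_weight:   (d_in, d_out) anchor-model weight (prior mean)
    prior_precision: positive scalar controlling the pull toward the anchor
    """
    anchor = np.asarray(anchor_weight, dtype=float)
    d_in = anchor.shape[0]
    # Left-hand side: prior precision plus the sum of activation Gram matrices.
    lhs = prior_precision * np.eye(d_in)
    # Right-hand side: prior term plus Gram-weighted expert weights.
    rhs = prior_precision * anchor
    for X, W in zip(acts, expert_weights):
        gram = X.T @ X
        lhs += gram
        rhs += gram @ W
    # Solve (sum_t G_t + lambda I) W = sum_t G_t W_t + lambda W_anchor.
    return np.linalg.solve(lhs, rhs)
```

In this sketch a large `prior_precision` keeps the merged weight close to the anchor, while a small value lets the activation statistics of the experts dominate. A per-module scalar of this kind is the sort of hyperparameter an outer search loop could tune instead of sharing one setting across the network.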
