TL;DR
FlexMerge is a data-free model merging framework that produces merged models of varying sizes, letting the accuracy-size trade-off of different merging algorithms be compared across the full range from a single merged model to all fine-tuned models.
Contribution
The paper introduces FlexMerge, enabling flexible, multi-size model merging and systematic analysis of accuracy-size trade-offs across various algorithms.
Findings
Modestly larger merged models significantly improve accuracy (up to 13.5%).
Algorithm performance rankings vary with model size.
FlexMerge works effectively on vision and NLP benchmarks with up to 30 tasks.
Abstract
Model merging has emerged as an efficient method to combine multiple single-task fine-tuned models. The merged model can enjoy multi-task capabilities without expensive training. While promising, merging into a single model often suffers from an accuracy gap with respect to the fine-tuned models. On the other hand, deploying all individual fine-tuned models incurs high storage costs. We propose FlexMerge, a novel data-free model merging framework that: (a) flexibly generates merged models of varying sizes, spanning the full spectrum from a single merged model to retaining all fine-tuned models; and (b) supports multiple merging algorithms in a unified framework. Using FlexMerge, we systematically characterize the accuracy-size trade-off of different algorithms. Our study reveals two key findings: first, even modestly larger merged models can yield steep accuracy gains (up to 13.5% when…
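To make the size spectrum described in the abstract concrete, here is a minimal toy sketch. It is not FlexMerge's actual procedure, which this summary does not specify. It assumes models are dictionaries of named weight blocks, uses plain task-vector averaging as a stand-in merging algorithm, and picks which blocks to share by a hypothetical fixed order. The helper names `partially_merge` and `relative_size` are illustrative only.

```python
import numpy as np

def partially_merge(base, finetuned, shared_blocks):
    """Toy partial merge (assumption, not the paper's method):
    average task vectors for shared blocks, keep other blocks per task."""
    shared = {}
    for name in shared_blocks:
        task_vectors = [model[name] - base[name] for model in finetuned]
        shared[name] = base[name] + np.mean(task_vectors, axis=0)
    per_task = [
        {name: w for name, w in model.items() if name not in shared_blocks}
        for model in finetuned
    ]
    return shared, per_task

def relative_size(shared, per_task, base):
    """Stored parameters as a multiple of one full model."""
    single = sum(w.size for w in base.values())
    stored = sum(w.size for w in shared.values())
    stored += sum(w.size for blocks in per_task for w in blocks.values())
    return stored / single

# Hypothetical setup: 3 fine-tuned models, 4 equally sized blocks each.
rng = np.random.default_rng(0)
block_names = ["block0", "block1", "block2", "block3"]
base = {n: rng.normal(size=(4, 4)) for n in block_names}
finetuned = [
    {n: base[n] + 0.1 * rng.normal(size=(4, 4)) for n in block_names}
    for _ in range(3)
]

# Share progressively more blocks: size shrinks from 3x down to 1x.
sizes = []
for k in range(len(block_names) + 1):
    shared, per_task = partially_merge(base, finetuned, block_names[:k])
    sizes.append(relative_size(shared, per_task, base))
print(sizes)
```

In this toy setup, sharing no blocks corresponds to retaining all fine-tuned models (3x the size of one model), and sharing every block corresponds to a single merged model (1x). Intermediate choices fall in between, which is the range over which the accuracy-size trade-off would be measured.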