
TL;DR
DiDi-Merging is a compact dynamic model merging framework that balances shared and expert parameters through differentiable rank allocation, reaching high accuracy with few additional parameters across vision, language, and multimodal tasks.
Contribution
It introduces a slim dynamic merging approach that casts parameter budgeting as differentiable rank optimization in low-rank modules and adds a data-free refinement step to recover task fidelity, substantially reducing parameter overhead while maintaining high performance.
Findings
Matches prior dynamic baselines at 1.24x the parameters of a single fine-tuned model
Surpasses prior dynamic baselines at 1.4x the parameters of a single fine-tuned model
Applicable across vision, language, and multimodal tasks
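To make budgets like 1.24x and 1.4x concrete, the back-of-the-envelope calculation below assumes a hypothetical model made only of square d x d weight matrices, counts one full shared copy as 1.0x, and spends all extra capacity on per-task rank-r factor pairs. That layout resembles the "full shared model plus experts" design the abstract criticizes, not DiDi-Merging's balanced split; the layer sizes, task count, and helper names (`low_rank_params`, `size_ratio`, `max_rank_for_budget`) are illustrative assumptions.

```python
import math


def low_rank_params(d_out: int, d_in: int, rank: int) -> int:
    # A rank-r factor pair U (d_out x r) and V (r x d_in) stores r * (d_out + d_in) numbers.
    return rank * (d_out + d_in)


def size_ratio(d: int, num_layers: int, num_tasks: int, rank: int) -> float:
    # Hypothetical model: num_layers square d x d weights (biases, embeddings ignored).
    # One full shared copy counts as 1.0x; each task adds a rank-`rank` expert per layer.
    base = num_layers * d * d
    experts = num_layers * num_tasks * low_rank_params(d, d, rank)
    return (base + experts) / base


def max_rank_for_budget(d: int, num_tasks: int, budget: float) -> int:
    # ratio = 1 + 2 * T * r / d, so the largest integer r within budget is
    # floor((budget - 1) * d / (2 * T)); the layer count cancels out.
    return math.floor((budget - 1.0) * d / (2 * num_tasks))


if __name__ == "__main__":
    d, layers, tasks = 768, 12, 8  # illustrative sizes, not taken from the paper
    for budget in (1.24, 1.4):
        r = max_rank_for_budget(d, tasks, budget)
        print(f"budget {budget}x -> per-task rank {r}, size {size_ratio(d, layers, tasks, r):.3f}x")
```

With eight tasks and d = 768, a 1.24x budget leaves each task only rank 11 per layer under these assumptions, which shows why deciding where the ranks go (shared versus expert, and which layers) matters at such small overheads.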
Abstract
Model merging enables the reuse of fine-tuned models without joint training or access to original data. Dynamic merging further improves flexibility by selectively activating task-relevant parameters and efficiently composing experts across multiple tasks. However, existing dynamic methods either maintain a full shared model with tiny experts or allocate excessive capacity to experts, leading to suboptimal accuracy-efficiency trade-offs. To address this, we propose DiDi-Merging, a slim dynamic merging framework that leverages differentiable rank allocation to balance shared and expert parameters. By formulating parameter budgeting as differentiable rank optimization in low-rank modules and introducing a data-free refinement step to recover task fidelity, DiDi-Merging matches prior dynamic baselines at only 1.24x the parameters of a single fine-tuned model and surpasses them at 1.4x,…
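The abstract does not give the exact objective, but one common way to make a discrete rank choice differentiable is to put a sigmoid gate on each singular direction of a low-rank update and charge the soft parameter count as a penalty. The sketch below does this on made-up singular values, so it only needs the weight updates themselves (no data). The loss, the function `allocate_ranks`, its parameters `spectra`, `costs`, `lam`, and the 0.5 hardening threshold are all assumptions for illustration, not DiDi-Merging's published formulation.

```python
import math


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def allocate_ranks(spectra, costs, lam, steps=2000, lr=0.1):
    """Hypothetical soft-gate rank allocation; not DiDi-Merging's published objective.

    spectra: module name -> singular values of that module's weight update.
    costs:   module name -> relative parameter cost of keeping one rank there.
    lam:     weight of the parameter-budget penalty.

    Loss: sum over modules m and directions i of
        (1 - g_mi)^2 * s_mi^2      # spectral energy lost by shrinking direction i
      + lam * costs[m] * g_mi      # soft parameter count charged for keeping it
    with g_mi = sigmoid(theta_mi), optimized by plain gradient descent on theta.
    """
    thetas = {m: [0.0] * len(sv) for m, sv in spectra.items()}  # every gate starts at 0.5
    for _ in range(steps):
        for m, sv in spectra.items():
            for i, s in enumerate(sv):
                g = sigmoid(thetas[m][i])
                dloss_dg = -2.0 * (1.0 - g) * s * s + lam * costs[m]
                thetas[m][i] -= lr * dloss_dg * g * (1.0 - g)  # chain rule: dg/dtheta = g(1-g)
    # Harden the soft gates into integer ranks: keep directions whose gate exceeds 0.5.
    return {m: sum(1 for t in th if sigmoid(t) > 0.5) for m, th in thetas.items()}


if __name__ == "__main__":
    # Made-up spectra for two modules; the MLP is assumed to cost twice as much per rank.
    spectra = {"attn": [3.0, 2.0, 0.9, 0.5, 0.2], "mlp": [2.5, 1.8, 1.6, 0.4, 0.1]}
    costs = {"attn": 1.0, "mlp": 2.0}
    for lam in (1.0, 5.0):
        print(lam, allocate_ranks(spectra, costs, lam))
```

Setting the derivative with respect to a gate to zero gives g* = 1 - lam * cost / (2 * s^2), so a gate settles above 0.5 exactly when s^2 > lam * cost. Raising `lam` trades reconstruction quality for fewer parameters, and costlier modules need stronger directions to keep a rank. The data-free refinement step mentioned in the abstract is not modeled here.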
