Sparsity-Aware Evolution for Model Merging
Huan Zhang, Yanjian Zhang, Guillaume Wisniewski, Nadi Tomeh, Bang Liu

TL;DR
This paper introduces a sparsity-aware evolutionary framework for model merging that enhances reliability and sparsity control in large-scale language models through iterative pruning and merging guided by a combined score function.
Contribution
It presents a novel sparsity-aware evolutionary method for model merging that uses iterative pruning-merging cycles as the mutation operator and adds a sparsity term to the score function alongside the conventional performance scores.
Findings
Improves model merging reliability across large-scale benchmarks.
Effectively promotes sparsity while maintaining performance.
Easy to incorporate into existing workflows due to simplicity.
Abstract
We propose a sparsity-aware evolutionary (SAE) framework for model merging that involves iterative pruning-merging cycles to act as a novel mutation operator. We incorporate the sparsity constraints into the score function, which steers the evolutionary process to favor more sparse models, in addition to other conventional performance scores. Interestingly, the by-product of competition for sparsity introduces an extra local attraction and interplay into the evolutionary process: if one competitor has more zero elements, the other competitor's non-zero elements will occupy those positions, even though the less sparse competitor loses to the more sparse competitor in other positions. The proposed pipeline is evaluated on a variety of large-scale LLM benchmarks. Experiments demonstrate that our approach can improve model merging reliability across multiple benchmarks,…
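The pruning-merging cycle and the combined score described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the abstract does not specify the pruning criterion, the form of the score, or the selection scheme. The sketch assumes magnitude pruning, a score of the form performance plus `lam` times the fraction of zeros, and a merge in which the higher-scoring competitor keeps its non-zero entries while the other competitor's entries fill the positions where the winner is zero. All function names and parameters (`prune`, `merge`, `evolve`, `lam`, `prune_ratio`) are hypothetical, and models are represented as flat lists of floats.

```python
import random


def prune(weights, ratio):
    """Assumed pruning rule: zero the `ratio` fraction of smallest-magnitude entries."""
    k = int(len(weights) * ratio)
    if k == 0:
        return list(weights)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def sparsity(weights):
    """Fraction of entries that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)


def combined_score(weights, performance, lam):
    """Assumed score: task performance plus a weighted sparsity bonus."""
    return performance(weights) + lam * sparsity(weights)


def merge(winner, loser):
    """Winner keeps its non-zero entries; the loser fills the winner's zeros."""
    return [w if w != 0.0 else l for w, l in zip(winner, loser)]


def evolve(population, performance, lam=0.1, prune_ratio=0.3,
           generations=10, seed=0):
    """Toy loop: prune two competitors, merge them, keep the child if it helps."""
    rng = random.Random(seed)
    population = [list(p) for p in population]

    def score(p):
        return combined_score(p, performance, lam)

    for _ in range(generations):
        a, b = rng.sample(population, 2)
        a, b = prune(a, prune_ratio), prune(b, prune_ratio)
        winner, loser = (a, b) if score(a) >= score(b) else (b, a)
        child = merge(winner, loser)
        # Replace the weakest member only when the child scores higher.
        worst = min(range(len(population)), key=lambda i: score(population[i]))
        if score(child) > score(population[worst]):
            population[worst] = child
    return max(population, key=score)
```

In this sketch the sparser competitor leaves more zero positions open, so more of the other competitor's non-zero entries survive in the child. That is one way to read the "attraction" effect the abstract mentions. A real setup would replace `performance` with benchmark evaluation of the merged model.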
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning
