TL;DR
This paper introduces Mixup Model Merge (M3), a novel method for model merging that uses randomized linear interpolation in parameter space to improve performance, robustness, and flexibility over traditional equal-ratio merging.
Contribution
M3 is a simple, effective model merging technique inspired by Mixup, utilizing Beta-distributed interpolation coefficients to optimize contribution ratios between models.
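As a rough illustration of the idea described above, the following minimal sketch interpolates two parameter dictionaries with a single coefficient drawn from a Beta distribution. It is not the authors' implementation: the function name `mixup_merge`, the use of plain NumPy arrays in place of LLM weights, and the symmetric Beta(alpha, alpha) form (borrowed from the original Mixup recipe) are all assumptions made for illustration.

```python
import numpy as np


def mixup_merge(params_a, params_b, alpha=1.0, rng=None):
    """Hypothetical sketch: randomized linear interpolation of two models.

    params_a, params_b: dicts mapping parameter names to NumPy arrays with
    matching shapes (stand-ins for two task-specific model checkpoints).
    alpha: shape parameter; Beta(alpha, alpha) is an assumed symmetric form.
    Returns the merged parameter dict and the sampled coefficient.
    """
    rng = np.random.default_rng() if rng is None else rng
    if params_a.keys() != params_b.keys():
        raise ValueError("both models must have the same parameter names")

    # Sample one contribution ratio in [0, 1] for the whole model.
    lam = float(rng.beta(alpha, alpha))

    # Linear interpolation in parameter space: lam * A + (1 - lam) * B.
    merged = {
        name: lam * params_a[name] + (1.0 - lam) * params_b[name]
        for name in params_a
    }
    return merged, lam


if __name__ == "__main__":
    model_a = {"w": np.ones((2, 2)), "b": np.zeros(2)}
    model_b = {"w": np.zeros((2, 2)), "b": np.ones(2)}
    merged, lam = mixup_merge(model_a, model_b, alpha=0.5,
                              rng=np.random.default_rng(0))
    print("sampled coefficient:", lam)
    print("merged w:", merged["w"])
```

For a symmetric Beta distribution, alpha below 1 concentrates draws near 0 and 1 (one model dominates), alpha equal to 1 is uniform, and alpha above 1 concentrates draws near 0.5 (close to equal-ratio merging). In practice one would presumably draw several coefficients and keep the merge that scores best on held-out data; that selection step is not shown here.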
Findings
M3 outperforms standard equal-ratio merging in task performance.
M3 enhances robustness to out-of-distribution inputs and adversarial attacks.
M3 can be combined with DARE for even better results.
Abstract
Model merging aims to integrate multiple task-specific models into a unified model that inherits the capabilities of the task-specific models, without additional training. Existing model merging methods often lack consideration of the varying contribution ratios of different task-specific models to the final merged model. In this paper, we propose Mixup Model Merge (M3), a simple yet effective method inspired by the randomized linear interpolation strategy from the Mixup data augmentation technique. M3 performs randomized linear interpolation in parameter space between two task-specific LLMs, where interpolation coefficients are sampled from a Beta distribution to explore diverse contribution ratios. This controllable randomness allows M3 to outperform standard equal-ratio merging by discovering better contribution ratio combinations. Extensive experiments show that M3 significantly (1)…
Peer Reviews
Decision·ICLR 2026 Conference Withdrawn Submission
1. The paper is well-written and clearly presented. 2. The experiments are extensive and provide strong empirical support for the proposed method.
1. Novelty (a) The design of the Beta distribution sampling in M³ appears arbitrary and lacks justification. The chosen values of the shape parameter $\alpha$ vary widely across different base methods (from 0.4 to 2), yet the authors provide no explanation for this phenomenon or its potential impact on model merging performance. (b) The proposed weighted method seems to be a general formulation of task arithmetic and is not novel enough. 2. The paper claims that the proposed design enhances robustnes
Mixup Model Merge is a plug-and-play enhancer that grafts onto four existing merge methods with zero re-training and only seven forward passes; it systematically boosts every method on seven standard benchmarks (20 tables), yielding a merged model that beats both specialists on 4/5 datasets, and its validity is underpinned by the flat-basin observation that linear interpolation of same-pre-checkpoint fine-tunes stays in low loss.
1) Statistical significance and uncertainty quantification missing: Performance Drop Rate (PDR) for adversarial sets is based on a single attack seed (Table 2; Appendix G). No direct evidence found in the manuscript that results persist with more λm draws or different seeds. 2) Scalability and practical constraints unaddressed: Experiments restricted to 13-B Llama-2 pairwise merges; no evidence on larger models, >2 parents, or different architectures (Sec. 4). 3) Robustness claims overstated wit
1. The core idea of applying a Mixup-inspired, randomized interpolation strategy directly in the parameter space for model merging seems novel and intuitive. 2. The paper is well-supported by comprehensive experiments. The evaluation covers multiple tasks (instruction following, math, code), multiple merging methods (Average, Task Arithmetic, TIES, DARE), and, importantly, extends to OOD and adversarial robustness, which are crucial for real-world applicability. The results consistently show pe
1. The experiments focus exclusively on merging pairs of models. A key question is how well the proposed method scales to merging more than two models simultaneously. The current formulation for two models is clear, but its extension to multiple models is not discussed and could be more complex. 2. While the effect of $\alpha$ on the distribution shape is well-explained, the paper lacks clear guidance or an intuitive strategy for choosing $\alpha$ for a given pair of models or tasks. The choice see
1. An interesting perspective on coefficient selection in model merging. 2. The experiments reveal that the proposed method not only improves merging performance but also has a positive influence on the robustness of the merged model. 3. The proposed method is simple yet effective. The conflict-cancellation observation (interpolated deltas can nullify opposing signs at specific parameters) offers an intuitive mechanism for stability/benefit.
1. It seems that the method can only work on model merging between two models. Can the authors provide further discussions on model merging between multiple models? Currently the contribution’s generality is implied but not evidenced. Model merging between two models is quite an easy problem. 2. The method linearly mixes all parameters. An ablation that interpolates only selected layers/blocks could reveal where most gains arise and reduce risk when models are heterogeneous. 3. For completene
Taxonomy
Methods: Mixup
