SyMerge: From Non-Interference to Synergistic Merging via Single-Layer Adaptation

Aecheon Jung; Seunghwan Lee; Dongyoon Han; Sungeun Hong

arXiv:2412.19098·cs.LG·October 7, 2025

TL;DR

SyMerge introduces a lightweight, self-labeling framework that optimizes a single task-specific layer to enhance synergy in model merging, achieving state-of-the-art results across multiple domains.

Contribution

It proposes a novel single-layer adaptation method with self-labeling for synergistic model merging, surpassing interference-focused approaches.

Findings

01

Achieves state-of-the-art performance on vision, dense prediction, and NLP benchmarks.

02

Adapted layers transfer effectively to other merging methods.

03

Simple single-layer optimization significantly improves merge quality.

Abstract

Model merging offers an efficient alternative to multi-task learning by combining independently fine-tuned models, but most prior approaches focus mainly on avoiding task interference. We argue instead that the real potential of merging lies in achieving synergy, where tasks enhance one another. Our intuition comes from a pilot study showing that when a classifier trained on one task is paired with the encoder of another, the resulting cross-task performance strongly predicts merge quality. Moreover, adapting even a single task-specific layer can substantially improve this compatibility, suggesting a simple yet powerful lever for synergy. Building on this insight, we introduce SyMerge, a lightweight framework that jointly optimizes one task-specific layer and merging coefficients. To ensure stability without labels, SyMerge employs a robust self-labeling strategy guided by expert model…
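The two ingredients named in the abstract, merging coefficients and a label-free objective driven by expert predictions, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not specify the merging rule, so the sketch uses a task-arithmetic form (pretrained weights plus coefficient-weighted task vectors), a toy two-class linear model, hard pseudo-labels from the expert, and made-up numbers. The function names `merge`, `logits`, and `self_label_loss` are hypothetical, and the single adapted layer and the optimization loop are omitted.

```python
import math

def merge(pretrained, task_vectors, coeffs):
    """Return theta_pre + sum_k coeffs[k] * task_vectors[k] (flat parameter lists)."""
    merged = list(pretrained)
    for lam, tau in zip(coeffs, task_vectors):
        for i, delta in enumerate(tau):
            merged[i] += lam * delta
    return merged

def logits(theta, x):
    """Toy 2-class linear model: theta is a 2x2 weight matrix in row-major order."""
    return [theta[0] * x[0] + theta[1] * x[1],
            theta[2] * x[0] + theta[3] * x[1]]

def self_label_loss(merged_logits, expert_logits):
    """Cross-entropy of the merged model against the expert's hard pseudo-label."""
    pseudo = max(range(len(expert_logits)), key=lambda c: expert_logits[c])
    peak = max(merged_logits)  # subtract the max for a stable log-sum-exp
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in merged_logits))
    return log_norm - merged_logits[pseudo]

# Hypothetical toy setup (all numbers are made up for illustration).
pretrained = [0.0, 0.0, 0.0, 0.0]
tau_a = [1.0, 0.0, 0.0, 1.0]   # task vector of expert A
tau_b = [0.5, 0.5, 0.0, 0.0]   # task vector of expert B
x = [2.0, 0.0]                 # one unlabeled input from task A

# Expert A (pretrained + its own task vector) labels its own task's input.
expert_a = merge(pretrained, [tau_a], [1.0])
target = logits(expert_a, x)

# Evaluate the label-free loss at two candidate coefficient settings.
loss_low = self_label_loss(
    logits(merge(pretrained, [tau_a, tau_b], [0.3, 0.3]), x), target)
loss_high = self_label_loss(
    logits(merge(pretrained, [tau_a, tau_b], [1.0, 0.3]), x), target)
print(loss_low, loss_high)
```

In this toy case, raising expert A's coefficient lowers the loss on A's input. A gradient-based optimizer would minimize such a loss, summed over all tasks, with respect to the coefficients and, per the abstract, the weights of one task-specific layer.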

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 8 · Confidence 4

Strengths

- The proof-of-concept experiments add empirical support for the design choices made in the framework.
- A section covers the choice of objective function, which not only justifies the selected loss but also shows that the experimental design was careful.
- The experiments cover different models, showing that the method works across a range of common model choices.

Weaknesses

I have only minor comments: some of the figures and tables that occupy half a page on pages 8 and 9 could be rearranged so that they interrupt the text less than in the current version.

Reviewer 02 · Rating 2 · Confidence 3

Strengths

1. This work uncovers an interesting phenomenon that stronger cross-task performance leads to better merging performance, and provides theoretical support for this observation.
2. The paper is well-organized and easy to follow.
3. The experiments demonstrate that the proposed method achieves promising results.

Weaknesses

1. **Robustness to the size and quality of the unlabeled test set.** The proposed method relies on using an unlabeled test set to perform distillation between the merged model and all task experts. However, it is unclear how well this approach would work in more practical scenarios such as few-shot, long-tail, noisy, or OOD test sets. These settings naturally arise in real-world applications where users may input any query data.
2. **Potentially misleading distillation.** Each task expert is di

Reviewer 03 · Rating 6 · Confidence 4

Strengths

1. The work advances a shift in objectives for model merging, arguing for positive synergy rather than mere non-interference. This reconceptualization is original in the landscape of model merging.
2. Theoretical justification is provided showing that improved cross-task performance tightens loss bounds for merged models, supporting the focus on functional alignment.
3. SyMerge outperforms a strong suite of prior model merging baselines in multi-task classification, dense prediction, and NLP, wi

Weaknesses

1. While the pursuit of task synergy is motivated well, the core adaptation step (jointly tuning a single layer and coefficients with self-labeling) is a fairly incremental extension over test-time adaptive methods such as AdaMerging. The framework design, minimizing cross-entropy or L1 to match expert predictions, can be considered a straightforward application of self-labeling in existing frameworks (Representation Surgery and WUDI-Merging).
2. The proposed method’s reliance on the predictions f

Code & Models

Repositories

aim-skku/modeltinting
PyTorch · Official

Taxonomy

Topics: Business Process Modeling and Analysis · Semantic Web and Ontologies · Software System Performance and Reliability