Resolving Interference (RI): Disentangling Models for Improved Model Merging
Pratik Ramesh, George Stoica, Arun Iyer, Leshem Choshen, Judy Hoffman

TL;DR
This paper introduces Resolving Interference (RI), a lightweight method that reduces cross-task interference in model merging by disentangling models using unlabeled data, leading to improved performance and robustness.
Contribution
The paper proposes RI, a novel disentanglement framework that minimizes interference during model merging without requiring task-specific data, enhancing performance and generalization.
Findings
RI improves merging performance by up to 3.8%.
RI enhances generalization to unseen domains by up to 2.3%.
RI is robust to auxiliary data sources and less sensitive to hyperparameters.
Abstract
Model merging has shown that multitask models can be created by directly combining the parameters of different models that are each specialized on tasks of interest. However, models trained independently on distinct tasks often exhibit interference that degrades the merged model's performance. To solve this problem, we formally define the notion of Cross-Task Interference as the drift in the representation of the merged model relative to its constituent models. Reducing cross-task interference is key to improving merging performance. To address this issue, we propose our method, Resolving Interference (RI), a light-weight adaptation framework which disentangles expert models to be functionally orthogonal to the space of other tasks, thereby reducing cross-task interference. RI does this whilst using only unlabeled auxiliary data as input (i.e., no task-data is needed), allowing it to be…
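To make the abstract's notion of representation drift concrete, here is a minimal sketch under stated assumptions; it is not the paper's implementation. The encoders are toy linear maps. The merge uses plain task arithmetic as a stand-in for whichever merging method is used. Drift is measured as the mean squared L2 distance between merged and expert representations on random, unlabeled auxiliary inputs. The exact metric used in the paper is not given in this summary, and the names `merge_task_arithmetic` and `representation_drift` are hypothetical.

```python
import numpy as np


def merge_task_arithmetic(pretrained, experts, alpha=1.0):
    """Stand-in merge: add the scaled sum of task vectors to the pretrained weights."""
    task_vectors = [w - pretrained for w in experts]
    return pretrained + alpha * sum(task_vectors)


def representation_drift(merged, expert, inputs):
    """Assumed drift metric: mean squared L2 distance between the merged and
    expert representations of the same (unlabeled) inputs."""
    diff = inputs @ merged.T - inputs @ expert.T
    return float(np.mean(np.sum(diff ** 2, axis=1)))


rng = np.random.default_rng(0)
d_in, d_out, n_samples = 16, 8, 256

# Toy "pretrained" linear encoder and three experts fine-tuned away from it.
pretrained = rng.normal(size=(d_out, d_in))
experts = [pretrained + 0.1 * rng.normal(size=(d_out, d_in)) for _ in range(3)]

merged = merge_task_arithmetic(pretrained, experts, alpha=1.0)

# Unlabeled auxiliary inputs: no task data or labels are involved.
aux_inputs = rng.normal(size=(n_samples, d_in))

drifts = [representation_drift(merged, w, aux_inputs) for w in experts]
for i, drift in enumerate(drifts):
    print(f"expert {i}: drift = {drift:.4f}")
```

In this toy setting, merging a single expert with `alpha=1.0` recovers that expert and gives zero drift. Merging several experts adds the other task vectors on top, so the drift is nonzero; that extra drift is the kind of interference the abstract describes.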
Peer Reviews
Decision · Submitted to ICLR 2026
**Clear innovation**: The RI Loss provides a novel regularization mechanism to reduce task interference, offering a fresh perspective for model merging research. **Simple and practical**: The method can be seamlessly integrated into existing training setups as a plug-and-play module. **Empirical validation**: Extensive evaluations across multiple benchmarks show promising gains. **Solid motivation**: The task-relatedness perspective provides a sound theoretical rationale for reducing interference.
**1. Dependency on auxiliary data**: The effectiveness appears tied to the availability and distribution of auxiliary data, yet its limitations are not thoroughly examined. **2. Missing key baselines**: Recent strong merging methods (e.g., Wudi-Merging) are not included in comparisons. **3. Limited large-model experiments**: The study focuses on moderate-scale models, leaving the applicability to large-scale vision-language or language models unverified. **4. Incomplete presentation**: The main f
1) The paper proposes a distillation method to disentangle the output of a task-specific model through the definition of cross-task interference. 2) The proposed method is examined on vision tasks following the recent model merging setup with 8, 14, and 20 tasks, and the results show that it can improve existing model merging methods. 3) The experiments and ablations are well thought out and analyzed to investigate cross-task interference and the proposed RI.
1) In the abstract, the 10% mentioned is confusing. The improvement made by this paper against SOTA is roughly within 2%. This is misleading. 2) Missing analysis of the data-less model merging SOTA "WUDI" merging. By how much can RI improve WUDI? 3) The proposed method is not evaluated on NLP tasks, which are commonly studied in almost all recent model merging methods. Is RI applicable to NLP tasks? 4) The computational requirements should be analyzed, and jointly optimizing the task vectors sim
- This paper tackles merging interference by matching the expert output with the merged model output and proposes a new Resolving Interference (RI) method.
- The paper conducts experiments on up to 20 datasets and presents improved results on CLIP across various merging methods.
- It includes a detailed analysis of the use of an auxiliary dataset, which requires no access to labels.
- The weaknesses of RI loss:
  - It only applies to classification tasks and ignores generative tasks and LLMs (which are dominant): all experiments use CLIP-style ViT encoders and vision classification heads.
  - The algorithm explicitly requires the set of heads $\{h_i\}_{i=1}^N$. This assumption does not hold for many generative settings (e.g., segmentation, detection, diffusion, LLMs).
  - It cannot handle different task types, which are naturally supported by other merging techniques
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning
