Harmonizing and Merging Source Models for CLIP-based Domain Generalization
Yuhe Ding, Jian Liang, Bo Jiang, Zi Wang, Aihua Zheng, Bin Luo

TL;DR
This paper introduces HAM, a novel framework that merges source models to enhance CLIP-based domain generalization, effectively addressing conflicts during multi-source training and achieving state-of-the-art results.
Contribution
The paper proposes a new source model merging framework called HAM that enriches source samples, harmonizes model updates, and integrates knowledge to improve domain generalization in CLIP-based models.
Findings
HAM achieves state-of-the-art performance on five benchmark datasets.
Model merging effectively mitigates training conflicts and enhances generalization.
The approach outperforms existing methods in CLIP-based domain generalization.
Abstract
CLIP-based domain generalization aims to improve model generalization to unseen domains by leveraging the powerful zero-shot classification capabilities of CLIP and multiple source datasets. Existing methods typically train a single model across multiple source domains to capture domain-shared information. However, this paradigm inherently suffers from two types of conflicts: 1) sample conflicts, arising from noisy samples and extreme domain shifts among sources; and 2) optimization conflicts, stemming from competition and trade-offs during multi-source training. Both hinder generalization and lead to suboptimal solutions. Recent studies have shown that model merging can effectively mitigate competition in multi-objective optimization and improve generalization performance. Inspired by these findings, we propose Harmonizing and Merging (HAM), a novel source model merging…
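The abstract credits model merging with easing the competition that arises when one model is trained on all sources at once. The simplest form of merging trains one model per source domain and then combines their parameters by (weighted) averaging. The sketch below shows only that generic baseline. The abstract is truncated and does not state HAM's actual merging rule, so this is not the paper's method. The names `merge_source_models` and `Params`, and the representation of each model as a dict of flat float lists, are hypothetical.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical representation: each source model maps a parameter name
# (e.g. a learnable prompt or adapter weight) to a flat list of floats.
Params = Dict[str, List[float]]


def merge_source_models(models: Sequence[Params],
                        weights: Optional[Sequence[float]] = None) -> Params:
    """Merge per-source models by weighted parameter averaging.

    Generic weight averaging, used here only as an illustrative
    stand-in; it is NOT the HAM merging procedure.
    """
    if not models:
        raise ValueError("need at least one source model")
    if weights is None:
        # Uniform averaging: every source domain contributes equally.
        weights = [1.0] * len(models)
    if len(weights) != len(models):
        raise ValueError("one weight per source model is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    norm = [w / total for w in weights]  # normalize so weights sum to 1

    merged: Params = {}
    for name, first in models[0].items():
        size = len(first)
        acc = [0.0] * size
        for model, w in zip(models, norm):
            values = model[name]  # KeyError if a source lacks this parameter
            if len(values) != size:
                raise ValueError(f"shape mismatch for parameter {name!r}")
            for i, v in enumerate(values):
                acc[i] += w * v
        merged[name] = acc
    return merged


if __name__ == "__main__":
    # Three toy "source models", one per hypothetical source domain.
    sources = [{"prompt": [1.0, 2.0]},
               {"prompt": [3.0, 4.0]},
               {"prompt": [5.0, 6.0]}]
    print(merge_source_models(sources))
```

Uniform averaging treats every source equally. Passing non-uniform `weights` lets a noisier or more distant source contribute less, which is one simple way to soften the sample conflicts the abstract describes.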
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Topic Modeling · Advanced Graph Neural Networks
Methods: Contrastive Language-Image Pre-training
