Cross-Domain Few-Shot Learning via Multi-View Collaborative Optimization with Vision-Language Models

Dexia Chen; Wentao Zhang; Qianjie Zhu; Ping Hu; Weibing Li; Tong Zhang; Ruixuan Wang

arXiv:2508.12861·cs.CV·August 19, 2025

TL;DR

This paper introduces CoMuCo, a novel fine-tuning strategy for vision-language models that enhances cross-domain few-shot learning by employing multi-view features and consistency constraints, supported by a new benchmark.

Contribution

It proposes a new fine-tuning method, CoMuCo, with multi-view feature extraction and consistency constraints, specifically designed for cross-domain few-shot learning with vision-language models.

Findings

01. CoMuCo outperforms existing methods on cross-domain few-shot benchmarks.

02. The new benchmark effectively evaluates cross-domain few-shot learning performance.

03. Empirical results demonstrate robustness and improved accuracy of CoMuCo.

Abstract

Vision-language models (VLMs) pre-trained on natural image and language data, such as CLIP, have exhibited significant potential in few-shot image recognition tasks, leading to the development of various efficient transfer learning methods. These methods exploit the pre-learned knowledge inherent in VLMs and have achieved strong performance on standard image datasets. However, their effectiveness is often limited in cross-domain tasks, where the imaging domain differs from natural images. To address this limitation, we propose Consistency-guided Multi-view Collaborative Optimization (CoMuCo), a novel fine-tuning strategy for VLMs. This strategy employs two functionally complementary expert modules to extract multi-view features, while incorporating prior knowledge-based consistency constraints and information geometry-based consensus mechanisms to enhance the robustness of feature…
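The abstract is truncated and does not give the loss formulation, so the following is only a toy sketch of the general idea it describes, not the authors' CoMuCo implementation. It assumes two expert heads that each output class logits, a frozen zero-shot prior (for example, CLIP logits), a KL-based consistency term, and a normalized geometric mean as the consensus operator. The function names, the direction of the KL term, and the weight `lam` are all hypothetical choices made for illustration.

```python
import math


def softmax(logits):
    """Convert raw logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def geometric_consensus(p, q):
    """Normalized geometric mean of two distributions.

    Assumption: this stands in for an information-geometric consensus;
    the paper's actual mechanism is not stated in the truncated abstract.
    """
    unnorm = [math.sqrt(pi * qi) for pi, qi in zip(p, q)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def toy_loss(logits_a, logits_b, prior_logits, label, lam=1.0):
    """Cross-entropy on the consensus plus a prior-consistency penalty.

    logits_a, logits_b: outputs of the two hypothetical expert heads.
    prior_logits: outputs of a frozen zero-shot model (hypothetical prior).
    lam: hypothetical weight on the consistency term.
    """
    p_a = softmax(logits_a)
    p_b = softmax(logits_b)
    prior = softmax(prior_logits)
    consensus = geometric_consensus(p_a, p_b)
    cross_entropy = -math.log(consensus[label])
    consistency = kl_divergence(prior, p_a) + kl_divergence(prior, p_b)
    return cross_entropy + lam * consistency


if __name__ == "__main__":
    agree = toy_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [1.0, 2.0, 3.0], label=2)
    disagree = toy_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [1.0, 2.0, 3.0], label=2)
    print(f"experts agree:    {agree:.4f}")
    print(f"experts disagree: {disagree:.4f}")
```

In this sketch, the loss is lowest when both experts agree with each other and with the prior, and it grows when one expert drifts away. That illustrates why a consistency term can act as a regularizer when only a few labeled target-domain samples are available.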

Code & Models

Datasets

Kxon/CoMuCo_cross_domain_benchmark
dataset· 3 dl

Videos

Cross-Domain Few-Shot Learning via Multi-View Collaborative Optimization with Vision-Language Models

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning