Cross-Domain Few-Shot Learning via Multi-View Collaborative Optimization with Vision-Language Models
Dexia Chen, Wentao Zhang, Qianjie Zhu, Ping Hu, Weibing Li, Tong Zhang, Ruixuan Wang

TL;DR
This paper introduces CoMuCo, a novel fine-tuning strategy for vision-language models that enhances cross-domain few-shot learning by employing multi-view features and consistency constraints, supported by a new benchmark.
Contribution
It proposes a new fine-tuning method, CoMuCo, with multi-view feature extraction and consistency constraints, specifically designed for cross-domain few-shot learning with vision-language models.
Findings
CoMuCo outperforms existing methods on cross-domain few-shot benchmarks.
The new benchmark effectively evaluates cross-domain few-shot learning performance.
Empirical results demonstrate robustness and improved accuracy of CoMuCo.
Abstract
Vision-language models (VLMs) pre-trained on natural image and language data, such as CLIP, have exhibited significant potential in few-shot image recognition tasks, leading to the development of various efficient transfer learning methods. These methods exploit the knowledge that VLMs acquire during pre-training and have achieved strong performance on standard image datasets. However, their effectiveness is often limited on cross-domain tasks, where the imaging domain differs from natural images. To address this limitation, we propose Consistency-guided Multi-view Collaborative Optimization (CoMuCo), a novel fine-tuning strategy for VLMs. This strategy employs two functionally complementary expert modules to extract multi-view features, while incorporating prior knowledge-based consistency constraints and information geometry-based consensus mechanisms to enhance the robustness of feature…
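The abstract is truncated here and does not give the loss functions, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes two expert heads that each emit class logits, a frozen zero-shot prediction standing in for the "prior knowledge", a KL term as the consistency constraint, and a normalized geometric mean of the two expert distributions as a simple consensus. All function names, the `lam` weight, and these specific choices are hypothetical.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) for two discrete distributions of equal length.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def geometric_consensus(p, q):
    # Normalized geometric mean of two distributions (an assumed
    # stand-in for the paper's consensus mechanism).
    g = [math.sqrt(pi * qi) for pi, qi in zip(p, q)]
    z = sum(g)
    return [gi / z for gi in g]

def collaborative_loss(logits_a, logits_b, prior_logits, label, lam=1.0):
    # logits_a, logits_b: outputs of the two hypothetical expert views.
    # prior_logits: frozen zero-shot prediction used as the prior (assumption).
    p_a = softmax(logits_a)
    p_b = softmax(logits_b)
    prior = softmax(prior_logits)
    consensus = geometric_consensus(p_a, p_b)
    # Supervised term on the consensus prediction.
    ce = -math.log(consensus[label] + 1e-12)
    # Consistency term: keep each view close to the prior.
    consistency = kl_divergence(prior, p_a) + kl_divergence(prior, p_b)
    return ce + lam * consistency

if __name__ == "__main__":
    loss = collaborative_loss([2.0, 0.0, 0.0], [1.5, 0.2, 0.0], [1.0, 0.0, 0.0], label=0)
    print(f"loss = {loss:.4f}")
```

In this sketch the consensus is computed from both views before the supervised term is applied, so a confident error in one view is damped by the other; the `lam` weight trades off fitting the few labeled samples against staying close to the pre-trained prior.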
Taxonomy
Topics
Domain Adaptation and Few-Shot Learning
