MSGCoOp: Multiple Semantic-Guided Context Optimization for Few-Shot Learning
Zhaolong Wang, Tongfeng Sun, Mingzheng Du, Yachao Huang

TL;DR
MSGCoOp introduces a semantic-guided, ensemble prompt optimization framework for vision-language models that improves few-shot and cross-domain generalization while keeping computation efficient.
Contribution
The paper proposes a novel ensemble of semantic-guided prompts with diversity regularization, enhancing generalization in vision-language models without heavy computational costs.
Findings
Improves the harmonic mean of base-class and novel-class accuracy by 1.10% in base-to-novel generalization.
Enhances robustness in cross-domain tasks.
Outperforms baseline methods on 11 benchmark datasets.
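For readers unfamiliar with the metric in the first finding: base-to-novel benchmarks usually report the harmonic mean of base-class and novel-class accuracy, which penalizes a method that does well on one split at the expense of the other. The snippet below is a minimal sketch of that computation. The function name and the accuracy values are illustrative assumptions, not numbers from the paper.

```python
def harmonic_mean(base_acc: float, novel_acc: float) -> float:
    """Harmonic mean of base-class and novel-class accuracy.

    Both inputs are assumed to be on the same scale (e.g. percent).
    """
    total = base_acc + novel_acc
    if total == 0:
        # Avoid division by zero when both accuracies are zero.
        return 0.0
    return 2.0 * base_acc * novel_acc / total


# Illustrative values only (not results reported in the paper).
balanced = harmonic_mean(70.0, 70.0)   # equal accuracies: mean equals the inputs
skewed = harmonic_mean(80.0, 60.0)     # same arithmetic mean, lower harmonic mean
print(balanced, skewed)
```

Because the harmonic mean is pulled toward the smaller value, a gain on this metric indicates that novel-class accuracy improved without a matching loss on base classes, or the reverse.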
Abstract
Vision-language pre-trained models (VLMs) such as CLIP have demonstrated remarkable zero-shot generalization, and prompt learning has emerged as an efficient alternative to full fine-tuning. However, existing methods often struggle with generalization to novel classes, a phenomenon attributed to overfitting on seen classes and forgetting general knowledge. Furthermore, recent approaches that improve generalization often introduce complex architectures or heavy computational overhead. In this paper, we propose a Multiple Semantic-Guided Context Optimization (MSGCoOp) framework to enhance few-shot generalization while maintaining computational efficiency. Our approach leverages an ensemble of parallel learnable context vectors to capture diverse semantic aspects. To enrich these prompts, we introduce a semantic guidance mechanism that aligns them with comprehensive class descriptions…
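The ideas named in the summary (an ensemble of parallel prompts, semantic guidance toward class descriptions, and diversity regularization) can be sketched in a few lines. This is a hedged illustration only: the abstract shown here is truncated and does not give the exact loss definitions, so the cosine-based forms below, and every function and variable name, are assumptions rather than the authors' implementation. Feature vectors are plain Python lists standing in for encoder outputs.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ensemble_scores(image_feat, prompt_feats):
    """Average image-text similarity over all prompts, per class.

    prompt_feats[k][c] is the (hypothetical) text feature produced by
    prompt k for class c.
    """
    n_prompts = len(prompt_feats)
    n_classes = len(prompt_feats[0])
    return [
        sum(cosine(image_feat, prompt_feats[k][c]) for k in range(n_prompts)) / n_prompts
        for c in range(n_classes)
    ]


def semantic_guidance_loss(prompt_feats, desc_feats):
    """Assumed form: mean (1 - cosine) between each prompt's class feature
    and the feature of that class's textual description."""
    terms = [
        1.0 - cosine(per_prompt[c], desc_feats[c])
        for per_prompt in prompt_feats
        for c in range(len(desc_feats))
    ]
    return sum(terms) / len(terms)


def diversity_loss(prompt_feats):
    """Assumed form: mean pairwise cosine similarity between different
    prompts on the same class. Lower values mean more diverse prompts."""
    n_prompts = len(prompt_feats)
    n_classes = len(prompt_feats[0])
    terms = [
        cosine(prompt_feats[i][c], prompt_feats[j][c])
        for i in range(n_prompts)
        for j in range(i + 1, n_prompts)
        for c in range(n_classes)
    ]
    # With a single prompt there are no pairs, so there is nothing to penalize.
    return sum(terms) / len(terms) if terms else 0.0


# Toy example: two prompts, two classes, 2-D features (illustrative values).
prompts = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
descriptions = [[1.0, 0.0], [0.0, 1.0]]
print(ensemble_scores([1.0, 0.0], prompts))
print(semantic_guidance_loss(prompts, descriptions), diversity_loss(prompts))
```

In this toy case the two prompts are identical, so the diversity term is at its maximum; a training objective that subtracts or penalizes this term would push the prompts apart, while the guidance term keeps each of them close to the class descriptions. How the paper weights and combines these terms is not stated in the text above.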
