DualCoOp: Fast Adaptation to Multi-Label Recognition with Limited Annotations
Ximeng Sun, Ping Hu, Kate Saenko

TL;DR
DualCoOp leverages pretrained vision-language models with a novel context optimization framework to enable rapid adaptation for multi-label recognition tasks with limited annotations, outperforming existing methods.
Contribution
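The paper introduces DualCoOp, a lightweight, unified framework for partial-label and zero-shot multi-label recognition. It learns positive and negative prompt contexts around class names and relies on the text-image alignment of a pretrained vision-language model, so only a small number of parameters need training.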
Findings
Outperforms state-of-the-art methods on standard benchmarks.
Effective in low-label and zero-shot multi-label recognition scenarios.
Quick adaptation with minimal additional training overhead.
Abstract
Solving multi-label recognition (MLR) for images in the low-label regime is a challenging task with many real-world applications. Recent work learns an alignment between textual and visual spaces to compensate for insufficient image labels, but loses accuracy because of the limited amount of available MLR annotations. In this work, we utilize the strong alignment of textual and visual features pretrained with millions of auxiliary image-text pairs and propose Dual Context Optimization (DualCoOp) as a unified framework for partial-label MLR and zero-shot MLR. DualCoOp encodes positive and negative contexts with class names as part of the linguistic input (i.e. prompts). Since DualCoOp only introduces a very light learnable overhead upon the pretrained vision-language framework, it can quickly adapt to multi-label recognition tasks that have limited annotations and even unseen classes.…
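The dual-prompt idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the mean-pooling `encode` function is a stand-in for a frozen pretrained text encoder, the random vectors stand in for real class-name and image embeddings, and the rule "a class is present when the positive prompt matches the image better than the negative prompt" is an assumption made for illustration. All function and variable names are hypothetical.

```python
import math
import random


def build_prompt(context, class_embedding):
    """Concatenate learnable context vectors with one class-name embedding."""
    return context + [class_embedding]


def encode(prompt):
    """Stand-in for a frozen text encoder (assumption): mean-pool, then L2-normalise."""
    dim = len(prompt[0])
    pooled = [sum(tok[i] for tok in prompt) / len(prompt) for i in range(dim)]
    norm = math.sqrt(sum(v * v for v in pooled)) or 1.0
    return [v / norm for v in pooled]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def predict(image_feature, class_embeddings, pos_context, neg_context):
    """Mark a class present when its positive prompt scores higher than its
    negative prompt (illustrative decision rule, not taken from the paper)."""
    result = {}
    for name, emb in class_embeddings.items():
        pos = cosine(image_feature, encode(build_prompt(pos_context, emb)))
        neg = cosine(image_feature, encode(build_prompt(neg_context, emb)))
        result[name] = pos > neg
    return result


# Toy data: in a real setup only the two contexts would be trained,
# while the image and text encoders stay frozen.
rng = random.Random(0)
DIM, N_CTX = 8, 4
pos_context = [[rng.gauss(0, 0.02) for _ in range(DIM)] for _ in range(N_CTX)]
neg_context = [[rng.gauss(0, 0.02) for _ in range(DIM)] for _ in range(N_CTX)]
class_embeddings = {
    name: [rng.gauss(0, 1) for _ in range(DIM)] for name in ("cat", "dog", "car")
}
image_feature = [rng.gauss(0, 1) for _ in range(DIM)]

predictions = predict(image_feature, class_embeddings, pos_context, neg_context)
```

Because the same pair of contexts is shared by every class name in this sketch, a class that was never seen during training can be scored by supplying its name embedding alone, which mirrors the zero-shot setting the abstract mentions.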
Taxonomy
Topics: Text and Document Classification Technologies · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques
