Learning Semantic-Aware Representation in Visual-Language Models for Multi-Label Recognition with Partial Labels
Haoxian Ruan, Zhihua Xu, Zhijing Yang, Yongyi Lu, Jinghui Qin, Tianshui Chen

TL;DR
This paper introduces a semantic decoupling and prompt optimization framework for CLIP-based multi-label recognition with partial labels, effectively reducing semantic confusion and improving accuracy on benchmark datasets.
Contribution
It proposes a novel semantic decoupling module and category-specific prompt optimization to enhance CLIP's performance in multi-label recognition with partial labels.
Findings
Significantly outperforms state-of-the-art methods on COCO and Pascal VOC datasets.
Effectively separates category information, reducing semantic confusion.
Achieves better performance with a simpler model structure.
Abstract
Multi-label recognition with partial labels (MLR-PL), in which only some labels are known while others are unknown for each image, is a practical task in computer vision, since collecting large-scale and complete multi-label datasets is difficult in real application scenarios. Recently, vision-language models (e.g., CLIP) have demonstrated impressive transferability to downstream tasks in data-limited or label-limited settings. However, current CLIP-based methods suffer from semantic confusion in the MLR task due to the lack of fine-grained information in the single global visual and textual representation for all categories. In this work, we address this problem by introducing a semantic decoupling module and a category-specific prompt optimization method in a CLIP-based framework. Specifically, the semantic decoupling module following the visual encoder learns category-specific feature maps…
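To make the partial-label setting concrete, the following is a minimal sketch of a binary cross-entropy loss that skips unknown labels. It is a generic illustration of training with partially annotated images, not the loss used in the paper. The function name `partial_label_bce` and the encoding of labels as 1 (present), 0 (absent), and -1 (unknown) are assumptions made for this example.

```python
import math


def partial_label_bce(logits, labels):
    """Mean binary cross-entropy over the known labels of one image.

    logits: per-category scores (floats) before the sigmoid.
    labels: per-category annotations; 1 = present, 0 = absent,
            -1 = unknown (this encoding is an assumption of the sketch).
    Unknown categories contribute nothing to the loss.
    """
    total, known = 0.0, 0
    for z, y in zip(logits, labels):
        if y == -1:
            continue  # unknown label: skip this category
        # Numerically stable BCE with logits:
        # max(z, 0) - z * y + log(1 + exp(-|z|))
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
        known += 1
    # If every label is unknown, there is nothing to learn from.
    return total / known if known else 0.0


# Three categories; the second one is unannotated for this image.
loss = partial_label_bce([0.0, 5.0, -3.0], [1, -1, 0])
```

Because the second category is masked out, changing its logit leaves `loss` unchanged; only the two annotated categories are averaged.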
Taxonomy
Topics: Text and Document Classification Technologies
Methods: Softmax · Attention Is All You Need
