Category-Adaptive Cross-Modal Semantic Refinement and Transfer for Open-Vocabulary Multi-Label Recognition

Haijing Liu; Tao Pu; Hefeng Wu; Keze Wang; Liang Lin

arXiv:2412.06190·cs.CV·December 10, 2024

Open Access

TL;DR

This paper introduces C$^{2}$SRT, a framework that improves open-vocabulary multi-label recognition by adaptively refining semantic information within each category and transferring it across categories, drawing on cross-modal and language-model capabilities.

Contribution

The paper proposes a category-adaptive framework with intra- and inter-category modules, improving semantic correlation modeling for open-vocabulary multi-label recognition.

Findings

01. Outperforms state-of-the-art algorithms on OV-MLR benchmarks.

02. Effectively captures semantic correlations within and across categories.

03. Enhances recognition accuracy in open-vocabulary settings.

Abstract

Benefiting from the generalization capability of CLIP, recent vision-language pre-training (VLP) models have demonstrated an impressive ability to capture virtually any visual concept in daily images. However, because open-vocabulary settings include unseen categories, existing algorithms struggle to capture strong semantic correlations between categories, resulting in sub-optimal performance on open-vocabulary multi-label recognition (OV-MLR). Furthermore, the number of discriminative areas varies substantially across object categories, which is misaligned with the fixed-number patch matching used in current methods and introduces noisy visual cues that hinder the accurate capture of target semantics. To tackle these challenges, we propose a novel category-adaptive cross-modal semantic refinement and transfer (C$^{2}$SRT) framework to explore the semantic…
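To make the fixed-number patch matching problem from the abstract concrete, here is a minimal toy sketch. It is not the paper's implementation: the 2-D embeddings, the function names `topk_score` and `adaptive_score`, and the `margin` rule are all hypothetical, chosen only to show how averaging a fixed number of patches can dilute a category's score when few patches are actually relevant.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def topk_score(patches, text, k):
    """Fixed-number matching: average the k highest patch-text similarities."""
    sims = sorted((cosine(p, text) for p in patches), reverse=True)
    top = sims[:k]
    return sum(top) / len(top)


def adaptive_score(patches, text, margin=0.2):
    """Hypothetical adaptive rule (an assumption, not the paper's method):
    keep only patches whose similarity is within `margin` of the best match."""
    sims = [cosine(p, text) for p in patches]
    best = max(sims)
    kept = [s for s in sims if s >= best - margin]
    return sum(kept) / len(kept), len(kept)


# Toy data: one patch matches the category embedding, three are background.
text = [1.0, 0.0]
patches = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]

fixed = topk_score(patches, text, k=3)          # two background patches are forced in
adaptive, n_kept = adaptive_score(patches, text)  # only the matching patch is kept

print(f"fixed top-3 score: {fixed:.3f}")
print(f"adaptive score:    {adaptive:.3f} using {n_kept} patch(es)")
```

With a fixed k of 3, a category that occupies a single patch has its score pulled down by two irrelevant patches, while a per-category selection keeps only the relevant one. The real framework presumably uses a learned or more principled selection; the threshold here is only a stand-in.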

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text and Document Classification Technologies

Methods: Sparse Evolutionary Training · Contrastive Language-Image Pre-training