Text-Region Matching for Multi-Label Image Recognition with Missing Labels

Leilei Ma; Hongxing Xie; Lei Wang; Yanping Fu; Dengdi Sun; Haifeng Zhao

arXiv:2407.18520 · cs.CV · August 30, 2024

TL;DR

This paper introduces TRM-ML, a multi-label image recognition method that improves text-vision matching by aligning text with category-aware regions, applying multimodal contrastive learning, and estimating missing labels.

Contribution

The paper proposes a new approach that enhances cross-modal matching and label estimation in multi-label recognition with missing labels, outperforming existing methods.

Findings

1. Outperforms state-of-the-art methods on multiple benchmarks.

2. Handles missing labels effectively through category prototypes.

3. Improves text-vision semantic alignment with region-based matching.
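As a rough illustration of how category prototypes can help with missing labels, the sketch below builds one prototype per category from known-positive samples and promotes unknown labels whose features lie close to that prototype. This is a generic technique written under stated assumptions, not the paper's implementation: the function name `estimate_missing_labels`, the use of a single feature vector per image, the `-1` encoding for unknown labels, and the `threshold` value are all hypothetical.

```python
import numpy as np

def estimate_missing_labels(feats, labels, threshold=0.5):
    """Fill unknown labels (-1) using per-category prototypes.

    feats:  (N, D) float array, one feature vector per image (assumption).
    labels: (N, C) int array with 1 = positive, 0 = negative, -1 = unknown.
    Returns a copy of labels where unknown entries that are close to the
    category prototype are set to 1; all other entries are left unchanged.
    """
    # Normalize so dot products are cosine similarities.
    feats = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)
    filled = labels.copy()
    for c in range(labels.shape[1]):
        pos = labels[:, c] == 1
        if not pos.any():
            continue  # no known positives, so no prototype can be built
        proto = feats[pos].mean(axis=0)
        proto /= np.linalg.norm(proto) + 1e-8
        unknown = labels[:, c] == -1
        sim = feats[unknown] @ proto
        filled[unknown, c] = np.where(sim > threshold, 1, -1)
    return filled

# Toy data: two feature clusters, one per category.
feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = np.array([[1, 0], [-1, 0], [-1, 1], [0, -1]])
filled = estimate_missing_labels(feats, labels)
```

In this toy example the second image resembles the known positive for category 0 and is promoted, while the third image is dissimilar and its label for category 0 stays unknown.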

Abstract

Recently, large-scale visual language pre-trained (VLP) models have demonstrated impressive performance across various downstream tasks. Motivated by these advancements, pioneering efforts have emerged in multi-label image recognition with missing labels, leveraging VLP prompt-tuning technology. However, they usually cannot match text and vision features well, due to complicated semantic gaps and missing labels in a multi-label image. To tackle this challenge, we propose Text-Region Matching for optimizing Multi-Label prompt tuning, namely TRM-ML, a novel method for enhancing meaningful cross-modal matching. Compared to existing methods, we advocate exploring the information of category-aware regions rather than the entire image or pixels, which contributes to bridging the semantic gap between textual and visual representations in…
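To make the idea of matching text against regions (rather than the whole image or individual pixels) more concrete, here is a minimal sketch. It assumes region features and per-category text embeddings already live in a shared embedding space, and it scores each category by a softmax-weighted average of its cosine similarities to the regions. The function names, the `temperature` value, and the pooling choice are illustrative assumptions; the paper's actual matching module may differ.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length along the given axis."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def text_region_scores(region_feats, text_feats, temperature=0.07):
    """Score each category by how well its text embedding matches the regions.

    region_feats: (R, D) array of region features (assumed precomputed).
    text_feats:   (C, D) array of category text embeddings (assumed precomputed).
    Returns a (C,) array of scores in [-1, 1].
    """
    regions = l2_normalize(region_feats)
    texts = l2_normalize(text_feats)
    sim = texts @ regions.T                      # (C, R) cosine similarities
    # Softmax over regions: each category focuses on its best-matching regions.
    logits = sim / temperature
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return (weights * sim).sum(axis=1)

# Toy example: 3 categories, 5 regions, 16-dimensional embeddings.
rng = np.random.default_rng(0)
text_feats = rng.normal(size=(3, 16))
region_feats = rng.normal(size=(5, 16))
region_feats[2] = text_feats[1]                  # plant a region aligned with category 1
scores = text_region_scores(region_feats, text_feats)
```

Because one region is planted to align exactly with category 1, that category receives the highest score; a whole-image average would dilute this signal across the unrelated regions.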

Code & Models

Repositories

yu-gi-oh-leilei/trm-ml (PyTorch, official)

Taxonomy

Methods: Contrastive Learning