Dual-Granularity Cross-Modal Identity Association for Weakly-Supervised Text-to-Person Image Matching

Yafei Zhang; Yongle Shang; Huafeng Li

arXiv:2507.06744 · cs.CV · July 10, 2025

Open Access

TL;DR

This paper introduces a local-and-global dual-granularity identity association mechanism for weakly supervised text-to-person image matching. The mechanism targets the complex one-to-many identity relationships that existing methods struggle to predict, with the aim of improving matching accuracy and robustness.

Contribution

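The paper proposes a local-and-global dual-granularity identity association mechanism: batch-level cross-modal identity relationships at the local level, and, at the global level, a dynamic cross-modal identity association network anchored on the visual modality with a confidence-based dynamic adjustment mechanism.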

Findings

01. Significant boost in matching accuracy

02. Effective handling of weakly associated samples

03. Enhanced robustness through novel sample construction

Abstract

Weakly supervised text-to-person image matching, as a crucial approach to reducing models' reliance on large-scale manually labeled samples, holds significant research value. However, existing methods struggle to predict complex one-to-many identity relationships, severely limiting performance improvements. To address this challenge, we propose a local-and-global dual-granularity identity association mechanism. Specifically, at the local level, we explicitly establish cross-modal identity relationships within a batch, reinforcing identity constraints across different modalities and enabling the model to better capture subtle differences and correlations. At the global level, we construct a dynamic cross-modal identity association network with the visual modality as the anchor and introduce a confidence-based dynamic adjustment mechanism, effectively enhancing the model's ability to…
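The abstract does not give the exact formulation, so the following is only a minimal sketch of the two ideas it names, under stated assumptions. It assumes that text j is paired with image j within a batch, that each image carries a pseudo-identity label (for example from clustering), and that an image-to-text similarity matrix is available. The function names, the row normalization, and the fixed confidence threshold are all hypothetical and are not taken from the paper.

```python
from typing import List


def batch_identity_targets(pseudo_ids: List[int]) -> List[List[float]]:
    """Local level (illustrative): row-normalized image-to-text targets.

    Assumption: text j is paired with image j, so text j inherits the
    hypothetical pseudo-identity of image j. Every text whose image shares
    the anchor's pseudo-identity counts as a positive, which makes the
    one-to-many relationship explicit inside the batch.
    """
    targets = []
    for anchor_id in pseudo_ids:
        row = [1.0 if other_id == anchor_id else 0.0 for other_id in pseudo_ids]
        total = sum(row)  # at least 1.0, since each image matches its own text
        targets.append([value / total for value in row])
    return targets


def confident_links(similarity: List[List[float]], threshold: float) -> List[List[int]]:
    """Global level (illustrative): links from each visual anchor to texts.

    Assumption: a link is kept when the similarity meets a fixed threshold.
    The anchor's own paired text is always kept.
    """
    links = []
    for i, row in enumerate(similarity):
        links.append([j for j, score in enumerate(row) if j == i or score >= threshold])
    return links


if __name__ == "__main__":
    # Images 0 and 2 share a pseudo-identity, so each is associated with both texts.
    print(batch_identity_targets([0, 1, 0]))
    sim = [[0.9, 0.2, 0.7], [0.1, 0.3, 0.2], [0.8, 0.1, 0.9]]
    print(confident_links(sim, threshold=0.6))
```

In this sketch the visual modality serves as the anchor in both functions, mirroring the abstract's description. A real system would presumably update the confidence criterion during training rather than use a constant threshold.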

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Generative Adversarial Networks and Image Synthesis