YYDS: Visible-Infrared Person Re-Identification with Coarse Descriptions
Yunhao Du, Zhicheng Zhao, Fei Su

TL;DR
This paper introduces YYDS, a novel model for visible-infrared person re-identification that incorporates coarse language descriptions to improve cross-modality matching, achieving state-of-the-art results.
Contribution
The paper proposes YYDS, which has a Y-Y-shape decomposition structure, together with a cross-modal re-ranking method, addressing the challenges of missing color information and modality bias in VI-ReID.
Findings
YYDS outperforms prior state-of-the-art methods on the SYSU-MM01, RegDB, and LLCM datasets.
The text-IoU regularization enhances feature decomposition.
CMKR improves neighbor search and re-ranking in cross-modal retrieval.
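The neighbor-search idea behind re-ranking can be illustrated with a generic k-reciprocal check between a query set and a gallery set. This is only a minimal sketch and not the paper's CMKR: the function names `k_nearest` and `k_reciprocal_neighbors`, the bipartite query-to-gallery formulation, and the toy distance matrix are all assumptions made for illustration.

```python
import numpy as np

def k_nearest(dist, k):
    """Return the indices of the k smallest entries in each row of `dist`."""
    return np.argsort(dist, axis=1)[:, :k]

def k_reciprocal_neighbors(dist_qg, k):
    """Hypothetical helper: mutual k-nearest neighbors across two sets.

    dist_qg has shape (num_query, num_gallery), e.g. infrared queries
    against visible gallery images. Gallery item j is kept for query i
    only if j is among i's k nearest gallery items AND i is among j's
    k nearest queries.
    """
    q2g = k_nearest(dist_qg, k)    # nearest gallery items per query
    g2q = k_nearest(dist_qg.T, k)  # nearest queries per gallery item
    result = []
    for i in range(dist_qg.shape[0]):
        mutual = [int(j) for j in q2g[i] if i in g2q[j]]
        result.append(mutual)
    return result

# Toy example (made-up distances): 2 queries, 3 gallery items.
toy_dist = np.array([[0.1, 0.9, 0.8],
                     [0.2, 0.3, 0.7]])
print(k_reciprocal_neighbors(toy_dist, k=1))
```

In this toy case both queries rank gallery item 0 first, but gallery item 0 ranks only query 0 first, so the mutual check keeps the match for query 0 and drops it for query 1.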
Abstract
Visible-infrared person re-identification (VI-ReID) is challenging due to considerable cross-modality discrepancies. Existing works mainly focus on learning modality-invariant features while suppressing modality-specific ones. However, retrieving visible images based only on infrared samples is an extreme problem because of the absence of color information. To this end, we present the Refer-VI-ReID setting, which aims to match target visible images from both infrared images and coarse language descriptions (e.g., "a man with red top and black pants") to complement the missing color information. To address this task, we design a Y-Y-shape decomposition structure, dubbed YYDS, to decompose and aggregate texture and color features of targets. Specifically, the text-IoU regularization strategy is first presented to facilitate the decomposition training, and a joint relation module is…
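The abstract does not spell out how text-IoU is computed. One plausible reading, shown here purely as an assumption, is an intersection-over-union score over the word sets of two coarse descriptions; the function name `text_iou` and the whitespace tokenization are hypothetical choices, not taken from the paper.

```python
def text_iou(desc_a: str, desc_b: str) -> float:
    """Assumed definition: IoU of the lowercase word sets of two descriptions."""
    tokens_a = set(desc_a.lower().split())
    tokens_b = set(desc_b.lower().split())
    union = tokens_a | tokens_b
    if not union:
        # Both descriptions are empty, so there is no overlap to measure.
        return 0.0
    return len(tokens_a & tokens_b) / len(union)

# Two descriptions that differ only in the top color share 7 of 9 distinct words.
score = text_iou("a man with red top and black pants",
                 "a man with blue top and black pants")
print(round(score, 3))
```

Under this reading, descriptions that differ in a single color word still score high, which is the kind of soft similarity signal a regularizer could use in place of a hard match.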
Taxonomy
Topics: Video Surveillance and Tracking Methods · Impact of Light on Environment and Health
Methods: Focus
