X-ReID: Multi-granularity Information Interaction for Video-Based Visible-Infrared Person Re-Identification
Chenyang Yu, Xuehu Liu, Pingping Zhang, Huchuan Lu

TL;DR
This paper introduces X-ReID, a framework for video-based visible-infrared person re-identification (VVI-ReID) that reduces the modality gap and exploits spatiotemporal information, outperforming state-of-the-art methods on the HITSZ-VCM and BUPTCampus benchmarks.
Contribution
The paper proposes X-ReID, a cross-modality feature learning framework for VVI-ReID that combines Cross-modality Prototype Collaboration (CPC) with Multi-granularity Information Interaction (MII).
Findings
Outperforms state-of-the-art methods on the HITSZ-VCM and BUPTCampus benchmarks.
Effectively reduces modality discrepancy and enhances temporal modeling.
Achieves robust sequence-level representations for VVI-ReID.
Abstract
Large-scale vision-language models (e.g., CLIP) have recently achieved remarkable performance in retrieval tasks, yet their potential for Video-based Visible-Infrared Person Re-Identification (VVI-ReID) remains largely unexplored. The primary challenges are narrowing the modality gap and leveraging spatiotemporal information in video sequences. To address the above issues, in this paper, we propose a novel cross-modality feature learning framework named X-ReID for VVI-ReID. Specifically, we first propose a Cross-modality Prototype Collaboration (CPC) to align and integrate features from different modalities, guiding the network to reduce the modality discrepancy. Then, a Multi-granularity Information Interaction (MII) is designed, incorporating short-term interactions from adjacent frames, long-term cross-frame information fusion, and cross-modality feature alignment to enhance temporal…
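The abstract is truncated and gives no implementation details, so the following is only a toy sketch of the ideas it names, not the authors' method. It assumes per-frame features are plain lists of floats, that a prototype is the mean feature of one identity within one modality, that short-term interaction is a three-frame average, and that long-term fusion is mean pooling. All function names are hypothetical.

```python
from collections import defaultdict


def mean_vector(vectors):
    """Element-wise mean of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def short_term(frames):
    """Assumed stand-in for short-term interaction:
    average each frame with its immediate neighbours."""
    out = []
    for t in range(len(frames)):
        window = frames[max(0, t - 1): t + 2]
        out.append(mean_vector(window))
    return out


def long_term(frames):
    """Assumed stand-in for long-term fusion:
    mean-pool all frames into one sequence-level vector."""
    return mean_vector(frames)


def prototypes(seq_feats, labels):
    """Assumed prototype definition: mean sequence feature per identity."""
    groups = defaultdict(list)
    for feat, pid in zip(seq_feats, labels):
        groups[pid].append(feat)
    return {pid: mean_vector(feats) for pid, feats in groups.items()}


def prototype_gap(vis_protos, ir_protos):
    """Mean squared distance between visible and infrared prototypes
    over identities present in both modalities (0.0 if none are shared)."""
    shared = sorted(set(vis_protos) & set(ir_protos))
    if not shared:
        return 0.0
    total = 0.0
    for pid in shared:
        total += sum((a - b) ** 2
                     for a, b in zip(vis_protos[pid], ir_protos[pid]))
    return total / len(shared)
```

In this sketch, a training loop would minimize something like prototype_gap alongside an identity loss, so that visible and infrared features of the same person move closer together. The actual CPC and MII designs may differ substantially.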
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Neural Network Applications · Human Pose and Action Recognition
