Video-Level Language-Driven Video-Based Visible-Infrared Person Re-Identification
Shuang Li, Jiaxu Leng, Changjiang Kuang, Mingpi Tan, Xinbo Gao

TL;DR
This paper introduces VLD, a video-level language-driven framework that uses CLIP-based language prompts and spatiotemporal modeling to improve video-based visible-infrared person re-identification (VVI-ReID), and reports state-of-the-art results.
Contribution
It proposes a language-driven approach that combines invariant-modality language prompting (IMLP) with spatiotemporal modules to bridge the visible-infrared modality gap in video-based person re-identification.
Findings
Achieves state-of-the-art performance on VVI-ReID benchmarks
Effectively mitigates modality differences using language prompts
Enhances spatiotemporal feature modeling with dedicated modules
Abstract
Video-based Visible-Infrared Person Re-Identification (VVI-ReID) aims to match pedestrian sequences across modalities by extracting modality-invariant sequence-level features. As a high-level semantic representation, language provides a consistent description of pedestrian characteristics in both infrared and visible modalities. Leveraging the Contrastive Language-Image Pre-training (CLIP) model to generate video-level language prompts and guide the learning of modality-invariant sequence-level features is theoretically feasible. However, the challenge of generating and utilizing modality-shared video-level language prompts to address modality gaps remains a critical problem. To address this problem, we propose a simple yet powerful framework, video-level language-driven VVI-ReID (VLD), which consists of two core modules: invariant-modality language prompting (IMLP) and spatial-temporal…
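The idea of pulling visible and infrared sequence-level features toward one shared language description can be illustrated with a generic CLIP-style contrastive objective. The following is a minimal NumPy sketch under stated assumptions, not the paper's IMLP module or its actual loss: mean pooling over frames stands in for temporal modeling, random arrays stand in for CLIP video and text embeddings, and all function names and the temperature value are hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def sequence_feature(frame_feats):
    """Mean-pool frame features (..., T, D) into sequence-level features (..., D).

    Assumption: plain averaging replaces any learned temporal module.
    """
    return frame_feats.mean(axis=-2)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along `axis`."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def video_text_contrastive_loss(video_feats, text_feats, temperature=0.07):
    """Symmetric InfoNCE loss; row i of each input describes identity i."""
    v = l2_normalize(video_feats)
    t = l2_normalize(text_feats)
    logits = v @ t.T / temperature           # (N, N) cosine similarities
    idx = np.arange(len(v))
    loss_v2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # video -> text
    loss_t2v = -log_softmax(logits, axis=0)[idx, idx].mean()  # text -> video
    return 0.5 * (loss_v2t + loss_t2v)


# Toy data: both modalities of an identity are noisy copies of its prompt.
rng = np.random.default_rng(0)
num_ids, num_frames, dim = 4, 6, 8
prompts = rng.normal(size=(num_ids, dim))    # stand-in for text embeddings
visible = prompts[:, None, :] + 0.1 * rng.normal(size=(num_ids, num_frames, dim))
infrared = prompts[:, None, :] + 0.1 * rng.normal(size=(num_ids, num_frames, dim))

# Both modalities are aligned against the same language prompts.
loss = (video_text_contrastive_loss(sequence_feature(visible), prompts)
        + video_text_contrastive_loss(sequence_feature(infrared), prompts))
print(f"toy alignment loss: {loss:.4f}")
```

Because the same prompt matrix serves as the target for both modalities, minimizing this loss pushes visible and infrared features of one identity toward a common point, which is the intuition behind using language as a modality-shared anchor.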
Taxonomy
Topics: Video Surveillance and Tracking Methods · Gait Recognition and Analysis · Face Recognition and Analysis
