Multi-Grained Vision-Language Alignment for Domain Generalized Person Re-Identification
Jiachen Li, Xiaojin Gong, Dongping Zhang

TL;DR
This paper introduces a multi-grained vision-language alignment framework based on CLIP for domain-generalized person re-identification, enhancing fine-grained feature extraction and alignment to improve generalization across unseen domains.
Contribution
It proposes a novel multi-grained prompt and attention mechanism for better visual-language alignment in DG Re-ID, addressing limitations of global features in existing VLMs.
Findings
Achieves superior performance on single- and multi-source protocols
Effectively extracts fine-grained body part features
Demonstrates improved domain generalization
Abstract
Domain Generalized person Re-identification (DG Re-ID) is a challenging task, where models are trained on source domains but tested on unseen target domains. Although previous pure vision-based models have achieved significant progress, there is still room for improvement. Recently, Vision-Language Models (VLMs) have shown outstanding generalization capabilities in various visual applications. However, directly adapting a VLM to Re-ID yields limited generalization improvement. This is because the VLM only produces global features that are insensitive to ID nuances. To tackle this problem, we propose a CLIP-based multi-grained vision-language alignment framework in this work. Specifically, several multi-grained prompts are introduced in the language modality to describe different body parts and align with their counterparts in the vision modality. To obtain fine-grained visual information,…
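The part-level alignment idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the part names, the 4-dimensional toy features, the temperature value, and the helper names `cosine_matrix` and `alignment_loss` are all assumptions made for illustration. It only shows a generic CLIP-style contrastive objective in which the visual feature of body part i is pulled toward the text embedding of the prompt describing part i.

```python
import math

# Hypothetical part names; the paper's actual prompt wording is not given here.
PARTS = ["head", "upper body", "lower body", "feet"]


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_matrix(visual, text):
    """Cosine similarity between every visual part feature and every prompt embedding."""
    v = [normalize(row) for row in visual]
    t = [normalize(row) for row in text]
    return [[sum(a * b for a, b in zip(vi, tj)) for tj in t] for vi in v]


def alignment_loss(visual, text, temperature=0.07):
    """Mean cross-entropy pulling part i toward prompt i and away from the others.

    The temperature is an assumed value of the kind used in CLIP-style training.
    """
    sims = cosine_matrix(visual, text)
    total = 0.0
    for i, row in enumerate(sims):
        logits = [s / temperature for s in row]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum - logits[i]
    return total / len(sims)


# Toy features standing in for text-encoder and image-encoder outputs.
text_feats = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
aligned = [[0.9, 0.1, 0, 0], [0.1, 0.9, 0, 0], [0, 0, 0.9, 0.1], [0, 0.1, 0, 0.9]]
shuffled = aligned[::-1]  # part features paired with the wrong prompts

print("aligned loss: ", alignment_loss(aligned, text_feats))
print("shuffled loss:", alignment_loss(shuffled, text_feats))
```

In this toy setup the loss is lower when each part feature is paired with its own prompt than when the pairing is shuffled, which is the behaviour a part-level alignment objective is meant to reward.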
Taxonomy
Topics: Video Surveillance and Tracking Methods · Face recognition and analysis · Advanced Neural Network Applications
