Multi-Grained Vision-Language Alignment for Domain Generalized Person Re-Identification

Jiachen Li; Xiaojin Gong; Dongping Zhang

arXiv:2603.14012·cs.CV·March 17, 2026

TL;DR

This paper introduces a multi-grained vision-language alignment framework based on CLIP for domain-generalized person re-identification, enhancing fine-grained feature extraction and alignment to improve generalization across unseen domains.

Contribution

The paper proposes multi-grained prompts and an attention mechanism for better vision-language alignment in DG Re-ID, addressing the limitation that global features in existing VLMs are insensitive to identity nuances.

Findings

01 Achieves superior performance on single- and multi-source protocols

02 Effectively extracts fine-grained body part features

03 Demonstrates improved domain generalization

Abstract

Domain Generalized person Re-identification (DG Re-ID) is a challenging task in which models are trained on source domains but tested on unseen target domains. Although previous pure vision-based models have achieved significant progress, their performance still leaves room for improvement. Recently, Vision-Language Models (VLMs) have shown outstanding generalization capabilities in various visual applications. However, directly adapting a VLM to Re-ID yields limited generalization improvement, because the VLM produces only global features that are insensitive to ID nuances. To tackle this problem, we propose a CLIP-based multi-grained vision-language alignment framework. Specifically, several multi-grained prompts are introduced in the language modality to describe different body parts and align with their counterparts in the vision modality. To obtain fine-grained visual information,…
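
The idea of aligning part-level prompts with part-level visual features can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the loss, so the example assumes a standard image-to-text contrastive objective with cosine similarity and a temperature, applied independently per body part. The function names, the temperature value of 0.07, and the toy embeddings are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def part_alignment_loss(visual_parts, text_parts, temperature=0.07):
    """Assumed per-part contrastive loss (illustrative only).

    visual_parts[i][p]: embedding of body part p for image i
    text_parts[i][p]:   embedding of the prompt describing part p
                        for the identity shown in image i
    For every part, each image's visual embedding should be most
    similar to its own identity's prompt embedding for that part.
    """
    n = len(visual_parts)
    num_parts = len(visual_parts[0])
    total = 0.0
    for p in range(num_parts):
        for i in range(n):
            logits = [
                cosine(visual_parts[i][p], text_parts[j][p]) / temperature
                for j in range(n)
            ]
            # Numerically stable log-sum-exp for the softmax denominator.
            m = max(logits)
            log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
            # Cross-entropy with the matching identity as the target.
            total += log_denom - logits[i]
    return total / (n * num_parts)


# Toy data: 2 images, 2 parts, 2-dimensional embeddings (hypothetical).
visual = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
aligned_text = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
swapped_text = [aligned_text[1], aligned_text[0]]

aligned_loss = part_alignment_loss(visual, aligned_text)
swapped_loss = part_alignment_loss(visual, swapped_text)
```

In this toy setup the loss is close to zero when each part embedding matches its own prompt and becomes large when the prompts are swapped between identities, which is the behavior a part-level alignment objective is meant to encourage.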

Taxonomy

Topics: Video Surveillance and Tracking Methods · Face recognition and analysis · Advanced Neural Network Applications