ChatReID: Open-ended Interactive Person Retrieval via Hierarchical Progressive Tuning for Vision Language Models
Ke Niu, Haiyang Yu, Mengyang Zhao, Teng Fu, Siyang Yi, Wei Lu, Bin Li, Xuelin Qian, Xiangyang Xue

TL;DR
ChatReID introduces a hierarchical progressive tuning framework leveraging vision-language models for flexible, interactive person re-identification, significantly improving accuracy and reasoning capabilities across multiple benchmarks.
Contribution
The paper presents a hierarchical progressive tuning strategy and a large-scale instruction dataset of more than 8 million prompts to adapt vision-language models to person re-ID tasks.
Findings
Achieves state-of-the-art performance on ten benchmarks.
Effectively recognizes fine-grained details in re-identification.
Demonstrates strong reasoning and multi-modal integration abilities.
Abstract
Person re-identification (Re-ID) is a crucial task in computer vision that aims to recognize individuals across non-overlapping camera views. While recent vision-language models (VLMs) excel at logical reasoning and multi-task generalization, their application to Re-ID remains limited: they either struggle to perform accurate matching based on identity-relevant features or serve merely as auxiliary semantics for image-dominated branches. In this paper, we propose ChatReID, a novel framework that shifts the focus toward a text-side-dominated retrieval paradigm, enabling flexible and interactive re-identification. To integrate the reasoning abilities of language models into Re-ID pipelines, we first present a large-scale instruction dataset containing more than 8 million prompts to support model fine-tuning. Next, we introduce a hierarchical progressive tuning strategy, which…
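The abstract frames Re-ID as retrieving the same person across camera views, and Re-ID benchmarks are conventionally scored with Rank-k accuracy and mean average precision (mAP). The sketch below shows that generic retrieval-and-scoring protocol, not ChatReID's own method: it assumes each query already has a similarity score for every gallery image (how ChatReID produces those scores is not covered in this summary), and the score matrix and identity labels are hypothetical toy values. For brevity it also omits the same-camera filtering that standard benchmark protocols apply.

```python
from typing import Dict, List, Sequence


def rank_gallery(scores: Sequence[float]) -> List[int]:
    """Return gallery indices sorted from most to least similar to the query."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def average_precision(ranked_ids: Sequence[int], query_id: int) -> float:
    """AP for one query: mean of the precision values at each rank holding a true match."""
    hits = 0
    precision_sum = 0.0
    for rank, pid in enumerate(ranked_ids, start=1):
        if pid == query_id:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def evaluate(
    score_matrix: Sequence[Sequence[float]],
    query_ids: Sequence[int],
    gallery_ids: Sequence[int],
    k: int = 1,
) -> Dict[str, float]:
    """Compute Rank-k accuracy and mAP over all queries (no camera filtering)."""
    rank_k_hits = 0
    ap_total = 0.0
    for scores, qid in zip(score_matrix, query_ids):
        order = rank_gallery(scores)
        ranked_ids = [gallery_ids[i] for i in order]
        # Rank-k: does a correct identity appear among the top-k gallery results?
        if qid in ranked_ids[:k]:
            rank_k_hits += 1
        ap_total += average_precision(ranked_ids, qid)
    n = len(query_ids)
    return {f"rank{k}": rank_k_hits / n, "mAP": ap_total / n}


# Hypothetical toy data: 2 queries scored against 3 gallery images.
score_matrix = [[0.9, 0.2, 0.4], [0.5, 0.3, 0.2]]
query_ids = [7, 8]
gallery_ids = [7, 8, 8]

result = evaluate(score_matrix, query_ids, gallery_ids, k=1)
print(result)
```

In this toy run the first query retrieves its match at rank 1, while the second query's top result is the wrong identity, so Rank-1 drops to 0.5 even though both of its true matches still contribute to mAP. That gap is why Re-ID results usually report both numbers.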
Taxonomy
Topics: Video Surveillance and Tracking Methods · Multimodal Machine Learning Applications · Advanced Neural Network Applications
Methods: ADaptive gradient method with the OPTimal convergence rate
