Find Them All: Unveiling MLLMs for Versatile Person Re-identification
Jinhao Li, Zijian Chen, Lirong Deng, Guangtao Zhai, Changbo Wang

TL;DR
This paper introduces VP-ReID, a comprehensive benchmark for evaluating multi-modal large language models in diverse person re-identification tasks, highlighting their potential and limitations across modalities.
Contribution
The paper presents a new benchmark VP-ReID with extensive multi-modal data and evaluation schemes, and explores the capabilities of MLLMs in person ReID tasks.
Findings
MLLMs show strong versatility and effectiveness in person ReID.
Limitations exist in handling thermal and infrared modalities.
VP-ReID facilitates the development of robust cross-modal ReID models.
Abstract
Person re-identification (ReID) aims to retrieve images of a target person from the gallery set, with wide applications in medical rehabilitation and public security. However, traditional person ReID models are typically uni-modal, which limits their generalizability across heterogeneous data modalities. Recently, the emergence of multi-modal large language models (MLLMs) has opened a promising avenue for addressing this issue. Despite this potential, existing methods merely treat MLLMs as feature extractors or caption generators, leaving their capabilities in person ReID tasks largely unexplored. To bridge this gap, we introduce a novel benchmark for Versatile Person Re-IDentification, termed VP-ReID. The benchmark includes 257,310 multi-modal queries and gallery images, covering ten diverse person…
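To make the query/gallery retrieval setting concrete, here is a minimal sketch of a conventional embedding-based ReID pipeline: each gallery image is ranked by cosine similarity to the query, and the ranking is scored with Rank-k and average precision, two metrics commonly used in ReID. This is an illustration under stated assumptions, not the VP-ReID evaluation protocol or the paper's MLLM-based method. The function names (`cosine`, `rank_gallery`, `rank_k_hit`, `average_precision`) are hypothetical. The toy vectors stand in for features from an arbitrary encoder. The sketch also skips the usual same-camera filtering.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical types: a gallery entry is (person_id, feature_vector).
Ranked = List[Tuple[str, float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank_gallery(query: Sequence[float],
                 gallery: List[Tuple[str, Sequence[float]]]) -> Ranked:
    """Score every gallery entry against the query and sort from most to least similar."""
    scored = [(pid, cosine(query, feat)) for pid, feat in gallery]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def rank_k_hit(ranked: Ranked, true_id: str, k: int) -> bool:
    """True if the target identity appears among the top-k retrieved entries."""
    return any(pid == true_id for pid, _ in ranked[:k])


def average_precision(ranked: Ranked, true_id: str) -> float:
    """Mean of precision@i over every rank i that holds a correct match."""
    hits = 0
    precisions = []
    for i, (pid, _) in enumerate(ranked, start=1):
        if pid == true_id:
            hits += 1
            precisions.append(hits / i)
    return sum(precisions) / len(precisions) if precisions else 0.0


if __name__ == "__main__":
    # Toy 2-D "features": two images of person A, one of person B.
    gallery = [("A", [1.0, 0.0]), ("B", [0.0, 1.0]), ("A", [0.9, 0.1])]
    ranked = rank_gallery([1.0, 0.05], gallery)
    print(ranked[0][0], rank_k_hit(ranked, "A", 1), average_precision(ranked, "A"))
```

Averaging `rank_k_hit` over all queries gives Rank-k accuracy, and averaging `average_precision` gives mAP. A benchmark that queries an MLLM directly, rather than comparing embeddings, would replace `rank_gallery` with whatever ranking or choice the model produces.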
Taxonomy
Topics: Advanced Neural Network Applications · Video Surveillance and Tracking Methods · Multimodal Machine Learning Applications
