Find Them All: Unveiling MLLMs for Versatile Person Re-identification

Jinhao Li; Zijian Chen; Lirong Deng; Guangtao Zhai; Changbo Wang

arXiv:2508.06908 · cs.CV · November 25, 2025

Open Access

TL;DR

This paper introduces VP-ReID, a comprehensive benchmark for evaluating multi-modal large language models in diverse person re-identification tasks, highlighting their potential and limitations across modalities.

Contribution

The paper presents a new benchmark, VP-ReID, with extensive multi-modal data and evaluation schemes, and uses it to explore the capabilities of MLLMs in person ReID tasks.

Findings

01

MLLMs show strong versatility and effectiveness in person ReID.

02

Limitations exist in handling thermal and infrared modalities.

03

VP-ReID facilitates development of robust cross-modal ReID models.

Abstract

Person re-identification (ReID) aims to retrieve images of a target person from the gallery set, with wide applications in medical rehabilitation and public security. However, traditional person ReID models are typically uni-modal, resulting in limited generalizability across heterogeneous data modalities. Recently, the emergence of multi-modal large language models (MLLMs) has shown a promising avenue for addressing this issue. Despite this potential, existing methods merely regard MLLMs as feature extractors or caption generators, leaving their capabilities in person ReID tasks largely unexplored. To bridge this gap, we introduce a novel benchmark for Versatile Person Re-IDentification, termed VP-ReID. The benchmark includes 257,310 multi-modal queries and gallery images, covering ten diverse person…

Taxonomy

Topics: Advanced Neural Network Applications · Video Surveillance and Tracking Methods · Multimodal Machine Learning Applications