Harnessing the Power of MLLMs for Transferable Text-to-Image Person ReID

Wentao Tan; Changxing Ding; Jiayu Jiang; Fei Wang; Yibing Zhan; Dapeng Tao

arXiv:2405.04940 · cs.CV · July 2, 2024 · 1 citation

TL;DR

This paper introduces a scalable, transferable approach to text-to-image person re-identification (ReID) that uses multi-modal large language models (MLLMs) to generate training descriptions. It addresses the limited diversity and accuracy of those descriptions to improve cross-dataset performance.

Contribution

The study proposes methods for generating diverse textual descriptions and for automatically filtering incorrect words. Both improve the transferability and performance of text-to-image ReID models.

Findings

01. Significant improvement in cross-dataset ReID accuracy.

02. State-of-the-art results in traditional evaluation settings.

03. Effective handling of noisy textual descriptions.

Abstract

Text-to-image person re-identification (ReID) retrieves pedestrian images according to textual descriptions. Manually annotating textual descriptions is time-consuming, restricting the scale of existing datasets and therefore the generalization ability of ReID models. As a result, we study the transferable text-to-image ReID problem, where we train a model on our proposed large-scale database and directly deploy it to various datasets for evaluation. We obtain substantial training data via Multi-modal Large Language Models (MLLMs). Moreover, we identify and address two key challenges in utilizing the obtained textual descriptions. First, an MLLM tends to generate descriptions with similar structures, causing the model to overfit specific sentence patterns. Thus, we propose a novel method that uses MLLMs to caption images according to various templates. These templates are obtained using…
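
To make the retrieval task in the abstract concrete, here is a minimal sketch of how a text query can be ranked against a gallery of pedestrian images and scored with Rank-k accuracy. It is not the authors' implementation. The embeddings, the cosine-similarity scoring, and the names `rank_gallery`, `rank_k_accuracy`, `gallery`, and `identity_of` are all assumptions made for illustration; a real system would obtain the embeddings from trained text and image encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_gallery(text_emb, gallery):
    """Return gallery image ids sorted by descending similarity to the text embedding."""
    scored = [(cosine(text_emb, emb), img_id) for img_id, emb in gallery.items()]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [img_id for _, img_id in scored]


def rank_k_accuracy(queries, gallery, identity_of, k=1):
    """Fraction of queries whose top-k retrieved images contain the correct person."""
    hits = 0
    for text_emb, person_id in queries:
        top_k = rank_gallery(text_emb, gallery)[:k]
        if any(identity_of[img_id] == person_id for img_id in top_k):
            hits += 1
    return hits / len(queries)


# Toy data (hypothetical): 2-D embeddings stand in for encoder outputs.
gallery = {"img_a": [1.0, 0.0], "img_b": [0.0, 1.0], "img_c": [0.7, 0.7]}
identity_of = {"img_a": 1, "img_b": 2, "img_c": 3}
# Each query is (text embedding, ground-truth person id).
queries = [([0.9, 0.1], 1), ([0.1, 0.9], 2), ([1.0, 0.2], 3)]

rank1 = rank_k_accuracy(queries, gallery, identity_of, k=1)
rank2 = rank_k_accuracy(queries, gallery, identity_of, k=2)
print(f"Rank-1: {rank1:.3f}, Rank-2: {rank2:.3f}")
```

In the transferable setting described in the abstract, the encoders would be trained once on the MLLM-captioned data, and a loop like this would then be run unchanged on each target dataset's queries and gallery.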

Code & Models

Repositories

wentaotan/mllm4text-reid (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Speech Recognition and Synthesis