MLLMReID: Multimodal Large Language Model-based Person Re-identification
Shan Yang, Yongfei Zhang

TL;DR
This paper introduces MLLMReID, a novel approach that adapts multimodal large language models for person re-identification by using common instructions and a multi-task learning synchronization module, achieving superior results.
Contribution
It proposes a simple instruction method and a multi-task learning synchronization module to effectively adapt MLLMs for ReID tasks, addressing overfitting and training synchronization issues.
Findings
MLLMReID outperforms existing methods in ReID accuracy.
The common instruction approach simplifies instruction design.
Synchronization improves visual encoder training effectiveness.
Abstract
Multimodal large language models (MLLMs) have achieved satisfactory results in many tasks. However, their performance on person re-identification (ReID) has not been explored to date. This paper investigates how to adapt them for ReID. An intuitive idea is to fine-tune an MLLM with ReID image-text datasets and then use its visual encoder as a backbone for ReID. However, two apparent issues remain: (1) When designing instructions for ReID, MLLMs may overfit to specific instructions, and designing a variety of instructions leads to higher costs. (2) When the visual encoder of an MLLM is fine-tuned, it is not trained synchronously with the ReID task. As a result, the effectiveness of the visual encoder fine-tuning cannot be directly reflected in the performance of the ReID task. To address these problems, this paper proposes MLLMReID: Multimodal Large Language Model-based ReID.…
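For readers unfamiliar with how a visual encoder serves as a ReID backbone, the following is a minimal sketch of the standard retrieval step: each image is mapped to a feature vector, and gallery images are ranked by similarity to the query. It is not taken from the paper. The function names, the toy three-dimensional features, and the identity labels are all hypothetical, and cosine similarity is assumed as the matching metric.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_gallery(query_feature, gallery):
    """Return gallery identity labels ordered from most to least similar."""
    scored = [(cosine_similarity(query_feature, feat), pid) for pid, feat in gallery]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [pid for _, pid in scored]


# Hypothetical toy features standing in for visual-encoder outputs.
gallery = [
    ("person_a", [0.9, 0.1, 0.0]),
    ("person_b", [0.0, 1.0, 0.2]),
    ("person_c", [0.1, 0.0, 1.0]),
]
query = [1.0, 0.2, 0.0]

ranking = rank_gallery(query, gallery)
print(ranking)
```

In a real pipeline the feature vectors would come from the fine-tuned visual encoder, so any improvement in the encoder shows up directly in this ranking.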
Taxonomy
Topics: Video Surveillance and Tracking Methods · Human Pose and Action Recognition · Multimodal Machine Learning Applications
