Image-Text Knowledge Modeling for Unsupervised Multi-Scenario Person Re-Identification

Zhiqi Pang; Lingling Zhao; Yang Liu; Chunyu Wang; Gaurav Sharma

arXiv:2601.11243 · cs.CV · January 19, 2026

Open Access

TL;DR

This paper introduces ITKM, an unsupervised multi-scenario person re-identification framework built on the CLIP vision-language model. It uses image-text knowledge modeling to improve matching across scenarios such as cross-resolution and clothing change.

Contribution

The paper proposes ITKM, a three-stage framework that adapts CLIP's vision-language capabilities to multi-scenario person ReID through a scenario embedding, text embedding optimization, and heterogeneous matching modules.

Findings

01. Outperforms existing scenario-specific methods.

02. Enhances overall ReID performance across diverse scenarios.

03. Demonstrates strong generalizability and effectiveness.

Abstract

We propose unsupervised multi-scenario (UMS) person re-identification (ReID) as a new task that expands ReID across diverse scenarios (cross-resolution, clothing change, etc.) within a single coherent framework. To tackle UMS-ReID, we introduce image-text knowledge modeling (ITKM) -- a three-stage framework that effectively exploits the representational power of vision-language models. We start with a pre-trained CLIP model with an image encoder and a text encoder. In Stage I, we introduce a scenario embedding in the image encoder and fine-tune the encoder to adaptively leverage knowledge from multiple scenarios. In Stage II, we optimize a set of learned text embeddings to associate with pseudo-labels from Stage I and introduce a multi-scenario separation loss to increase the divergence between inter-scenario text representations. In Stage III, we first introduce cluster-level and…
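
To make the Stage II idea of separating inter-scenario text representations more concrete, here is a minimal sketch in plain Python. It is not the paper's loss: the abstract does not give the formulation, so this version simply assumes that each learned text embedding carries a scenario id, averages the embeddings per scenario, and penalizes the mean pairwise cosine similarity between those scenario centroids. The names `scenario_centroids` and `separation_loss`, and the centroid-based design itself, are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def scenario_centroids(text_embeddings, scenario_ids):
    """Average the text embeddings that share a scenario id (assumed grouping)."""
    sums, counts = {}, {}
    for emb, sid in zip(text_embeddings, scenario_ids):
        if sid not in sums:
            sums[sid] = [0.0] * len(emb)
            counts[sid] = 0
        sums[sid] = [s + e for s, e in zip(sums[sid], emb)]
        counts[sid] += 1
    return {sid: [s / counts[sid] for s in sums[sid]] for sid in sums}


def separation_loss(text_embeddings, scenario_ids):
    """Mean pairwise cosine similarity between scenario centroids.

    Lower values mean the scenarios' text representations point in more
    different directions. This is an illustrative stand-in, not the
    multi-scenario separation loss defined in the paper.
    """
    centroids = list(scenario_centroids(text_embeddings, scenario_ids).values())
    pairs = [(i, j) for i in range(len(centroids))
             for j in range(i + 1, len(centroids))]
    if not pairs:
        # A single scenario has nothing to be separated from.
        return 0.0
    return sum(cosine(centroids[i], centroids[j]) for i, j in pairs) / len(pairs)


# Toy usage: two embeddings from scenario 0 and one from scenario 1.
embeddings = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
scenarios = [0, 0, 1]
loss = separation_loss(embeddings, scenarios)
```

In a real training loop this quantity would be computed on differentiable tensors and minimized jointly with the pseudo-label association objective; the plain-list version here only illustrates the shape of the computation.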

Taxonomy

Topics: Video Surveillance and Tracking Methods · Face recognition and analysis · Advanced Neural Network Applications