CLIP-Driven Fine-grained Text-Image Person Re-identification

Shuanglin Yan; Neng Dong; Liyan Zhang; Jinhui Tang

arXiv:2210.10276 · cs.CV · October 20, 2022 · 6 cites

TL;DR

This paper introduces CFine, a CLIP-based framework for fine-grained text-image person re-identification, which effectively mines intra- and inter-modal discriminative clues without additional feature embedding, leading to superior performance.

Contribution

The paper proposes a CLIP-driven framework for text-image person re-identification (TIReID) that combines multi-grained feature learning, cross-grained feature refinement, and fine-grained correspondence discovery.

Findings

01 Achieves state-of-the-art results on multiple benchmarks.

02 Effectively mines intra-modal discriminative clues.

03 Establishes precise cross-modal correspondences.

Abstract

Text-image person re-identification (TIReID) aims to retrieve the image corresponding to a given text query from a pool of candidate images. Existing methods use prior knowledge from single-modality pre-training to facilitate learning, but they lack multi-modal correspondences. In addition, because of the substantial gap between modalities, existing methods embed the original modal features into a shared latent space for cross-modal alignment. However, this feature embedding may distort intra-modal information. Recently, CLIP has attracted extensive attention for its powerful semantic concept learning capacity and rich multi-modal knowledge, which can help address these problems. Accordingly, in this paper, we propose a CLIP-driven Fine-grained information excavation framework (CFine) to fully utilize the powerful knowledge of CLIP for TIReID. To transfer the multi-modal knowledge effectively, we…
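
The retrieval setting described in the abstract (ranking a pool of candidate images against a text query) can be illustrated with a small sketch. This is not the CFine method: it assumes text and image embeddings have already been produced by some encoder (for example a CLIP-style model), uses plain cosine similarity as a stand-in scoring function, and `rank_gallery` is a hypothetical helper name introduced only for illustration.

```python
import numpy as np


def rank_gallery(text_emb: np.ndarray, image_embs: np.ndarray) -> np.ndarray:
    """Return gallery indices ordered from most to least similar to the query.

    text_emb:   shape (d,), embedding of the text query (assumed precomputed).
    image_embs: shape (n, d), embeddings of the n candidate images.
    """
    # L2-normalise both sides so the dot product equals cosine similarity.
    query = text_emb / np.linalg.norm(text_emb)
    gallery = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    similarities = gallery @ query
    # Negate so argsort yields descending similarity.
    return np.argsort(-similarities)


# Toy 2-D embeddings standing in for real encoder outputs.
gallery = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query = np.array([0.9, 0.1])
ranking = rank_gallery(query, gallery)
print(ranking)
```

In a real pipeline the top-ranked indices would be compared against the identity label of the query to compute retrieval metrics; how CFine itself scores text-image pairs is described in the paper, not in this sketch.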

Code & Models

Repositories

shuanglinyan/CFine (PyTorch, official)

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Video Surveillance and Tracking Methods

Methods: Contrastive Language-Image Pre-training