TF-CLIP: Learning Text-free CLIP for Video-based Person Re-Identification

Chenyang Yu; Xuehu Liu; Yingquan Wang; Pingping Zhang; Huchuan Lu

arXiv:2312.09627 · cs.CV · December 18, 2023 · 2 cites

TL;DR

TF-CLIP is a text-free, CLIP-based framework for video-based person re-identification. It replaces text features with an identity-specific CLIP-Memory and adds a Temporal Memory Diffusion (TMD) module to capture temporal information, so it does not rely on text descriptions.

Contribution

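The paper proposes a one-stage, text-free CLIP-based learning framework for video-based person ReID, built on an identity-specific CLIP-Memory, a Sequence-Specific Prompt (SSP) module that updates the memory online, and a Temporal Memory Diffusion (TMD) module that captures temporal information.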

Findings

01. Outperforms state-of-the-art methods on the MARS, LS-VID, and iLIDS-VID datasets.

02. Introduces the CLIP-Memory, which replaces the text feature, and the Temporal Memory Diffusion (TMD) module, which captures temporal information.

03. Demonstrates the effectiveness of a text-free approach to video-based person ReID.

Abstract

Large-scale language-image pre-trained models (e.g., CLIP) have shown superior performances on many cross-modal retrieval tasks. However, the problem of transferring the knowledge learned from such models to video-based person re-identification (ReID) has barely been explored. In addition, there is a lack of decent text descriptions in current ReID benchmarks. To address these issues, in this work, we propose a novel one-stage text-free CLIP-based learning framework named TF-CLIP for video-based person ReID. More specifically, we extract the identity-specific sequence feature as the CLIP-Memory to replace the text feature. Meanwhile, we design a Sequence-Specific Prompt (SSP) module to update the CLIP-Memory online. To capture temporal information, we further propose a Temporal Memory Diffusion (TMD) module, which consists of two key components: Temporal Memory Construction (TMC) and…
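The CLIP-Memory idea in the abstract can be illustrated with a small sketch. It assumes that each identity's memory entry is the L2-normalized mean of that identity's sequence features, and that a sequence is trained to match its own entry through a softmax over cosine similarities. The abstract confirms neither detail, so the function names (`build_clip_memory`, `video_to_memory_loss`), the averaging rule, and the temperature value are all hypothetical. The SSP and TMD modules are not modeled here.

```python
import math
from collections import defaultdict


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def build_clip_memory(seq_features, identities):
    """Assumed rule: one memory vector per identity, the mean of its sequence features."""
    grouped = defaultdict(list)
    for feat, pid in zip(seq_features, identities):
        grouped[pid].append(feat)
    memory = {}
    for pid, feats in grouped.items():
        mean = [sum(col) / len(feats) for col in zip(*feats)]
        memory[pid] = l2_normalize(mean)
    return memory


def video_to_memory_loss(seq_feature, identity, memory, temperature=0.07):
    """Cross-entropy over cosine similarities between one sequence and every memory entry.

    The memory entries play the role that text features play in standard CLIP.
    The temperature is an illustrative value, not one taken from the paper.
    """
    query = l2_normalize(seq_feature)
    logits = {
        pid: sum(q * m for q, m in zip(query, mem)) / temperature
        for pid, mem in memory.items()
    }
    # Log-sum-exp with max subtraction for numerical stability.
    max_logit = max(logits.values())
    log_sum = max_logit + math.log(
        sum(math.exp(v - max_logit) for v in logits.values())
    )
    return log_sum - logits[identity]


# Toy data: three 2-D sequence features, two identities (0 and 1).
features = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]
labels = [0, 0, 1]

memory = build_clip_memory(features, labels)
loss_match = video_to_memory_loss([1.0, 0.0], 0, memory)
loss_mismatch = video_to_memory_loss([1.0, 0.0], 1, memory)
```

In this toy setup the loss is lower when the sequence is paired with its own identity's memory entry than with another identity's, which is the behavior a memory-based replacement for text features would need.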

Code & Models

Repositories

asuradayuci/tf-clip
tf · Official

Videos

TF-CLIP: Learning Text-Free CLIP for Video-Based Person Re-identification · underline

Taxonomy

Topics: Video Surveillance and Tracking Methods · Gait Recognition and Analysis · Human Pose and Action Recognition

Methods: Diffusion