CLIP-DFGS: A Hard Sample Mining Method for CLIP in Generalizable Person Re-Identification

Huazhong Zhao, Lei Qi, and Xin Geng

arXiv:2410.11255 · cs.CV · October 16, 2024

TL;DR

This paper introduces DFGS, a hard sample mining method for CLIP that improves its generalizable person re-identification performance by selecting challenging samples to enhance fine-grained feature extraction.

Contribution

The paper proposes DFGS, a depth-first graph sampling technique that generates difficult mini-batches for CLIP, improving its ability to distinguish individuals in person re-identification tasks.

Findings

01. DFGS significantly improves CLIP's re-identification accuracy.

02. Challenging mini-batches enhance fine-grained feature learning.

03. The method outperforms existing sampling strategies.

Abstract

Recent advancements in pre-trained vision-language models like CLIP have shown promise in person re-identification (ReID) applications. However, their performance in generalizable person re-identification tasks remains suboptimal. The large-scale and diverse image-text pairs used in CLIP's pre-training may lead to a lack or insufficiency of certain fine-grained features. In light of these challenges, we propose a hard sample mining method called DFGS (Depth-First Graph Sampler), based on depth-first search, designed to offer sufficiently challenging samples to enhance CLIP's ability to extract fine-grained features. DFGS can be applied to both the image encoder and the text encoder in CLIP. By leveraging the powerful cross-modal learning capabilities of CLIP, we aim to apply our DFGS method to extract challenging samples and form mini-batches with high discriminative difficulty,…
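
The idea of forming hard mini-batches by depth-first traversal can be illustrated with a minimal sketch. Everything specific here is an assumption, since this excerpt does not describe the paper's graph construction: the sketch takes one feature vector per identity, links each identity to its k most similar neighbours by cosine similarity, and walks that graph depth-first so that a batch collects identities that are easily confused with each other. The function names `build_neighbor_graph` and `dfs_batch` are hypothetical and are not taken from the authors' code.

```python
import numpy as np


def build_neighbor_graph(feats, k):
    """For each identity, return the indices of its k most similar identities.

    Assumption: similarity is cosine similarity between per-identity features.
    """
    normed = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)  # an identity is never its own neighbour
    # Sort each row by descending similarity and keep the first k columns.
    return np.argsort(-sim, axis=1)[:, :k]


def dfs_batch(neighbors, start, batch_ids):
    """Collect up to `batch_ids` identities by depth-first traversal from `start`."""
    visited = []
    seen = set()
    stack = [int(start)]
    while stack and len(visited) < batch_ids:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        visited.append(node)
        # Push neighbours in reverse order so the most similar one is popped first.
        for nb in reversed(neighbors[node]):
            if int(nb) not in seen:
                stack.append(int(nb))
    return visited


if __name__ == "__main__":
    # Toy features (hypothetical): two groups of three similar identities.
    feats = np.array([
        [1.0, 0.0], [0.99, 0.1], [0.95, 0.3],
        [0.0, 1.0], [0.1, 0.99], [0.3, 0.95],
    ])
    graph = build_neighbor_graph(feats, k=2)
    print(dfs_batch(graph, start=0, batch_ids=3))
```

In a real sampler, each selected identity would contribute several images to the batch, and the features could come from either the image encoder or the text encoder, as the abstract indicates; how the paper handles those details is not covered in this excerpt.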

Taxonomy

Topics: Gait Recognition and Analysis

Methods: Contrastive Language-Image Pre-training