CLIP-DFGS: A Hard Sample Mining Method for CLIP in Generalizable Person Re-Identification
Huazhong Zhao, Lei Qi, and Xin Geng

TL;DR
This paper introduces DFGS, a hard sample mining method for CLIP that improves its generalizable person re-identification performance by selecting challenging samples to enhance fine-grained feature extraction.
Contribution
The paper proposes DFGS, a depth-first graph sampling technique, to generate difficult mini-batches for CLIP, improving its ability to distinguish individuals in person re-identification tasks.
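The summary does not spell out how the graph is built or traversed, so the following is only a minimal sketch of the general idea of depth-first sampling over a similarity graph, not the authors' implementation. It assumes one feature vector per identity, cosine distance as the similarity measure, and a k-nearest-neighbor graph; the names `cosine_distance`, `build_neighbor_graph`, and `dfs_batch` are hypothetical.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def build_neighbor_graph(id_features, k):
    """Map each identity to its k most similar other identities, closest first.

    Assumption: similar-looking identities are the 'hard' ones to tell apart.
    """
    graph = {}
    for pid, feat in id_features.items():
        others = [
            (cosine_distance(feat, other_feat), other_pid)
            for other_pid, other_feat in id_features.items()
            if other_pid != pid
        ]
        others.sort()
        graph[pid] = [other_pid for _, other_pid in others[:k]]
    return graph


def dfs_batch(graph, start, ids_per_batch):
    """Collect identities for one mini-batch by depth-first traversal."""
    batch, seen, stack = [], set(), [start]
    while stack and len(batch) < ids_per_batch:
        pid = stack.pop()
        if pid in seen:
            continue
        seen.add(pid)
        batch.append(pid)
        # Push the farthest neighbor first so the nearest one is visited next.
        for neighbor in reversed(graph[pid]):
            if neighbor not in seen:
                stack.append(neighbor)
    return batch


# Toy identity-level features (made up for illustration only).
features = {
    "A": [1.0, 0.0],
    "B": [0.9, 0.1],
    "C": [0.8, 0.3],
    "D": [0.0, 1.0],
    "E": [0.1, 0.9],
}
graph = build_neighbor_graph(features, k=2)
hard_batch = dfs_batch(graph, start="A", ids_per_batch=3)
```

In this sketch, a batch started from one identity fills up with the identities closest to it in feature space, which is one way to obtain a mini-batch that is difficult to discriminate. In practice the features would come from CLIP's image or text encoder, and several images per identity would be drawn for each selected identity.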
Findings
DFGS significantly improves CLIP's re-identification accuracy.
Challenging mini-batches enhance fine-grained feature learning.
DFGS outperforms existing sampling strategies.
Abstract
Recent advancements in pre-trained vision-language models like CLIP have shown promise in person re-identification (ReID) applications. However, their performance in generalizable person re-identification tasks remains suboptimal. The large-scale and diverse image-text pairs used in CLIP's pre-training may lead to a lack or insufficiency of certain fine-grained features. In light of these challenges, we propose a hard sample mining method called DFGS (Depth-First Graph Sampler), based on depth-first search, designed to offer sufficiently challenging samples to enhance CLIP's ability to extract fine-grained features. DFGS can be applied to both the image encoder and the text encoder in CLIP. By leveraging the powerful cross-modal learning capabilities of CLIP, we aim to apply our DFGS method to extract challenging samples and form mini-batches with high discriminative difficulty,…
Taxonomy
Topics: Gait Recognition and Analysis
Methods: Contrastive Language-Image Pre-training
