Text-Based Person Search with Limited Data
Xiao Han, Sen He, Li Zhang, Tao Xiang

TL;DR
This paper introduces a novel framework for text-based person search that leverages contrastive learning and transfer learning to overcome limited data challenges, achieving state-of-the-art results on CUHK-PEDES.
Contribution
The paper proposes a cross-modal momentum contrastive learning framework and a transfer learning approach to improve fine-grained person search with limited data.
Findings
Achieves a new state of the art on the CUHK-PEDES dataset.
Reports significant improvements in Rank-1 accuracy and mAP.
Effectively utilizes small-scale datasets with novel training strategies.
Abstract
Text-based person search (TBPS) aims at retrieving a target person from an image gallery with a descriptive text query. Solving such a fine-grained cross-modal retrieval task is challenging, which is further hampered by the lack of large-scale datasets. In this paper, we present a framework with two novel components to handle the problems brought by limited data. Firstly, to fully utilize the existing small-scale benchmarking datasets for more discriminative feature learning, we introduce a cross-modal momentum contrastive learning framework to enrich the training data for a given mini-batch. Secondly, we propose to transfer knowledge learned from existing coarse-grained large-scale datasets containing image-text pairs from drastically different problem domains to compensate for the lack of TBPS training data. A transfer learning method is designed so that useful information can be…
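The abstract does not specify the loss or the update rule, so the following is only a minimal sketch, assuming a MoCo-style design: a momentum (key) encoder updated as an exponential moving average of the online encoder, plus a FIFO queue of past key features per modality that supplies extra negatives beyond the current mini-batch. The helper names (`momentum_update`, `info_nce`, `CrossModalQueue`, `cross_modal_loss`), the temperature of 0.07, the momentum of 0.999, and the choice to treat only the paired image and text as positives (ignoring identity labels) are all hypothetical. Features are plain Python lists so the sketch runs without a deep learning framework, and the encoders themselves are omitted.

```python
import math
from collections import deque
from typing import Deque, List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a feature vector to unit length (guarding against the zero vector)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def momentum_update(key_params, query_params, m=0.999):
    """EMA update (assumed): the momentum encoder slowly tracks the online encoder."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


def info_nce(query, positive_key, negative_keys, temperature=0.07):
    """InfoNCE for one query: -log softmax probability of the positive among all keys."""
    q = l2_normalize(query)
    logits = [dot(q, l2_normalize(positive_key)) / temperature]
    logits += [dot(q, l2_normalize(k)) / temperature for k in negative_keys]
    peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[0]


class CrossModalQueue:
    """Hypothetical FIFO memory of past momentum features, one queue per modality."""

    def __init__(self, max_size: int):
        self.image_keys: Deque[Vector] = deque(maxlen=max_size)
        self.text_keys: Deque[Vector] = deque(maxlen=max_size)

    def enqueue(self, image_keys, text_keys):
        # Oldest entries are dropped automatically once max_size is reached.
        self.image_keys.extend(image_keys)
        self.text_keys.extend(text_keys)


def cross_modal_loss(img_q, txt_q, img_k, txt_k, queue, temperature=0.07):
    """Symmetric image-to-text and text-to-image InfoNCE over one mini-batch.

    img_q, txt_q: online-encoder features of the i-th image/text pair.
    img_k, txt_k: momentum-encoder features of the same pairs.
    Negatives for pair i are the other pairs in the batch plus every queued key,
    so the queue enlarges the negative set beyond the mini-batch size.
    Assumption: identity labels are ignored, so same-person pairs count as negatives.
    """
    n = len(img_q)
    total = 0.0
    for i in range(n):
        neg_txt = [txt_k[j] for j in range(n) if j != i] + list(queue.text_keys)
        neg_img = [img_k[j] for j in range(n) if j != i] + list(queue.image_keys)
        total += info_nce(img_q[i], txt_k[i], neg_txt, temperature)  # image -> text
        total += info_nce(txt_q[i], img_k[i], neg_img, temperature)  # text -> image
    return total / (2 * n)
```

In a training loop, one would compute `cross_modal_loss` first, then call `queue.enqueue(img_k, txt_k)` so the current keys serve as negatives for later batches, and finally refresh each momentum-encoder parameter with `momentum_update`. Enqueuing after the loss keeps a pair's own key out of its negative set.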
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Surveillance and Tracking Methods
Methods: Contrastive Learning
