Unified Multi-Dataset Training for TBPS
Nilanjana Chatterjee, Sidharatha Garg, A V Subramanyam, Brejesh Lall

TL;DR
This paper introduces Scale-TBPS, a unified training approach for Text-Based Person Search (TBPS) that trains a single model across multiple datasets. It addresses two limitations of prior work, reliance on dataset-specific fine-tuning and sensitivity to noisy image-text pairs, and the resulting unified model outperforms dataset-specific models.
Contribution
It proposes noise-aware dataset curation and a scalable discriminative identity learning framework for unified TBPS training across multiple datasets.
Findings
Unified model outperforms dataset-specific models
Effective handling of noisy image-text pairs
Scales well with many person identities
Abstract
Text-Based Person Search (TBPS) has seen significant progress with vision-language models (VLMs), yet it remains constrained by limited training data and the fact that VLMs are not inherently pre-trained for pedestrian-centric recognition. Existing TBPS methods therefore rely on dataset-centric fine-tuning to handle distribution shift, resulting in multiple independently trained models for different datasets. While synthetic data can increase the scale needed to fine-tune VLMs, it does not eliminate dataset-specific adaptation. This motivates a fundamental question: can we train a single unified TBPS model across multiple datasets? We show that naive joint training over all datasets remains sub-optimal because current training paradigms do not scale to a large number of unique person identities and are vulnerable to noisy image-text pairs. To address these challenges, we propose…
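To make the identity-scaling concern concrete, here is a minimal sketch of what pooling datasets for naive joint training involves. It assumes each dataset numbers its person identities locally, so the IDs must be remapped into one global label space before pooling. The dataset names, the sample layout, and the `merge_identity_spaces` helper are hypothetical illustrations, not the paper's method or code.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: each dataset is a list of (local_person_id, caption) pairs.
LocalSample = Tuple[int, str]
# After merging: (global_person_id, dataset_name, caption).
GlobalSample = Tuple[int, str, str]


def merge_identity_spaces(
    datasets: Dict[str, List[LocalSample]],
) -> Tuple[List[GlobalSample], int]:
    """Pool datasets, mapping each (dataset, local id) pair to a unique global id."""
    global_ids: Dict[Tuple[str, int], int] = {}
    merged: List[GlobalSample] = []
    for name in sorted(datasets):  # sorted for a deterministic id assignment
        for local_id, caption in datasets[name]:
            key = (name, local_id)
            if key not in global_ids:
                # Local id 0 in one dataset is a different person from
                # local id 0 in another, so each pair gets a fresh global id.
                global_ids[key] = len(global_ids)
            merged.append((global_ids[key], name, caption))
    return merged, len(global_ids)


# Toy data (made up for illustration only).
toy = {
    "dataset_a": [
        (0, "a man in a red jacket"),
        (0, "man wearing a red coat"),
        (1, "a woman with a blue bag"),
    ],
    "dataset_b": [
        (0, "a boy in a green shirt"),
        (1, "a woman in a white dress"),
    ],
}
merged, num_identities = merge_identity_spaces(toy)
```

The number of identities in the pooled set is the sum over all datasets, so any objective that treats each identity as its own class has to handle a label space that grows with every dataset added. That growth is the scaling pressure the abstract refers to.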
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Neural Network Applications · Face recognition and analysis
