Uncertainty-Aware Prototype Semantic Decoupling for Text-Based Person Search in Full Images
Zengli Luo, Canlong Zhang, Zhixin Li, Zhiwen Wang, Chunrong Wei

TL;DR
This paper introduces UPD-TBPS, a framework that reduces uncertainty in text-based pedestrian search in full images by decoupling prototypes and leveraging multi-granularity uncertainty estimation, leading to improved detection and retrieval accuracy.
Contribution
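The paper proposes an uncertainty-aware framework built from three modules, Multi-granularity Uncertainty Estimation (MUE), Prototype-based Uncertainty Decoupling (PUD), and Cross-modal Re-identification (ReID), which together reduce detection and matching uncertainty in complex multi-pedestrian scenes for text-based person search.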
Findings
Significant performance improvements on the CUHK-SYSU-TBPS dataset
Effective reduction of detection and matching uncertainties
Enhanced accuracy in complex multi-pedestrian scenes
Abstract
Text-based pedestrian search (TBPS) in full images aims to locate a target pedestrian in untrimmed images using natural language descriptions. However, in complex scenes with multiple pedestrians, existing methods are limited by uncertainties in detection and matching, leading to degraded performance. To address this, we propose UPD-TBPS, a novel framework comprising three modules: Multi-granularity Uncertainty Estimation (MUE), Prototype-based Uncertainty Decoupling (PUD), and Cross-modal Re-identification (ReID). MUE conducts multi-granularity queries to identify potential targets and assigns confidence scores to reduce early-stage uncertainty. PUD leverages visual context decoupling and prototype mining to extract features of the target pedestrian described in the query. It separates and learns pedestrian prototype representations at both the coarse-grained cluster level and the…
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Neural Network Applications · Multimodal Machine Learning Applications
