TL;DR
This paper introduces a novel uncertainty-aware test-time adaptation method for text-based person search, improving domain generalization without requiring extensive target-domain labels.
Contribution
It proposes UATTA, a framework that dynamically adapts models using unlabeled test data and a disagreement-based uncertainty measure, enhancing deployment practicality.
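The sketch below illustrates the general idea of a disagreement-based uncertainty signal and why it can catch cases that entropy misses. It is not UATTA's actual formulation, which this summary does not spell out. The two "predictors", the score values, and the use of Jensen-Shannon divergence as the disagreement measure are all assumptions made for illustration.

```python
import math

def softmax(scores, temperature=1.0):
    """Turn raw similarity scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; low values mean a confident prediction."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def js_disagreement(probs_a, probs_b):
    """Jensen-Shannon divergence between two predictions (assumed measure).

    It is 0 when both predictors agree exactly and approaches log(2)
    when they confidently pick different gallery items.
    """
    mix = [(a + b) / 2.0 for a, b in zip(probs_a, probs_b)]
    return entropy(mix) - (entropy(probs_a) + entropy(probs_b)) / 2.0

# Hypothetical scores from two predictors for one text query
# against a gallery of three person images.
scores_a = [9.0, 1.0, 0.5]  # predictor A confidently picks image 0
scores_b = [1.0, 9.0, 0.5]  # predictor B confidently picks image 1

p_a, p_b = softmax(scores_a), softmax(scores_b)

# Each predictor looks certain on its own (low entropy) ...
print("entropy A:", entropy(p_a))
print("entropy B:", entropy(p_b))
# ... but the disagreement score flags the sample as unreliable.
print("disagreement:", js_disagreement(p_a, p_b))
```

In this toy case an entropy-only criterion would treat both predictions as trustworthy, while the disagreement score marks the sample as uncertain, so it could be down-weighted or skipped during adaptation.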
Findings
UATTA improves performance across four benchmarks.
It outperforms existing offline test-time adaptation methods.
The approach is effective for both CLIP-based and XVLM-based frameworks.
Abstract
Text-based person search faces inherent limitations due to data scarcity, driven by stringent privacy constraints and the high cost of manual annotation. To mitigate this, existing methods usually rely on a Pretrain-then-Finetune paradigm, where models are first pretrained on synthetic person-caption data to establish cross-modal alignment and then fine-tuned on labeled real-world datasets. However, this paradigm is impractical in real-world deployment scenarios, where large-scale annotated target-domain data is typically inaccessible. In this work, we propose a new Pretrain-then-Adapt paradigm that eliminates reliance on extensive target-domain supervision via offline test-time adaptation, enabling dynamic model adaptation using only unlabeled test data with minimal post-training time cost. To mitigate overconfidence with false positives of previous entropy-based…
