RaSa: Relation and Sensitivity Aware Representation Learning for Text-based Person Search
Yang Bai, Min Cao, Daming Gao, Ziqiang Cao, Chen Chen, Zhenfeng Fan, Liqiang Nie, Min Zhang

TL;DR
RaSa introduces relation-aware and sensitivity-aware learning tasks to improve multi-modal representations for text-based person search, effectively handling noise and enhancing robustness, leading to state-of-the-art performance on multiple datasets.
Contribution
The paper proposes RaSa, a novel method with relation-aware and sensitivity-aware tasks, addressing noise and robustness issues in text-based person search.
Findings
RaSa outperforms existing methods by up to 15.35% in Rank@1.
It effectively distinguishes strong and weak positive pairs.
RaSa improves robustness by detecting sensitive transformations.
Abstract
Text-based person search aims to retrieve the specified person images given a textual description. The key to tackling such a challenging task is to learn powerful multi-modal representations. Towards this, we propose a Relation and Sensitivity aware representation learning method (RaSa), including two novel tasks: Relation-Aware learning (RA) and Sensitivity-Aware learning (SA). For one thing, existing methods cluster representations of all positive pairs without distinction and overlook the noise problem caused by the weak positive pairs where the text and the paired image have noise correspondences, thus leading to overfitting learning. RA offsets the overfitting risk by introducing a novel positive relation detection task (i.e., learning to distinguish strong and weak positive pairs). For another thing, learning invariant representation under data augmentation (i.e., being…
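The positive relation detection task described in the abstract can be read as a binary classification over positive pairs. The following is a minimal sketch under that assumption, not the authors' implementation: the function name `relation_detection_loss`, the example logits, and the use of a plain binary cross-entropy are all illustrative choices that do not come from the paper.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a raw score to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def relation_detection_loss(logits, labels):
    """Mean binary cross-entropy over positive pairs.

    Assumed labeling (illustrative only): 1 = strong positive pair,
    0 = weak positive pair. `logits` are raw scores from some
    hypothetical pair classifier.
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, y in zip(logits, labels):
        p = sigmoid(z)
        total += -(y * math.log(p + eps) + (1 - y) * math.log(1.0 - p + eps))
    return total / len(logits)


# Hypothetical scores for three text-image pairs and their assumed labels.
logits = [2.0, -1.5, 0.3]
labels = [1, 0, 1]
loss = relation_detection_loss(logits, labels)
```

In this sketch, the loss falls as the classifier assigns high scores to strong pairs and low scores to weak ones, which is one way a model could be pushed to tell the two kinds of positive pair apart instead of clustering them without distinction.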
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning
Methods: ALBEF
