Knowing Where to Focus: Attention-Guided Alignment for Text-based Person Search
Lei Tan, Weihao Li, Pingyang Dai, Jie Chen, Liujuan Cao, Rongrong Ji

TL;DR
This paper introduces an Attention-Guided Alignment framework for Text-Based Person Search that dynamically masks meaningful words and enriches text descriptions to improve cross-modal representation alignment, achieving state-of-the-art results.
Contribution
The paper proposes two components, Attention-Guided Mask (AGM) Modeling and a Text Enrichment Module (TEM), to enhance cross-modal alignment and handle noisy descriptions in TBPS.
Findings
Achieves new state-of-the-art Rank-1 accuracy on three benchmarks.
Effectively masks semantically meaningful words to improve alignment.
Enriches text descriptions to mitigate noise and improve robustness.
Abstract
In the realm of Text-Based Person Search (TBPS), mainstream methods aim to explore more efficient interaction frameworks between text descriptions and visual data. However, recent approaches encounter two principal challenges. First, the widely used random Masked Language Modeling (MLM) treats all words in the text equally during training. As a result, many semantically vacuous words ('with', 'the', etc.) are masked; these contribute little to cross-modal interaction in MLM and hamper representation alignment. Second, manual descriptions in TBPS datasets are tedious to produce and inevitably contain inaccuracies. To address these issues, we introduce an Attention-Guided Alignment (AGA) framework featuring two innovative components: Attention-Guided Mask (AGM) Modeling and Text Enrichment Module (TEM). AGM dynamically masks semantically meaningful words by…
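The contrast the abstract draws between random masking and attention-guided masking can be illustrated with a small sketch. This is not the paper's implementation: the abstract is cut off before it says how AGM selects words, so the sampling rule below (pick positions with probability proportional to a per-token score) is an assumption, as are the function names, the hand-written scores, and the 30% mask ratio. In a real system the scores would presumably come from a model's attention weights.

```python
import random


def random_mask(tokens, ratio, rng):
    """Baseline MLM masking: every position is equally likely to be chosen."""
    k = max(1, round(len(tokens) * ratio))
    return sorted(rng.sample(range(len(tokens)), k))


def attention_guided_mask(tokens, scores, ratio, rng):
    """Hypothetical attention-guided masking (an assumption, not the paper's rule).

    Draws k distinct positions, each draw weighted by the remaining scores,
    so tokens with a score of zero are never selected.
    """
    k = max(1, round(len(tokens) * ratio))
    candidates = list(range(len(tokens)))
    weights = list(scores)
    chosen = []
    for _ in range(k):
        if sum(weights) <= 0:
            break  # no positively scored tokens left to mask
        pick = rng.choices(range(len(candidates)), weights=weights, k=1)[0]
        chosen.append(candidates.pop(pick))
        weights.pop(pick)
    return sorted(chosen)


def apply_mask(tokens, positions, mask_token="[MASK]"):
    """Replace the tokens at the given positions with the mask token."""
    selected = set(positions)
    return [mask_token if i in selected else t for i, t in enumerate(tokens)]


tokens = "a woman with a red jacket and the black backpack".split()
# Illustrative scores only: function words get 0, content words get weight.
scores = [0.0, 0.2, 0.0, 0.0, 0.25, 0.2, 0.0, 0.0, 0.15, 0.2]

rng = random.Random(0)
print(apply_mask(tokens, random_mask(tokens, 0.3, rng)))
print(apply_mask(tokens, attention_guided_mask(tokens, scores, 0.3, rng)))
```

With these scores the guided variant can only mask content words such as 'red' or 'backpack', while the random baseline may spend its mask budget on 'with' or 'the', which is the inefficiency the abstract describes.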
Taxonomy
Topics: Data Quality and Management · Topic Modeling · Biomedical Text Mining and Ontologies
Methods: Softmax · Attention Is All You Need · ALIGN
