Attribute-Aware Implicit Modality Alignment for Text Attribute Person Search
Xin Wang, Fangfang Liu, Zheng Li, Caili Guo

TL;DR
This paper introduces a framework for text attribute person search that aligns local and global features between textual attributes and images, narrowing the modality gap between the two and improving retrieval accuracy.
Contribution
The proposed Attribute-Aware Implicit Modality Alignment (AIMA) framework combines structured sentence prompts, a masked attribute prediction module, and an IoU-guided contrastive loss to enhance cross-modal alignment.
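The Contribution line names structured sentence prompts and a masked attribute prediction module without showing what their inputs look like. The code below is a minimal sketch of the data side only. It assumes a hypothetical prompt template and hypothetical attribute slot names (`TEMPLATE`, `build_prompt`, and `mask_attributes` are illustrative and do not come from the paper). It turns an attribute dictionary into a sentence and masks a random subset of attribute values, returning the masked values as prediction targets. The network that predicts the masked values is not shown.

```python
import random

# Hypothetical prompt template: the paper's actual wording is not given here.
TEMPLATE = "A photo of a {gender} person wearing {upper} and {lower}, carrying {bag}."


def build_prompt(attrs: dict[str, str]) -> str:
    """Fill the (assumed) template with one value per attribute slot."""
    return TEMPLATE.format(**attrs)


def mask_attributes(attrs: dict[str, str], ratio: float, rng: random.Random,
                    mask_token: str = "[MASK]"):
    """Replace a random subset of attribute values with a mask token.

    At least one slot is always masked. Returns the masked attribute dict
    and a dict of ground-truth values the model would have to predict.
    """
    keys = sorted(attrs)
    n_mask = max(1, round(ratio * len(keys)))
    masked_keys = rng.sample(keys, n_mask)
    masked = {k: (mask_token if k in masked_keys else v) for k, v in attrs.items()}
    targets = {k: attrs[k] for k in masked_keys}
    return masked, targets


if __name__ == "__main__":
    attrs = {"gender": "male", "upper": "a black jacket",
             "lower": "blue jeans", "bag": "a backpack"}
    masked, targets = mask_attributes(attrs, ratio=0.25, rng=random.Random(0))
    print(build_prompt(attrs))
    print(build_prompt(masked), targets)
```

Using a seeded `random.Random` keeps the choice of masked slots reproducible. Masking whole attribute slots rather than individual words keeps multi-word values such as "a black jacket" intact as a single prediction target.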
Findings
Outperforms state-of-the-art methods on the Market-1501 Attribute, PETA, and PA100K datasets.
Effectively aligns local attribute features with image details.
Improves semantic consistency in attribute embedding space.
Abstract
Text attribute person search aims to find specific pedestrians from given textual attributes, which is valuable when searching for designated pedestrians based on witness descriptions. The key challenge is the significant modality gap between textual attributes and images. Previous methods focused on achieving explicit representation and alignment through unimodal pre-trained models. Nevertheless, the absence of inter-modality correspondence in these models may distort local intra-modality information. Moreover, these methods considered only inter-modality alignment and ignored the differences between attribute categories. To mitigate these problems, we propose an Attribute-Aware Implicit Modality Alignment (AIMA) framework to learn the correspondence of local representations between textual attributes and images and combine…
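The abstract notes that earlier methods ignored differences between attribute categories, and the Contribution line names an IoU-guided contrastive loss, but this summary gives no formula for it. The code below sketches one plausible reading under explicit assumptions: attribute overlap is measured as the intersection-over-union (Jaccard index) of two attribute sets, and the IoU values, normalized per anchor, serve as a soft target distribution for text-to-text embedding similarities. The function names, the temperature, and the cross-entropy form are illustrative choices, not the paper's definition.

```python
import math


def attribute_iou(a: set[str], b: set[str]) -> float:
    """Intersection-over-union (Jaccard index) of two attribute sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def softmax(xs: list[float]) -> list[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def iou_guided_loss(embeddings: list[list[float]], attr_sets: list[set[str]],
                    temperature: float = 0.1) -> float:
    """Assumed soft-target intra-modal contrastive loss (illustrative form).

    For each anchor i, the target distribution over the other samples j is
    proportional to IoU(attrs_i, attrs_j); the predicted distribution is a
    softmax over cosine similarity / temperature. Returns the mean
    cross-entropy over anchors that share attributes with at least one
    other sample.
    """
    n = len(embeddings)
    total, used = 0.0, 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        ious = [attribute_iou(attr_sets[i], attr_sets[j]) for j in others]
        z = sum(ious)
        if z == 0:
            continue  # no overlapping attributes: no soft target for this anchor
        target = [w / z for w in ious]
        logits = [cosine(embeddings[i], embeddings[j]) / temperature for j in others]
        pred = softmax(logits)
        total += -sum(t * math.log(p) for t, p in zip(target, pred))
        used += 1
    return total / max(used, 1)
```

With this form, attribute sets that overlap more are pulled closer together in the text embedding space, while anchors whose attributes overlap with no other sample contribute nothing. A practical version would vectorize the loop over a batch in a tensor library.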
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Data Quality and Management
Methods: Contrastive Language-Image Pre-training
