Attribute-Aware Implicit Modality Alignment for Text Attribute Person Search

Xin Wang; Fangfang Liu; Zheng Li; Caili Guo

arXiv:2406.03721 · cs.CV · June 7, 2024 · 1 citation

TL;DR

This paper introduces a framework for text attribute person search that aligns both local and global features between textual attributes and images, improving retrieval accuracy by narrowing the modality gap.

Contribution

The proposed Attribute-Aware Implicit Modality Alignment (AIMA) framework combines structured sentence prompts, a masked attribute prediction module, and an IoU-guided contrastive loss to enhance cross-modal alignment.
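As a rough illustration of what an IoU-guided contrastive objective could look like, the sketch below computes the intersection over union (Jaccard index) between binary attribute vectors and uses the row-normalised IoU matrix as soft targets for a softmax cross-entropy over a similarity matrix. This is an assumption about the general idea, not the authors' formulation: the function names, the soft-target normalisation, and the temperature value are all hypothetical and do not come from the paper.

```python
import math

def attribute_iou(a, b):
    """Jaccard index between two binary attribute vectors (hypothetical helper)."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    # Two empty attribute sets are treated as identical.
    return inter / union if union else 1.0

def iou_soft_targets(attrs):
    """Row-normalised pairwise IoU matrix, used here as soft labels (assumption)."""
    targets = []
    for a in attrs:
        row = [attribute_iou(a, b) for b in attrs]
        total = sum(row)  # at least 1.0, because each sample has IoU 1 with itself
        targets.append([v / total for v in row])
    return targets

def soft_contrastive_loss(sim, targets, temperature=0.07):
    """Mean cross-entropy between softmax(sim / temperature) and the soft targets."""
    loss = 0.0
    for s_row, t_row in zip(sim, targets):
        scaled = [s / temperature for s in s_row]
        m = max(scaled)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(v - m) for v in scaled))
        loss -= sum(t * (v - log_z) for t, v in zip(t_row, scaled))
    return loss / len(sim)

# Toy batch: three samples described by four binary attributes (made-up data).
attrs = [[1, 1, 0, 0], [1, 0, 1, 0], [0, 0, 0, 1]]
# Toy text-to-image cosine similarities (made-up data).
sim = [[0.9, 0.4, 0.1], [0.3, 0.8, 0.0], [0.1, 0.2, 0.7]]

targets = iou_soft_targets(attrs)
loss = soft_contrastive_loss(sim, targets)
```

With soft targets of this kind, two samples that share some attributes are not pushed apart as strongly as two samples with disjoint attribute sets, which is one plausible way to reflect differences between attribute categories in the embedding space.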

Findings

01 Outperforms state-of-the-art methods on the Market-1501 Attribute, PETA, and PA100K datasets.

02 Effectively aligns local attribute features with image details.

03 Improves semantic consistency in the attribute embedding space.

Abstract

Text attribute person search aims to retrieve specific pedestrians from given textual attributes, which is valuable when searching for a designated pedestrian based on witness descriptions. The key challenge is the significant modality gap between textual attributes and images. Previous methods focused on explicit representation and alignment using unimodal pre-trained models. However, because these models lack inter-modality correspondence, the local information within each modality may be distorted. Moreover, these methods considered only inter-modality alignment and ignored the differences between attribute categories. To mitigate these problems, we propose an Attribute-Aware Implicit Modality Alignment (AIMA) framework to learn the correspondence of local representations between textual attributes and images and combine…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Data Quality and Management

Methods: Contrastive Language-Image Pre-training