LAIP: Learning Local Alignment from Image-Phrase Modeling for Text-based Person Search
Haiguang Wang, Yu Wu, Mengxia Wu, Cao Min, Min Zhang

TL;DR
This paper introduces LAIP, a framework for text-based person search that strengthens local alignment through bidirectional attention-weighted matching and Mask Phrase Modeling, improving retrieval accuracy over existing methods.
Contribution
LAIP combines Bidirectional Attention-weighted local alignment (BidirAtt) with Mask Phrase Modeling (MPM) to make fuller use of attention information and improve local matching in text-based person search.
Findings
LAIP outperforms existing methods on multiple datasets.
Bidirectional attention improves local alignment accuracy.
Mask Phrase Modeling reduces masking bias and improves discrimination of fine details.
Abstract
Text-based person search aims at retrieving images of a particular person based on a given textual description. A common solution for this task is to directly match entire images and texts, i.e., global alignment, which struggles to discern the specific details that distinguish people of similar appearance. As a result, some works shift their attention towards local alignment. One group matches fine-grained parts using the forward attention weights of the transformer, yet underutilizes information. Another implicitly conducts local alignment by reconstructing masked parts based on unmasked context, yet with a biased masking strategy. Both limit performance improvement. This paper proposes the Local Alignment from Image-Phrase modeling (LAIP) framework, with a Bidirectional Attention-weighted local alignment (BidirAtt) module and a Mask Phrase Modeling (MPM) module. BidirAtt goes beyond the…
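To make the idea of attention-weighted local alignment in both directions more concrete, here is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not the authors' BidirAtt implementation: the function name `bidirectional_local_similarity`, the `temperature` parameter, the use of cosine similarity, and the simple averaging of the two directions are all hypothetical choices made for this example.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def l2_normalize(x):
    # Scale each feature vector to unit length so dot products are cosines.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def bidirectional_local_similarity(patches, tokens, temperature=0.1):
    """Hypothetical local matching score between one image and one text.

    patches: (P, D) array of image patch features.
    tokens:  (T, D) array of text token (or phrase) features.
    """
    patches = l2_normalize(patches)
    tokens = l2_normalize(tokens)
    sim = tokens @ patches.T  # (T, P) cosine similarities

    # Text-to-image direction: each token attends over all patches.
    t2i_attn = softmax(sim / temperature, axis=1)
    t2i_score = (t2i_attn * sim).sum(axis=1).mean()

    # Image-to-text direction: each patch attends over all tokens.
    i2t_attn = softmax(sim / temperature, axis=0)
    i2t_score = (i2t_attn * sim).sum(axis=0).mean()

    # Assumption: the two directions are combined by a plain average.
    return 0.5 * (t2i_score + i2t_score)

# Toy features: matched text shares directions with the patches,
# mismatched text is orthogonal to every patch.
patches = np.eye(8)[:4]
matched_tokens = np.eye(8)[:4]
mismatched_tokens = np.eye(8)[4:]

matched_score = bidirectional_local_similarity(patches, matched_tokens)
mismatched_score = bidirectional_local_similarity(patches, mismatched_tokens)
```

In this toy setup the matched pair scores close to 1 and the mismatched pair scores 0, which is the behavior a local alignment score should have. A single-direction variant would keep only one of the two attention terms.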
Taxonomy
Topics: Image Retrieval and Classification Techniques · Data-Driven Disease Surveillance · Biomedical Text Mining and Ontologies
