Learning Semantic-Aligned Feature Representation for Text-based Person Search

Shiping Li; Min Cao; Min Zhang

arXiv:2112.06714·cs.CV·December 14, 2021

TL;DR

This paper introduces a semantic-aligned embedding method for text-based person search. It narrows the inter-modality gap by learning semantically aligned visual and textual features with two Transformer-based backbones and a part-aware feature aggregation network.

Contribution

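The paper proposes a semantic-aligned feature aggregation network: a multi-head attention module, constrained by a cross-modality part alignment loss and a diversity loss, that aggregates features with the same semantics into part-aware features. The authors report state-of-the-art results in text-based person search.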

Findings

01. Achieves state-of-the-art performance on the CUHK-PEDES and Flickr30K datasets.

02. Aligns visual and textual features across modalities.

03. Improves part-aware feature representation for better retrieval accuracy.

Abstract

Text-based person search aims to retrieve images of a certain pedestrian by a textual description. The key challenge of this task is to eliminate the inter-modality gap and achieve the feature alignment across modalities. In this paper, we propose a semantic-aligned embedding method for text-based person search, in which the feature alignment across modalities is achieved by automatically learning the semantic-aligned visual features and textual features. First, we introduce two Transformer-based backbones to encode robust feature representations of the images and texts. Second, we design a semantic-aligned feature aggregation network to adaptively select and aggregate features with the same semantics into part-aware features, which is achieved by a multi-head attention module constrained by a cross-modality part alignment loss and a diversity loss. Experimental results on the…
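The aggregation step described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: it assumes one query vector per part (a simplification of the multi-head attention module), uses scaled dot-product attention with a softmax over tokens, and stands in for the diversity loss and the part alignment loss with simple cosine-based quantities. All function names (`aggregate_parts`, `diversity_penalty`, `part_alignment_score`) and the toy inputs are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u, v):
    """Cosine similarity, with a small epsilon to avoid division by zero."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)) + 1e-12)


def aggregate_parts(tokens, part_queries):
    """Pool N token features (N x D) into K part features (K x D).

    Assumption: each part has its own query vector, and the attention
    weights are a softmax over tokens of scaled dot-product scores.
    Returns (parts, attn), where attn is K x N.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    parts, attn = [], []
    for q in part_queries:
        weights = softmax([dot(q, t) / scale for t in tokens])
        attn.append(weights)
        parts.append(
            [sum(w * t[d] for w, t in zip(weights, tokens)) for d in range(dim)]
        )
    return parts, attn


def diversity_penalty(parts):
    """Assumed stand-in for a diversity loss: mean squared cosine
    similarity between distinct part features (lower = more diverse)."""
    k = len(parts)
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    if not pairs:
        return 0.0
    return sum(cosine(parts[i], parts[j]) ** 2 for i, j in pairs) / len(pairs)


def part_alignment_score(visual_parts, textual_parts):
    """Assumed stand-in for part alignment: mean cosine similarity
    between the k-th visual part and the k-th textual part."""
    sims = [cosine(v, t) for v, t in zip(visual_parts, textual_parts)]
    return sum(sims) / len(sims)


# Toy example: three 2-D token features and two part queries.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
queries = [[4.0, 0.0], [0.0, 4.0]]
parts, attn = aggregate_parts(tokens, queries)
penalty = diversity_penalty(parts)
self_alignment = part_alignment_score(parts, parts)
```

In a trained model the queries and the token features would be learned, and the two penalties would be added to the retrieval objective; here they are fixed toy values so that the data flow is easy to follow.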

Code & Models

Repositories

reallsp/SAF (PyTorch, official)

Taxonomy

Methods: Softmax · Linear Layer